Is your feature request related to a problem or challenge?
Many Spark expressions behave differently across Spark versions. This is especially the case when comparing Spark 3.x and 4.x, where there are many breaking changes.
I think it is important to start addressing this in the Spark-compatible expressions in DataFusion.
FWIW, the approach we take in Comet is that each expression has a `getSupportLevel` method that can return `Compatible`, `Incompatible(reason)`, or `Unsupported`. These methods are context-aware, taking into account the Spark version, the Spark configuration (e.g., whether ANSI mode is enabled), and the specific arguments being passed to the expression. Here is an example:
```scala
override def getSupportLevel(expr: Reverse): SupportLevel = {
  if (containsBinary(expr.child.dataType)) {
    Incompatible(Some("reverse on array containing binary is not supported"))
  } else {
    Compatible(None)
  }
}
```
We also have a shim layer where we can implement different code per Spark version. This was necessary for Comet because there are API changes in Spark between versions and we need to compile against each Spark version separately. It is simpler for DataFusion because we can just pass the Spark version (or some other flags) into the expression constructor.
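In Rust, the constructor-based approach could look something like the sketch below. This is purely illustrative: `SparkVersion`, `SparkReverse`, `support_level`, and the `variant` gating are hypothetical names, not actual DataFusion or Comet APIs.

```rust
// Hypothetical sketch of passing the Spark version into a Spark-compatible
// expression at construction time. Names and semantics are illustrative only.

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum SparkVersion {
    V3_5,
    V4_0,
}

#[derive(Debug, PartialEq)]
enum SupportLevel {
    Compatible,
    Incompatible(&'static str),
}

/// A Spark-compatible `reverse` expression that knows which Spark
/// version it is emulating.
struct SparkReverse {
    spark_version: SparkVersion,
}

impl SparkReverse {
    fn new(spark_version: SparkVersion) -> Self {
        Self { spark_version }
    }

    /// Analogous to Comet's getSupportLevel: the answer can depend on
    /// the argument type and on the Spark version passed to the
    /// constructor (the type is a plain string tag here for brevity).
    fn support_level(&self, child_type: &str) -> SupportLevel {
        if child_type == "array<binary>" {
            return SupportLevel::Incompatible(
                "reverse on array containing binary is not supported",
            );
        }
        // Illustrative version gate: pretend some input type only
        // exists from Spark 4.x onwards.
        if child_type == "variant" && self.spark_version == SparkVersion::V3_5 {
            return SupportLevel::Incompatible("variant requires Spark 4.x");
        }
        SupportLevel::Compatible
    }
}

fn main() {
    let expr = SparkReverse::new(SparkVersion::V4_0);
    assert_eq!(expr.support_level("array<int>"), SupportLevel::Compatible);
    assert_eq!(
        expr.support_level("array<binary>"),
        SupportLevel::Incompatible("reverse on array containing binary is not supported"),
    );
    println!("ok");
}
```

The point is that no shim layer is needed: the same compiled code serves every Spark version, and the behavioral differences are plain runtime branches on the version (or flags) supplied at construction.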
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
No response