You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
AVG(<decimal>) over a window falls back to Spark on Spark 4.x, even though CometWindowExec reports AVG as a natively supported window aggregate and process_agg_func has an AvgDecimal window branch.
On Spark 4.x, Average of a decimal is represented in the physical plan as a Cast(avg(UnscaledValue(child)) / pow10 AS decimal(p, s)) wrapping the WindowExpression, for example:
Window [cast((avg(UnscaledValue(v#39)) windowspecdefinition(...) / 100.0) as decimal(14,6)) AS run_avg#58]
CometWindowExec.convert only unwraps two shapes:
caseAlias(w: WindowExpression, _) => w
caseAlias(MakeDecimal(w: WindowExpression, _, _, _), _) => w
case other => withFallbackReason(...); returnNone
The Cast(Divide(...)) form matches neither, so the whole WindowExec falls back to Spark. Results stay correct (Spark computes them), but:
The AvgDecimal native window branch in process_agg_func (native/core/src/execution/planner.rs) is effectively dead on Spark 4.x.
statement
CREATETABLEdec_avg(g int, v decimal(10,2)) USING parquet
statement
INSERT INTO dec_avg VALUES (1, 10.10), (1, 20.25), (1, 30.33), (1, 41.00)
-- runs natively on integer/double, falls back on decimal under Spark 4.xSELECT v, AVG(v) OVER (PARTITION BY g ORDER BY v
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS run_avg
FROM dec_avg
AVG(<decimal>) over an ever-expanding window frame runs natively via the AvgDecimal UDAF and matches Spark, the same way SUM(<decimal>) already does. This needs CometWindowExec.convert to recognize the Spark 4.x Cast(Divide(WindowExpression, ...)) average shape (in addition to the existing MakeDecimal shape).
The MakeDecimal unwrap suggests decimal AVG is reachable on some Spark versions (likely 3.x); the divergent shape is the 4.x Cast(Divide(...)) form. Behavior should be confirmed across 3.4 / 3.5 / 4.0 / 4.1 when fixing.
Describe the bug
AVG(<decimal>)over a window falls back to Spark on Spark 4.x, even thoughCometWindowExecreportsAVGas a natively supported window aggregate andprocess_agg_funchas anAvgDecimalwindow branch.On Spark 4.x,
Averageof a decimal is represented in the physical plan as aCast(avg(UnscaledValue(child)) / pow10 AS decimal(p, s))wrapping theWindowExpression, for example:CometWindowExec.convertonly unwraps two shapes:The
Cast(Divide(...))form matches neither, so the wholeWindowExecfalls back to Spark. Results stay correct (Spark computes them), but:AvgDecimalnative window branch inprocess_agg_func(native/core/src/execution/planner.rs) is effectively dead on Spark 4.x.AVGdecimal window aggregates never run natively on Spark 4.x, contrary to the coverage implied by feat: extend native windows support #4209.Steps to reproduce
Comet falls back:
Unsupported window expression: cast((avg(UnscaledValue(v)) ... / 100.0) as decimal(14,6)).Expected behavior
AVG(<decimal>)over an ever-expanding window frame runs natively via theAvgDecimalUDAF and matches Spark, the same waySUM(<decimal>)already does. This needsCometWindowExec.convertto recognize the Spark 4.xCast(Divide(WindowExpression, ...))average shape (in addition to the existingMakeDecimalshape).Additional context
MakeDecimalunwrap suggests decimalAVGis reachable on some Spark versions (likely 3.x); the divergent shape is the 4.xCast(Divide(...))form. Behavior should be confirmed across 3.4 / 3.5 / 4.0 / 4.1 when fixing.AVGis reachable, the sliding-frame overflow guard added for [Bug] SUM(decimal) over a sliding window frame returns wrapped out-of-range value on overflow instead of NULL #4729 (which also coversAVG) becomes the relevant safeguard for sliding frames.