Skip to content

[Bug] AVG(decimal) over a window always falls back to Spark on Spark 4.x (AvgDecimal window branch is dead) #4731

Description

@andygrove

Describe the bug

AVG(<decimal>) over a window falls back to Spark on Spark 4.x, even though CometWindowExec reports AVG as a natively supported window aggregate and process_agg_func has an AvgDecimal window branch.

On Spark 4.x, Average of a decimal is represented in the physical plan as a Cast(avg(UnscaledValue(child)) / pow10 AS decimal(p, s)) wrapping the WindowExpression, for example:

Window [cast((avg(UnscaledValue(v#39)) windowspecdefinition(...) / 100.0) as decimal(14,6)) AS run_avg#58]

CometWindowExec.convert only unwraps two shapes:

case Alias(w: WindowExpression, _) => w
case Alias(MakeDecimal(w: WindowExpression, _, _, _), _) => w
case other => withFallbackReason(...); return None

The Cast(Divide(...)) form matches neither, so the whole WindowExec falls back to Spark. Results stay correct (Spark computes them), but:

  • The AvgDecimal native window branch in process_agg_func (native/core/src/execution/planner.rs) is effectively dead on Spark 4.x.
  • AVG decimal window aggregates never run natively on Spark 4.x, contrary to the coverage implied by feat: extend native windows support #4209.

Steps to reproduce

statement
CREATE TABLE dec_avg(g int, v decimal(10,2)) USING parquet

statement
INSERT INTO dec_avg VALUES (1, 10.10), (1, 20.25), (1, 30.33), (1, 41.00)

-- runs natively on integer/double, falls back on decimal under Spark 4.x
SELECT v, AVG(v) OVER (PARTITION BY g ORDER BY v
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS run_avg
FROM dec_avg

Comet falls back: Unsupported window expression: cast((avg(UnscaledValue(v)) ... / 100.0) as decimal(14,6)).

Expected behavior

AVG(<decimal>) over an ever-expanding window frame runs natively via the AvgDecimal UDAF and matches Spark, the same way SUM(<decimal>) already does. This needs CometWindowExec.convert to recognize the Spark 4.x Cast(Divide(WindowExpression, ...)) average shape (in addition to the existing MakeDecimal shape).

Additional context

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions