[Bug] percentile: DataFusion quantizes interpolation weight to 6 decimal places

### Describe the bug

Comet's native `percentile` aggregate (PR #4542) maps to DataFusion's `percentile_cont`, which computes the linear interpolation weight with a quantization step:

```rust
const INTERPOLATION_PRECISION: f64 = 1_000_000.0;
let fraction = index - (lower_index as f64);
let scaled = (fraction * INTERPOLATION_PRECISION) as usize;
let weight = scaled as f64 / INTERPOLATION_PRECISION;
let interpolated_f = lower_f + (upper_f - lower_f) * weight;
```

The `as usize` cast truncates the interpolation weight to 6 decimal places. Spark's exact `Percentile` interpolates with the full-precision fraction (`(position - lower) * higherValue + (higher - position) * lowerValue`), so an interpolated result can differ from Spark's by up to roughly `(upper - lower) * 1e-6`.
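The discrepancy can be reproduced outside either engine with a minimal sketch. The function names `percentile_quantized` and `percentile_exact` are hypothetical and exist only for this illustration; the first mirrors the quantization step quoted above, the second uses the full-precision form quoted for Spark. Both assume a non-empty, ascending-sorted slice and `0.0 <= p <= 1.0`, and neither is the actual DataFusion or Spark implementation.

```rust
// Same constant as in the snippet quoted from DataFusion above.
const INTERPOLATION_PRECISION: f64 = 1_000_000.0;

/// Hypothetical helper: linear interpolation with the weight truncated to
/// 6 decimal places. Assumes `sorted` is non-empty and ascending, 0 <= p <= 1.
fn percentile_quantized(sorted: &[f64], p: f64) -> f64 {
    let index = p * (sorted.len() - 1) as f64;
    let lower_index = index.floor() as usize;
    let upper_index = index.ceil() as usize;
    let (lower_f, upper_f) = (sorted[lower_index], sorted[upper_index]);
    let fraction = index - (lower_index as f64);
    // The `as usize` cast drops everything past the sixth decimal place.
    let scaled = (fraction * INTERPOLATION_PRECISION) as usize;
    let weight = scaled as f64 / INTERPOLATION_PRECISION;
    lower_f + (upper_f - lower_f) * weight
}

/// Hypothetical helper: full-precision interpolation in the form quoted
/// for Spark. Same preconditions as `percentile_quantized`.
fn percentile_exact(sorted: &[f64], p: f64) -> f64 {
    let position = p * (sorted.len() - 1) as f64;
    let lower = position.floor();
    let higher = position.ceil();
    let lower_value = sorted[lower as usize];
    if higher == lower {
        // The position falls exactly on an element, so no interpolation is needed.
        return lower_value;
    }
    let higher_value = sorted[higher as usize];
    (higher - position) * lower_value + (position - lower) * higher_value
}
```

For example, with `sorted = [0.0, 3_000_000.0]` and `p = 1.0 / 3.0` the two functions differ by about `1.0`, inside the `(upper - lower) * 1e-6 = 3.0` bound, while `p = 0.5` gives the same result from both.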

### Affected versions

Spark 3.4 / 3.5 / 4.0 / 4.1, wherever `percentile(col, p)` (or `median`, or `percentile_cont ... WITHIN GROUP`) maps to the native path.

### Impact

Minor. The difference only appears when the fractional part of `p * (n - 1)` has nonzero digits beyond the sixth decimal place, and it is bounded by `(upper - lower) * 1e-6`. The cases tested in `percentile.sql` match Spark exactly.
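The bound follows from a short argument, sketched here in exact arithmetic (floating-point rounding is ignored, which is an assumption of this derivation). The symbols $i$, $w$, $w_q$, $\ell$ and $u$ are introduced for illustration: $i = p\,(n-1)$ is the fractional index, $w$ the full-precision weight, $w_q$ the truncated weight, and $\ell \le u$ the lower and upper neighbouring values.

```latex
\begin{aligned}
w   &= i - \lfloor i \rfloor, \qquad
w_q = \frac{\lfloor 10^{6}\, w \rfloor}{10^{6}}, \qquad
0 \le w - w_q < 10^{-6}, \\[4pt]
\text{full precision:}\quad & (1 - w)\,\ell + w\,u = \ell + w\,(u - \ell), \\
\text{quantized:}\quad & \ell + w_q\,(u - \ell), \\[4pt]
\text{difference:}\quad & (w - w_q)\,(u - \ell) < 10^{-6}\,(u - \ell).
\end{aligned}
```

The difference is zero exactly when $10^{6}\, w$ is an integer, that is, when the weight has no digits past the sixth decimal place.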

### Possible fix

Either contribute a higher-precision (or unquantized) interpolation upstream to DataFusion's `percentile_cont`, or implement a Comet-specific accumulator that matches Spark's interpolation exactly.

Surfaced by the `percentile` audit accompanying #4542.
