Skip to content

[SPARK-56911][SQL] Simplify Cast to decimal codegen under ANSI mode#55936

Open
gengliangwang wants to merge 2 commits into
apache:masterfrom
gengliangwang:SPARK-56911-cast-decimal
Open

[SPARK-56911][SQL] Simplify Cast to decimal codegen under ANSI mode#55936
gengliangwang wants to merge 2 commits into
apache:masterfrom
gengliangwang:SPARK-56911-cast-decimal

Conversation

@gengliangwang
Copy link
Copy Markdown
Member

@gengliangwang gengliangwang commented May 17, 2026

What changes were proposed in this pull request?

Extend CastUtils.java with two helpers for decimal precision adjustment and use them from Cast.changePrecision (both eval and codegen). The new helpers mutate the input Decimal in place (matching the in-place semantics of the existing inline codegen), so they're safe to call on the temporary produced by Decimal.fromString(...) / Decimal.apply(...) / decimal-arithmetic results.

Helpers added:

  • changePrecisionExact(Decimal, int, int, QueryContext): ANSI throw on overflow, preserves the per-call-site QueryContext so error messages keep their query-origin info.
  • changePrecisionOrNull(Decimal, int, int): non-ANSI, returns null on overflow (no QueryContext needed).

Cast.scala changes:

  • changePrecision eval method dispatches on nullOnOverflow and delegates to the appropriate helper.
  • changePrecision codegen method has three branches now: the existing canNullSafeCast fast path (unchanged), a nullOnOverflow branch (inline), and the ANSI throw branch which now emits a one-line CastUtils.changePrecisionExact(...) call instead of the 5-line if/else overflow block.

Why are the changes needed?

Part of SPARK-56908 (umbrella). The ANSI throw branch of Cast.changePrecision is hit by every cast to decimal that may overflow (very common in TPC-DS, where cast(int as decimal(7,2)) is widespread). Collapsing the 5-line inline body to one line shrinks the generated Java source for those plans.

Does this PR introduce any user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

How was this patch tested?

build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite *DecimalSuite"

332/332 pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x

@gengliangwang
Copy link
Copy Markdown
Member Author

gengliangwang commented May 17, 2026


Stack overview (SPARK-56908 umbrella)

This PR is part of the SPARK-56908 codegen-simplification series. Current status:

Merged:

Open:

@gengliangwang
Copy link
Copy Markdown
Member Author

Audited this PR for the same lessons surfaced by @viirya and @cloud-fan on #55938 (and applied to #55934 / #55939):

  1. Are the helpers redundant with an existing Scala API? CastUtils.changePrecisionExact calls d.changePrecision(precision, scale) in-place; Decimal.toPrecision(...) does the same logical work but clones the input first (val copy = clone()). The in-place variant avoids a per-row allocation, so the helpers serve a distinct purpose.
  2. Are the eval-path additions redundant? The pre-PR changePrecision method was an 8-line if-changePrecision { ... } else { if nullOnOverflow null else throw } block; the helpers shrink it to 5 lines. Not a one-liner that was already concise.

So no changes needed on this PR for that review.

Extend `CastUtils.java` with helpers for `byte` and `short` ANSI cast
targets and use them from `Cast.scala`. Drops the byte/short-target
dispatch (and the now-unused `lowerAndUpperBound` Scala helper) added
in SPARK-56909 -- after this PR, all integral and fractional narrowing
ANSI casts share the same `CastUtils.<...>Exact` one-line codegen.

Helpers added:
* `shortToByteExact(short)`, `intToByteExact(int)`, `longToByteExact(long)`
* `intToShortExact(int)`, `longToShortExact(long)`
* `floatToByteExact(float)`, `doubleToByteExact(double)`
* `floatToShortExact(float)`, `doubleToShortExact(double)`

`Cast.scala` changes:
* `castIntegralTypeToIntegralTypeExactCode` / `castFractionToIntegralTypeCode`
  no longer dispatch on target type -- the helper-name pattern
  `${integralPrefix(from)}To${target.capitalize}Exact` covers all four
  target types.
* Eval paths for `castToByte` and `castToShort` add ANSI cases for
  `ShortType` / `IntegerType` / `LongType` / `FloatType` / `DoubleType`
  source types that delegate to the new helpers; the existing
  `exactNumeric.toInt(b) + bounds-check` fallback now only handles the
  remaining `Decimal` source.

Part of SPARK-56908 (umbrella). The original byte/short ANSI cast bodies
were 5 lines each across 8 call sites; this PR collapses them to one
line per call site, matching the int/long target work from SPARK-56909.

No. The compiled behavior is identical; only the emitted Java source
text changes.

```
build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite \
  *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite \
  *ExpressionClassIdentitySuite"
```

312/312 pass.

Generated-by: Cursor 1.x
Extend `CastUtils.java` with two helpers for decimal precision adjustment
and use them from `Cast.changePrecision` (both the eval and codegen
implementations). The new helpers mutate the input `Decimal` in place
(matching the behavior of the existing inline codegen), so they're safe
to call on the temporary produced by `Decimal.fromString(...)` /
`Decimal.apply(...)` / decimal-arithmetic results.

Helpers added:
* `changePrecisionExact(Decimal, int, int, QueryContext)`: ANSI throw on
  overflow, preserves the per-call-site `QueryContext` so error messages
  keep their query-origin info.
* `changePrecisionOrNull(Decimal, int, int)`: non-ANSI, returns `null`
  on overflow (no `QueryContext` needed).

`Cast.scala` changes:
* `changePrecision` eval method dispatches on `nullOnOverflow` and
  delegates to the appropriate helper.
* `changePrecision` codegen method has three branches now: the existing
  `canNullSafeCast` fast path (unchanged), a `nullOnOverflow` branch
  (inline), and the ANSI throw branch which now emits a one-line
  `CastUtils.changePrecisionExact(...)` call instead of the 5-line
  `if/else` overflow block.

Part of SPARK-56908 (umbrella). The ANSI throw branch of
`Cast.changePrecision` is hit by every cast to decimal that may overflow
(very common in TPC-DS, where `cast(int as decimal(7,2))` is widespread).
Collapsing the 5-line inline body to one line shrinks the generated
Java source for those plans.

No. The compiled behavior is identical; only the emitted Java source
text changes.

```
build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite \
  *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite *DecimalSuite \
  *ExpressionClassIdentitySuite"
```

337/337 pass.

Generated-by: Cursor 1.x
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant