
[SPARK-56910][SQL] Simplify Cast to byte/short codegen under ANSI mode #55935

Open
gengliangwang wants to merge 1 commit into apache:master from gengliangwang:SPARK-56910-cast-byte-short

Conversation

Member

gengliangwang commented May 17, 2026

What changes were proposed in this pull request?

Introduce CastUtils.java with nine static helpers for ANSI overflow-checked narrowing to byte / short, and use them from Cast.scala (both codegen and eval paths).

Helpers added:

  • shortToByteExact(short), intToByteExact(int), longToByteExact(long)
  • intToShortExact(int), longToShortExact(long)
  • floatToByteExact(float), doubleToByteExact(double)
  • floatToShortExact(float), doubleToShortExact(double)

ByteExactNumeric / ShortExactNumeric only expose same-type identity narrowing (their toByte(byte) / toShort(short) are trivial). The int / long targets refactored in #55934 delegate to LongExactNumeric.toInt / FloatExactNumeric.toInt / DoubleExactNumeric.toInt / toLong, but there is no existing Scala object to route the byte/short narrowing through, so a Java helper is the cleanest fit.
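
For illustration, a minimal sketch of what overflow-checked narrowing helpers of this kind might look like. The class name `CastUtilsSketch` is hypothetical, and a plain `ArithmeticException` stands in for Spark's `castingCauseOverflowError`; the real bodies in `CastUtils.java` are not shown in this description and may differ. The fractional check assumes the floor/ceil bounds test that the Cast.scala notes below describe.

```java
final class CastUtilsSketch {
    private CastUtilsSketch() {}

    // Integral narrowing: the value fits iff narrowing and widening back is lossless.
    static byte intToByteExact(int v) {
        byte b = (byte) v;
        if (b != v) {
            throw overflow(String.valueOf(v), "byte");
        }
        return b;
    }

    static short longToShortExact(long v) {
        short s = (short) v;
        if (s != v) {
            throw overflow(String.valueOf(v), "short");
        }
        return s;
    }

    // Fractional narrowing: accept any value whose truncation lands in range.
    // NaN fails the comparison and is therefore rejected.
    static byte doubleToByteExact(double v) {
        if (Math.floor(v) <= Byte.MAX_VALUE && Math.ceil(v) >= Byte.MIN_VALUE) {
            return (byte) v;
        }
        throw overflow(String.valueOf(v), "byte");
    }

    // Stand-in for Spark's castingCauseOverflowError (an assumption of this sketch).
    private static ArithmeticException overflow(String value, String target) {
        return new ArithmeticException("Cannot cast " + value + " to " + target + ": overflow");
    }
}
```

Under these assumptions, `intToByteExact(127)` returns `127` while `intToByteExact(128)` throws, and `doubleToByteExact(127.9)` truncates to `127` while `128.0` and `NaN` are rejected.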

Cast.scala changes:

  • castIntegralTypeToIntegralTypeExactCode: the byte / short branch (previously an inline 5-line if/throw block) emits a single CastUtils.${integralPrefix(from)}To${target.capitalize}Exact($c) call. The int branch (added in #55934) is unchanged.
  • castFractionToIntegralTypeCode: the byte / short branch (previously an inline 5-line floor/ceil block plus lowerAndUpperBound) emits a single CastUtils.${fractionalPrefix(from)}To${target.capitalize}Exact($c) call. The int / long branch (added in #55934) is unchanged. The now-unused lowerAndUpperBound Scala helper is removed.
  • Eval paths for castToByte and castToShort add ANSI cases for ShortType / IntegerType / LongType / FloatType / DoubleType source types that delegate to the new helpers, replacing the existing multi-line exactNumeric.toInt(b) + bounds-check body.
  • Two small integralPrefix(from: DataType) / fractionalPrefix(from: DataType) Scala helpers handle the method-name dispatch.
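
The method-name dispatch can be pictured with a small sketch. It assumes the prefix helpers return the lower-case Java primitive name of the source type (for example `int` or `double`); that mapping, the class name `HelperNameSketch`, and the variable name `value_0` are illustrative assumptions, not taken from the patch.

```java
final class HelperNameSketch {
    private HelperNameSketch() {}

    // Upper-case the first letter, mirroring Scala's String.capitalize.
    static String capitalize(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    // Build the one-line call that the codegen would emit for a given
    // source prefix, target type name, and input variable.
    static String helperCall(String sourcePrefix, String target, String inputVar) {
        return "CastUtils." + sourcePrefix + "To" + capitalize(target)
            + "Exact(" + inputVar + ")";
    }
}
```

With these assumptions, `helperCall("int", "byte", "value_0")` yields `CastUtils.intToByteExact(value_0)`, which matches the shape of the helper names listed above.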

Why are the changes needed?

Part of SPARK-56908 (umbrella). The byte/short narrowing ANSI bodies were 5 lines each across 8 codegen call sites; this PR collapses them to one line per call site, matching the int/long target work merged in #55934.

Does this PR introduce any user-facing change?

No. The compiled behavior is identical; only the emitted Java source text changes.

How was this patch tested?

build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite"

307/307 pass.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 1.x

Member Author

gengliangwang commented May 17, 2026


Stack overview (SPARK-56908 umbrella)

This PR is part of the SPARK-56908 codegen-simplification series. Current status:

Merged:

Open:

@gengliangwang
Member Author

Audited this PR for the same lessons surfaced by @viirya and @cloud-fan on #55938 (and applied to #55934 / #55939):

  1. Are the helpers redundant with an existing Scala object? No — ByteExactNumeric / ShortExactNumeric only have trivial toByte(byte) / toShort(short) (same-type). Cross-type narrowing with bounds check (e.g. int -> byte throwing castingCauseOverflowError) doesn't exist on any *ExactNumeric object, so the new intToByteExact, longToByteExact, ... helpers are genuinely net-new.
  2. Are the eval-path additions redundant? No — master castToByte / castToShort ANSI bodies are multi-line (call exactNumeric.toInt(b) with a try/catch, then bounds-check the int down to byte/short). The new helpers consolidate those bodies into a one-line call.

So no changes needed on this PR for that review.
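
To make point 2 concrete, here is a rough sketch contrasting the two shapes for a long -> byte cast. `Math.toIntExact` stands in for `exactNumeric.toInt`, `ArithmeticException` stands in for Spark's overflow error, and the class and method names are hypothetical; the actual `Cast.scala` bodies are not reproduced here.

```java
final class EvalPathSketch {
    private EvalPathSketch() {}

    // Two-step shape: narrow long -> int exactly, then bounds-check int -> byte.
    static byte twoStep(long v) {
        int i;
        try {
            i = Math.toIntExact(v);
        } catch (ArithmeticException e) {
            throw new ArithmeticException("Cannot cast " + v + " to byte: overflow");
        }
        if (i == (byte) i) {
            return (byte) i;
        }
        throw new ArithmeticException("Cannot cast " + v + " to byte: overflow");
    }

    // One-step shape: a single round-trip check on the original value.
    static byte oneStep(long v) {
        byte b = (byte) v;
        if (b != v) {
            throw new ArithmeticException("Cannot cast " + v + " to byte: overflow");
        }
        return b;
    }
}
```

Both methods accept exactly the values in the byte range and reject everything else, so in this sketch the consolidation shortens the code without changing which inputs overflow.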

Extend `CastUtils.java` with helpers for `byte` and `short` ANSI cast
targets and use them from `Cast.scala`. Drops the byte/short-target
dispatch (and the now-unused `lowerAndUpperBound` Scala helper) added
in SPARK-56909 -- after this PR, all integral and fractional narrowing
ANSI casts share the same `CastUtils.<...>Exact` one-line codegen.

Helpers added:
* `shortToByteExact(short)`, `intToByteExact(int)`, `longToByteExact(long)`
* `intToShortExact(int)`, `longToShortExact(long)`
* `floatToByteExact(float)`, `doubleToByteExact(double)`
* `floatToShortExact(float)`, `doubleToShortExact(double)`

`Cast.scala` changes:
* `castIntegralTypeToIntegralTypeExactCode` / `castFractionToIntegralTypeCode`
  no longer dispatch on target type -- the helper-name pattern
  `${integralPrefix(from)}To${target.capitalize}Exact` covers all four
  target types.
* Eval paths for `castToByte` and `castToShort` add ANSI cases for
  `ShortType` / `IntegerType` / `LongType` / `FloatType` / `DoubleType`
  source types that delegate to the new helpers; the existing
  `exactNumeric.toInt(b) + bounds-check` fallback now only handles the
  remaining `Decimal` source.

Part of SPARK-56908 (umbrella). The original byte/short ANSI cast bodies
were 5 lines each across 8 call sites; this PR collapses them to one
line per call site, matching the int/long target work from SPARK-56909.

No user-facing change. The compiled behavior is identical; only the
emitted Java source text changes.

```
build/sbt "catalyst/testOnly *CastSuite *CastWithAnsiOnSuite \
  *CastWithAnsiOffSuite *AnsiCastSuite *TryCastSuite \
  *ExpressionClassIdentitySuite"
```

312/312 pass.

Generated-by: Cursor 1.x