Unify cast handling by removing CastColumnExpr branches in pruning and ordering equivalence #21545

Open
kosiew wants to merge 4 commits into apache:main from kosiew:cast-04-20164

@kosiew kosiew commented Apr 11, 2026

Which issue does this PR close?


Rationale for this change

Downstream components in DataFusion currently branch on both CastExpr and CastColumnExpr, even though they represent equivalent semantics after recent cast unification work. This duplication increases maintenance burden, introduces unnecessary complexity, and raises the risk of inconsistent behavior when evolving cast-related logic.

This PR simplifies downstream logic by collapsing these dual branches into a single CastExpr-based handling path, aligning with the broader cast unification effort.


What changes are included in this PR?

  • Removed CastColumnExpr handling in downstream logic:

    • Eliminated branching specific to CastColumnExpr in:

      • equivalence/properties/mod.rs
      • pruning/pruning_predicate.rs
  • Unified cast substitution logic in ordering equivalence:

    • Replaced substitute_cast_like_ordering with substitute_cast_ordering

    • Simplified implementation to only consider CastExpr

    • Preserved correctness by ensuring:

      • Exact child expression match
      • Order-preserving (widening) casts only
  • Refactored pruning rewrite logic:

    • Introduced helper: rewrite_cast_child_to_prunable
    • Removed duplicated validation and recursion logic across cast types
    • Updated both CastExpr and TryCastExpr handling to reuse the helper
  • Test updates and cleanup:

    • Updated tests to use CastExpr::new_with_target_field
    • Simplified test cases by removing vector-based equality conditions
    • Removed obsolete test covering CastColumnExpr
    • Renamed pruning test to reflect unified cast handling
  • Minor code cleanups:

    • Removed unused imports
    • Simplified iterator chains and substitution logic
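
To illustrate the two conditions listed above for ordering substitution (exact child match, widening casts only), here is a minimal self-contained sketch. It is a toy model, not the code in this PR: the `DataType` and `Expr` enums, `is_order_preserving`, and `substitute_cast_ordering_sketch` are hypothetical stand-ins, and the widening rule is assumed to cover signed integers only.

```rust
// Toy model: these types are illustrative stand-ins, not DataFusion's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int8,
    Int16,
    Int32,
    Int64,
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum Expr {
    Column(String),
    Cast { child: Box<Expr>, to: DataType },
}

/// Bit width of each toy signed-integer type.
fn width(dt: DataType) -> u8 {
    match dt {
        DataType::Int8 => 8,
        DataType::Int16 => 16,
        DataType::Int32 => 32,
        DataType::Int64 => 64,
    }
}

/// Assumption: a signed-integer cast preserves order when it does not narrow.
fn is_order_preserving(from: DataType, to: DataType) -> bool {
    width(to) >= width(from)
}

/// If `projected` is an order-preserving cast whose child is exactly
/// `sorted_on`, the ordering known for `sorted_on` also holds for `projected`.
/// Returns the expression that may be substituted into the ordering, if any.
fn substitute_cast_ordering_sketch(
    sorted_on: &Expr,
    sorted_type: DataType,
    projected: &Expr,
) -> Option<Expr> {
    match projected {
        Expr::Cast { child, to }
            if child.as_ref() == sorted_on && is_order_preserving(sorted_type, *to) =>
        {
            Some(projected.clone())
        }
        // Different child, narrowing cast, or not a cast at all: no substitution.
        _ => None,
    }
}
```

In this sketch, an input sorted on an `Int32` column `a` can be treated as sorted on `CAST(a AS Int64)`, but not on `CAST(a AS Int8)` or `CAST(b AS Int64)`.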

Are these changes tested?

Yes.

  • Existing pruning and equivalence tests were updated to reflect the unified cast handling.

  • Obsolete tests relying on CastColumnExpr were removed.

  • Test cases were adjusted to ensure behavior remains unchanged, including:

    • Ordering equivalence with cast substitutions
    • Predicate pruning with cast expressions

These updates ensure no regression in behavior while validating the simplified implementation.
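
For reviewers who want a picture of the pruning refactor, the following sketch shows the general shape of sharing one helper between the cast and try-cast branches. It is a simplified toy model rather than the code in `pruning_predicate.rs`: the `Expr` and `DataType` enums, the `cast_is_prunable` rule, the `_min` naming, and the signature of `rewrite_cast_child` are all assumptions made for illustration.

```rust
// Toy model: illustrative stand-ins, not DataFusion's actual types or signatures.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum Expr {
    Column { name: String, data_type: DataType },
    Cast { child: Box<Expr>, to: DataType },
    TryCast { child: Box<Expr>, to: DataType },
}

/// Output type of a toy expression.
fn data_type(expr: &Expr) -> DataType {
    match expr {
        Expr::Column { data_type, .. } => *data_type,
        Expr::Cast { to, .. } | Expr::TryCast { to, .. } => *to,
    }
}

/// Hypothetical rule: only integer-to-integer casts are treated as prunable.
fn cast_is_prunable(from: DataType, to: DataType) -> bool {
    from != DataType::Utf8 && to != DataType::Utf8
}

/// Shared helper: validate the cast once, then recurse into the child.
fn rewrite_cast_child(child: &Expr, to: DataType) -> Result<Expr, String> {
    let from = data_type(child);
    if !cast_is_prunable(from, to) {
        return Err(format!("cast from {:?} to {:?} is not prunable", from, to));
    }
    rewrite_to_stats(child)
}

/// Rewrite `expr` so that every column refers to its min-statistics column.
fn rewrite_to_stats(expr: &Expr) -> Result<Expr, String> {
    match expr {
        Expr::Column { name, data_type } => Ok(Expr::Column {
            name: format!("{}_min", name),
            data_type: *data_type,
        }),
        // Both cast flavors delegate validation and recursion to the helper.
        Expr::Cast { child, to } => Ok(Expr::Cast {
            child: Box::new(rewrite_cast_child(child, *to)?),
            to: *to,
        }),
        Expr::TryCast { child, to } => Ok(Expr::TryCast {
            child: Box::new(rewrite_cast_child(child, *to)?),
            to: *to,
        }),
    }
}
```

The point of the design is that the validation rule and the recursion live in one place, so the two match arms differ only in which node they rebuild around the rewritten child.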


Are there any user-facing changes?

No.

This change is an internal refactor and does not modify user-facing APIs or query behavior. It preserves existing semantics while improving maintainability.


LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

kosiew added 4 commits April 11, 2026 10:25

Eliminate CastColumnExpr compatibility in pruning logic.
Consolidate equivalence cast substitution to handle only
unified CastExpr. Remove redundant CastColumn-specific
equivalence regression from dependency imports.

Rebuild unified CastExpr to preserve target field metadata
using new_with_target_field(...). Update regression test
in dependency.rs to utilize this field-aware CastExpr
for better consistency and reliability in test cases.

Renamed the private helper `substitute_cast_like_ordering` to
`substitute_cast_ordering`. Replaced the mutable substitution push
loop with an iterator-based collection for better performance.
Inlined single-use `arrow_schema` and `from_type` locals in
cast pruning branches. Simplified the monotonic sort regression test
by removing over-general `sort_columns`, collapsing one-element
equality vectors, and using distinct case names for clarity.

Remove the unnecessary intermediate Vec in mod.rs.
Implement rewrite_cast_child_to_prunable in pruning_predicate.rs
and reuse it for both CastExpr and TryCastExpr to streamline
casting operation handling.

@github-actions github-actions bot added the physical-expr label (Changes to the physical-expr crates) Apr 11, 2026
@kosiew kosiew marked this pull request as ready for review April 11, 2026 02:45
@kosiew kosiew requested a review from adriangb April 11, 2026 02:46

@alamb alamb left a comment

Thanks @kosiew -- this looks like an improvement to me

Do you think we are close to being able to deprecate CastColumnExpr ?

kosiew commented Apr 11, 2026

@alamb
Yes, after this lands, the next PR will be to remove CastColumnExpr from the public API.
