refactor(pruning): remove column param from PruningStatistics::row_counts#21369
Merged
adriangb merged 2 commits intoapache:mainfrom Apr 6, 2026
Merged
refactor(pruning): remove column param from PruningStatistics::row_counts#21369adriangb merged 2 commits intoapache:mainfrom
adriangb merged 2 commits intoapache:mainfrom
Conversation
Row counts are container-level (same for all columns), but the trait method forced a `column: &Column` parameter. 8 of 11 implementations ignored it with `_column`. The Parquet impl used it unnecessarily to construct a `StatisticsConverter` just to call `row_group_row_counts()`, which doesn't need the column. This is a breaking change to the `PruningStatistics` trait: `fn row_counts(&self, column: &Column)` → `fn row_counts(&self)`. Also fixes RowGroupPruningStatistics to read row counts directly from row group metadata instead of routing through StatisticsConverter, and removes PrunableStatistics' column-exists validation for row counts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
Author
alamb
approved these changes
Apr 6, 2026
Contributor
alamb
left a comment
There was a problem hiding this comment.
I think it is a good change but it is a breaking API change as this is part of the public API.
Maybe worth a small note in the upgrade guide
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
adriangb
commented
Apr 6, 2026
| > always return `None`. Use the `downcast_ref` method above instead, or | ||
| > dereference through the `Arc` first with `plan.as_ref() as &dyn Any`. | ||
|
|
||
| ### `PruningStatistics::row_counts` no longer takes a `column` parameter |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
N/A — standalone API improvement, prerequisite for #21157.
Rationale for this change
PruningStatistics::row_counts(&self, column: &Column)takes a column parameter, but row counts are container-level (same for all columns). 8 of 11 implementations ignore the parameter with_column. The Parquet impl (RowGroupPruningStatistics) unnecessarily constructs aStatisticsConverterfrom the column just to callrow_group_row_counts(), which doesn't use the column at all.The existing code even has a comment acknowledging this:
And a test comment:
What changes are included in this PR?
Breaking change:
fn row_counts(&self, column: &Column) -> Option<ArrayRef>becomesfn row_counts(&self) -> Option<ArrayRef>.columnparameter from trait definition and all 11 implementationsRowGroupPruningStatistics: readnum_rows()directly from row group metadata instead of routing throughStatisticsConverterPrunableStatistics: remove column-exists validation (row count is container-level)Are these changes tested?
Yes — all existing tests updated and passing. The behavior change is:
row_counts()onPrunableStatisticsnow returns data even for non-existent columns (correct, since row count is container-level)RowGroupPruningStatistics::row_counts()always returns row counts (previously could fail if column wasn't in Parquet schema)Are there any user-facing changes?
Yes — breaking change to
PruningStatisticstrait. Downstream implementations need to remove thecolumnparameter from theirrow_countsmethod.🤖 Generated with Claude Code