feat(arrow-row): add MSD radix sort kernel for row-encoded keys #9683
mbutrovich wants to merge 9 commits into apache:main
Thanks for the feedback @Dandandan! I tried the 4 optimizations you suggested, and here are the results:
Claude's attempt at interpreting the results:
Do you want the commit anyway just to iterate off of?
I updated the numbers in the PR description and the full table in the comment after it to reflect the current PR status. My attempt to bring it to DataFusion has so far been disappointing: apache/datafusion#21525. I think we need a deeper redesign of that stream rather than just replacing the existing logic: apache/datafusion#21525 (comment)
@Dandandan I also made a couple of API additions to
Which issue does this PR close?
N/A.
Rationale for this change
The existing `lexsort_to_indices` uses comparison sort on columnar arrays, which is O(n log n × comparison_cost) where comparison cost scales with the number of columns. The Arrow row format (`RowConverter`) produces memcmp-comparable byte sequences, making it a natural fit for radix sort, which is O(n × key_width) and can overcome the encoding overhead by eliminating per-comparison column traversal.
Inspired by DuckDB's sorting redesign, which uses MSD radix sort on normalized key prefixes with a comparison sort fallback, this PR adds a `radix_sort_to_indices` kernel that operates directly on row-encoded keys. Like DuckDB, we limit radix depth to 8 bytes before falling back to comparison sort, balancing radix efficiency against diminishing returns on deep recursion.
What changes are included in this PR?
- `arrow-row/src/radix.rs` (new): MSD radix sort on `Rows` with:
  - … `byte_pos` onward via `Row::data_from()`, skipping the shared prefix
- `arrow-row/src/lib.rs`: Exposes `pub mod radix`, adds `Row::byte_from()` and `Row::data_from()` APIs
- `arrow/benches/lexsort.rs`: Adds `lexsort_radix` benchmark variant alongside existing `lexsort_to_indices` and `lexsort_rows`
Benchmark results
All three variants include the full pipeline (encoding + sort) so the comparison against `lexsort_to_indices` (which doesn't encode) is apples-to-apples.
Radix sort is the fastest in the majority of cases. The main exception is pure low-cardinality dictionary columns, where `lexsort_to_indices` avoids encoding overhead entirely. The module documentation provides guidance on when to use each approach.
Are these changes tested?
17 tests including:
- … `Rows`
Are there any user-facing changes?
New public APIs:
- `arrow_row::radix::radix_sort_to_indices(&Rows) -> Vec<u32>`: sort with default parameters
- `arrow_row::radix::radix_sort_to_indices_with(&Rows, max_depth, fallback_threshold) -> Vec<u32>`: sort with tunable radix depth and fallback threshold
- `Row::byte_from(offset) -> u8`: single byte access at offset
- `Row::data_from(offset) -> &[u8]`: suffix slice from offset
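
For readers who want a concrete picture of the approach described in the rationale, here is a minimal sketch of MSD radix sort over memcmp-comparable byte keys with a depth limit and a comparison sort fallback. It is not the code in this PR: it works on plain `Vec<u8>` keys rather than `Rows`, and the names `radix_sort_to_indices_sketch`, `msd_radix`, and `suffix` are hypothetical. The `suffix` helper only stands in for the role the description gives to `Row::data_from(offset)`. The sketch is written to be stable, which is an assumption made here to keep results deterministic, not a claim about the PR's kernel.

```rust
/// Hypothetical stand-in for a suffix accessor: the bytes of `key` from
/// `from` onward (empty if the key is shorter than `from`).
fn suffix(key: &[u8], from: usize) -> &[u8] {
    &key[from.min(key.len())..]
}

/// Sketch only: returns the indices that sort `keys` in ascending byte order.
/// `max_depth` caps how many leading bytes are radix-partitioned;
/// buckets of at most `fallback_threshold` entries use comparison sort.
fn radix_sort_to_indices_sketch(
    keys: &[Vec<u8>],
    max_depth: usize,
    fallback_threshold: usize,
) -> Vec<u32> {
    let mut indices: Vec<u32> = (0..keys.len() as u32).collect();
    msd_radix(keys, &mut indices, 0, max_depth, fallback_threshold);
    indices
}

fn msd_radix(
    keys: &[Vec<u8>],
    indices: &mut [u32],
    byte_pos: usize,
    max_depth: usize,
    fallback_threshold: usize,
) {
    if indices.len() <= 1 {
        return;
    }
    // Depth limit reached or bucket is small: comparison sort on the suffix.
    // Every key in this bucket shares bytes 0..byte_pos, so they are skipped.
    if byte_pos >= max_depth || indices.len() <= fallback_threshold {
        indices.sort_by(|&a, &b| {
            suffix(&keys[a as usize], byte_pos).cmp(suffix(&keys[b as usize], byte_pos))
        });
        return;
    }
    // Bucket 0 holds keys that end at `byte_pos`; bucket b + 1 holds byte b.
    let bucket_of = |i: u32| -> usize {
        match keys[i as usize].get(byte_pos) {
            Some(&b) => b as usize + 1,
            None => 0,
        }
    };
    // Histogram, then exclusive prefix sum to get each bucket's start offset.
    let mut counts = [0usize; 257];
    for &i in indices.iter() {
        counts[bucket_of(i)] += 1;
    }
    let mut starts = [0usize; 258];
    for b in 0..257 {
        starts[b + 1] = starts[b] + counts[b];
    }
    // Stable scatter into a scratch buffer, then copy back.
    let mut next = starts;
    let mut scratch = vec![0u32; indices.len()];
    for &i in indices.iter() {
        let b = bucket_of(i);
        scratch[next[b]] = i;
        next[b] += 1;
    }
    indices.copy_from_slice(&scratch);
    // Recurse on the next byte. Bucket 0 is skipped: its keys are all equal.
    for b in 1..257 {
        let (lo, hi) = (starts[b], starts[b + 1]);
        if hi - lo > 1 {
            msd_radix(keys, &mut indices[lo..hi], byte_pos + 1, max_depth, fallback_threshold);
        }
    }
}
```

Calling `radix_sort_to_indices_sketch(&keys, 8, 0)` mirrors the 8-byte depth limit mentioned above with the small-bucket fallback disabled. Raising the third argument sends small buckets to comparison sort earlier, which is the trade-off the tunable variant in the API list exposes.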