Skip to content

refactor(pyamber): share receiver-batch construction in partitioners#5989

Draft
Ma77Ball wants to merge 2 commits into
apache:mainfrom
Ma77Ball:refactor/partitioner-batch-helpers
Draft

refactor(pyamber): share receiver-batch construction in partitioners#5989
Ma77Ball wants to merge 2 commits into
apache:mainfrom
Ma77Ball:refactor/partitioner-batch-helpers

Conversation

@Ma77Ball

Copy link
Copy Markdown
Contributor

What changes were proposed in this PR?

  • Added Partitioner.build_receiver_batches(channels) to the base class, the single home for the ordered (receiver, batch) list built via dict.fromkeys (which preserves channel order where a set literal would not).
  • Replaced the three identical inline copies of that construction in RoundRobinPartitioner, HashBasedShufflePartitioner, and RangeBasedShufflePartitioner with a call to the shared helper, dropping the drifted copy-pasted comments.
  • No behavior change: each partitioner still builds the same ordered, deduplicated receiver list.

Any related issues, documentation, discussions?

Closes: #5988

How was this PR tested?

  • Run cd amber && PYTHONPATH=src/main/python python -m pytest src/test/python/core/architecture/sendsemantics/test_partitioners.py -q, expect all 35 tests to pass.
  • The order/dedup behavior owned by the new helper is covered by the existing TestRoundRobinPartitioner::test_init_preserves_channel_order and test_init_dedupes_duplicate_channels_preserving_first_seen_order; confirm they still pass against the refactored construction.

Was this PR authored or co-authored using generative AI tooling?

Co-authored with Claude Opus 4.8 in compliance with ASF

@github-actions

Copy link
Copy Markdown
Contributor

Automated Reviewer Suggestions

Based on the git blame history of the changed files, we recommend the following reviewers:

  • Contributors with relevant context: @Yicong-Huang
    You can notify them by mentioning @Yicong-Huang in a comment.

@codecov-commenter

codecov-commenter commented Jun 28, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.63%. Comparing base (24b587f) to head (47d49e4).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5989      +/-   ##
============================================
- Coverage     56.89%   56.63%   -0.27%     
+ Complexity     3058     3040      -18     
============================================
  Files          1129     1129              
  Lines         43802    43805       +3     
  Branches       4743     4743              
============================================
- Hits          24922    24807     -115     
- Misses        17449    17548      +99     
- Partials       1431     1450      +19     
Flag Coverage Δ *Carryforward flag
access-control-service 70.00% <ø> (ø) Carriedforward from 3d7486b
agent-service 44.59% <ø> (ø) Carriedforward from 3d7486b
amber 58.00% <ø> (-0.66%) ⬇️ Carriedforward from 3d7486b
computing-unit-managing-service 0.00% <ø> (ø) Carriedforward from 3d7486b
config-service 52.30% <ø> (ø) Carriedforward from 3d7486b
file-service 62.81% <ø> (ø) Carriedforward from 3d7486b
frontend 50.09% <ø> (-0.04%) ⬇️ Carriedforward from 3d7486b
notebook-migration-service 78.57% <ø> (ø) Carriedforward from 3d7486b
pyamber 91.15% <100.00%> (+<0.01%) ⬆️
python 90.68% <ø> (-0.01%) ⬇️ Carriedforward from 3d7486b
workflow-compiling-service 55.14% <ø> (ø) Carriedforward from 3d7486b

*This pull request uses carry forward flags. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

github-actions Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

⚠️ Benchmark changes need a look

🟢 2 better · 🔴 5 worse · ⚪ 8 noise (<±5%) · 0 without baseline

Compared against main 24b587f benchmarked on this same runner, so the delta is largely free of cross-runner hardware noise. The "7d avg" column still reflects the gh-pages dashboard. Treat <±5% as noise unless repeated.

Dashboard · Run

config throughput MB/s latency max Δ latest / 7d
🔴 bs=10 sw=10 sl=64 362 0.221 27,805/34,012/34,012 us 🔴 +20.4% / 🔴 +122.5%
🟢 bs=100 sw=10 sl=64 922 0.563 108,233/124,803/124,803 us 🟢 -6.3% / 🔴 +15.2%
bs=1000 sw=10 sl=64 1,092 0.667 914,027/956,549/956,549 us ⚪ within ±5% / 🟢 -10.5%
Baseline details

Latest main 24b587f from same runner

config metric PR latest main 7d avg Δ latest Δ 7d
bs=10 sw=10 sl=64 throughput 362 tuples/sec 435 tuples/sec 770.95 tuples/sec -16.8% -53.0%
bs=10 sw=10 sl=64 MB/s 0.221 MB/s 0.266 MB/s 0.471 MB/s -16.9% -53.0%
bs=10 sw=10 sl=64 p50 27,805 us 23,090 us 12,775 us +20.4% +117.7%
bs=10 sw=10 sl=64 p95 34,012 us 31,655 us 15,286 us +7.4% +122.5%
bs=10 sw=10 sl=64 p99 34,012 us 31,655 us 18,795 us +7.4% +81.0%
bs=100 sw=10 sl=64 throughput 922 tuples/sec 934 tuples/sec 976.93 tuples/sec -1.3% -5.6%
bs=100 sw=10 sl=64 MB/s 0.563 MB/s 0.57 MB/s 0.596 MB/s -1.2% -5.6%
bs=100 sw=10 sl=64 p50 108,233 us 103,288 us 102,557 us +4.8% +5.5%
bs=100 sw=10 sl=64 p95 124,803 us 133,170 us 108,383 us -6.3% +15.2%
bs=100 sw=10 sl=64 p99 124,803 us 133,170 us 115,249 us -6.3% +8.3%
bs=1000 sw=10 sl=64 throughput 1,092 tuples/sec 1,114 tuples/sec 1,009 tuples/sec -2.0% +8.2%
bs=1000 sw=10 sl=64 MB/s 0.667 MB/s 0.68 MB/s 0.616 MB/s -1.9% +8.3%
bs=1000 sw=10 sl=64 p50 914,027 us 899,979 us 997,695 us +1.6% -8.4%
bs=1000 sw=10 sl=64 p95 956,549 us 943,196 us 1,036,731 us +1.4% -7.7%
bs=1000 sw=10 sl=64 p99 956,549 us 943,196 us 1,069,334 us +1.4% -10.5%
Raw CSV
config_idx,batch_size,schema_width,string_len,num_batches,total_ms,total_tuples,total_bytes,tuples_per_sec,mb_per_sec,lat_p50_us,lat_p95_us,lat_p99_us
0,10,10,64,20,552.39,200,128000,362,0.221,27805.38,34012.31,34012.31
1,100,10,64,20,2169.00,2000,1280000,922,0.563,108233.06,124803.36,124803.36
2,1000,10,64,20,18308.77,20000,12800000,1092,0.667,914027.29,956548.87,956548.87

@Ma77Ball Ma77Ball marked this pull request as draft June 28, 2026 21:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pyamber refactor Refactor the code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deduplicate receiver-batch construction across shuffle partitioners

2 participants