[python][io] Add native GenerateSequence bounded PTransform (#18088)#37675
mtauha wants to merge 4 commits into apache:master
This adds a native Python implementation of GenerateSequence, equivalent to the Java SDK's GenerateSequence/CountingSource. The transform generates a bounded sequence of integers from start (inclusive) to stop (exclusive).

Key features:
- BoundedSource implementation with efficient splitting support
- OffsetRangeTracker for dynamic work rebalancing
- VarIntCoder for efficient integer encoding
- DisplayData support for pipeline visualization

The unbounded mode is not yet implemented and will raise NotImplementedError if stop is not specified.

Closes apache#18088
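The splitting behaviour described in this commit can be illustrated without Beam. The sketch below is a stdlib-only model, not the PR's code: `split_range` and `read_range` are hypothetical helpers, and the bundle size is counted in elements rather than bytes, which is a simplifying assumption.

```python
from typing import Iterator, List, Tuple


def split_range(start: int, stop: int, bundle_size: int) -> List[Tuple[int, int]]:
    """Cut [start, stop) into consecutive half-open sub-ranges.

    Hypothetical helper: models how a bounded counting source could be
    divided into bundles of at most `bundle_size` elements each.
    """
    if bundle_size <= 0:
        raise ValueError("bundle_size must be positive")
    bundles = []
    lo = start
    while lo < stop:
        hi = min(lo + bundle_size, stop)
        bundles.append((lo, hi))
        lo = hi
    return bundles


def read_range(lo: int, hi: int) -> Iterator[int]:
    """Yield the integers of one bundle: lo inclusive, hi exclusive."""
    yield from range(lo, hi)


bundles = split_range(0, 10, bundle_size=4)
print(bundles)  # [(0, 4), (4, 8), (8, 10)]

# Reading every bundle in order reproduces the unsplit sequence.
elements = [n for lo, hi in bundles for n in read_range(lo, hi)]
print(elements == list(range(10)))  # True
```

Because every sub-range is half-open, adjacent bundles share a boundary value without overlapping, so no integer is emitted twice or dropped.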
Adds comprehensive tests for the native Python GenerateSequence transform:
- Basic bounded sequence generation
- Edge cases (empty range, single element, start > 0)
- Invalid input validation (negative start, stop < start)
- Large sequence handling
- Unbounded mode error handling
- BoundedSource API tests (split, estimate_size, range_tracker, read)
- DisplayData tests

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
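The edge cases and validation rules listed in this commit can be summarised with a small stand-in. This is a sketch, not the PR's transform: `generate_sequence` is a hypothetical plain function, and raising `ValueError` for invalid input is an assumption (the commit message only names `NotImplementedError` for the unbounded case).

```python
from typing import List, Optional


def generate_sequence(start: int, stop: Optional[int] = None) -> List[int]:
    """Hypothetical stand-in for the bounded transform's semantics."""
    if start < 0:
        # Assumed exception type; the PR text only says the input is invalid.
        raise ValueError(f"start must be non-negative, got {start}")
    if stop is None:
        # Mirrors the commit message: unbounded mode is not implemented yet.
        raise NotImplementedError("unbounded mode is not supported")
    if stop < start:
        raise ValueError(f"stop ({stop}) must be >= start ({start})")
    return list(range(start, stop))


print(generate_sequence(0, 0))  # []        (empty range)
print(generate_sequence(3, 4))  # [3]       (single element)
print(generate_sequence(2, 5))  # [2, 3, 4] (start > 0)
```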
Document the new native Python GenerateSequence transform in the CHANGES.md file under New Features / Improvements for version 2.73.0.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment

assign set of reviewers

Assigning reviewers:
R: @shunping for label python.
Note: If you would like to opt out of this review, comment
Available commands:
The PR bot will only process comments in the main thread (not review comments).

remind me after tests pass

Ok - I'll remind @mtauha after tests pass

Hey @damccorm @robertwb @claudevdm @kennknowles @shunping! I'm a GSoC 2026 applicant and this is one of my first contributions to Apache Beam as part of my prep. I've implemented the bounded GenerateSequence PTransform in Python (Phase 1 of #18088), modelled after Java's CountingSource. Looks like some checks might be failing and I'm honestly not sure why. Would really appreciate it if you could take a look and guide me in the right direction! Any review feedback is super welcome too. Thanks a lot for your time!

Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #37675 +/- ##
=============================================
+ Coverage 40.08% 57.03% +16.94%
Complexity 3416 3416
=============================================
Files 1178 1179 +1
Lines 187411 187580 +169
Branches 3588 3588
=============================================
+ Hits 75120 106981 +31861
+ Misses 108901 77209 -31692
Partials 3390 3390
Description
This PR adds a native Python GenerateSequence bounded PTransform to the Python SDK, equivalent to the Java SDK's GenerateSequence (formerly known as CountingInput).
Addresses #18088
Motivation
The Python SDK previously had no native equivalent of Java's GenerateSequence / CountingInput transform. The only existing Python implementation (apache_beam/io/external/generate_sequence.py) requires a Java expansion service and only works with the Flink runner, making it inaccessible to most Python users.
This PR introduces a pure Python implementation that works on all runners
(DirectRunner, Dataflow, etc.) without any Java dependency.
Changes
- sdks/python/apache_beam/io/generate_sequence.py:
  - GenerateSequence: a PTransform that produces a bounded sequence of integers from start (inclusive) to stop (exclusive)
  - _BoundedCountingSource: a BoundedSource backed by OffsetRangeTracker, supporting efficient splitting and dynamic work rebalancing across workers
- sdks/python/apache_beam/io/generate_sequence_test.py with unit tests covering basic usage, edge cases, splitting behaviour, and size estimation
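To make "dynamic work rebalancing" concrete, here is a toy model of the claim-and-split protocol that an offset range tracker provides. It is a sketch under stated assumptions: `SimpleOffsetTracker` is a hypothetical class written for this illustration, it is not Beam's `OffsetRangeTracker`, and it omits the locking and fraction-consumed bookkeeping a real tracker would need.

```python
from typing import Optional


class SimpleOffsetTracker:
    """Toy model of an offset range tracker over [start, stop)."""

    def __init__(self, start: int, stop: int) -> None:
        self.start = start
        self.stop = stop
        self.last_claimed: Optional[int] = None

    def try_claim(self, position: int) -> bool:
        """Claim `position` for reading; False once the range is exhausted."""
        if position >= self.stop:
            return False
        self.last_claimed = position
        return True

    def try_split(self, split_position: int) -> bool:
        """Give away [split_position, stop) if none of it was claimed yet."""
        already_claimed = (
            self.last_claimed is not None
            and split_position <= self.last_claimed
        )
        if already_claimed or not (self.start < split_position < self.stop):
            return False
        self.stop = split_position
        return True


tracker = SimpleOffsetTracker(0, 10)
primary = []
position = 0
while tracker.try_claim(position):
    primary.append(position)
    if position == 2:
        # Simulate a runner asking mid-read to hand [6, 10) to another worker.
        accepted = tracker.try_split(6)
    position += 1

residual = list(range(6, 10))  # what the second worker would read
print(accepted)  # True
print(primary)   # [0, 1, 2, 3, 4, 5]
print(residual)  # [6, 7, 8, 9]
```

The reader must claim each position before emitting it; that check is what lets a split shrink the range safely while a read is in progress.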
Notes
- [...] rate limiting will follow in a separate PR.
- The existing apache_beam/io/external/generate_sequence.py is untouched.
- The implementation is modelled on CountingSource.java and follows the same BoundedSource + OffsetRangeTracker pattern used by other Python SDK IO sources.
Testing
cd sdks/python
python -m pytest apache_beam/io/generate_sequence_test.py -v