[Integration] Expose length-aware batching in all ModelHandler subclasses by Eliaaazzz · Pull Request #37945 · apache/beam

Eliaaazzz · 2026-03-25T03:49:59Z

Summary

Addresses #37531.

This PR completes the smart bucketing integration for Python RunInference by exposing batch_length_fn and batch_bucket_boundaries on all concrete ModelHandler implementations.

The underlying batching support already exists in the base layer, but many user-facing handlers did not surface these options, so length-aware batching was effectively unavailable for much of the inference API. With this change, users can enable smart bucketing directly from the handler constructor across the supported backends.

What Changed

This change adds batch_length_fn and batch_bucket_boundaries to 16 concrete handlers across the following backends:

PyTorch
HuggingFace
scikit-learn
TensorFlow
ONNX
XGBoost
TensorRT
vLLM
Vertex AI
Gemini

Implementation details:

Handlers that inherit from ModelHandler now pass the new parameters through to super().__init__()
Remote handlers that manage batching kwargs directly (GeminiModelHandler and VertexAIModelHandlerJSON) now wire the values into _batching_kwargs
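
The two wiring patterns above can be illustrated with a minimal, Beam-free sketch. The class names (SketchBaseHandler, SketchLocalHandler, SketchRemoteHandler) and the dictionary keys length_fn and bucket_boundaries are assumptions made for this illustration; only batch_length_fn, batch_bucket_boundaries, _batching_kwargs, and batch_elements_kwargs() come from the description itself.

```python
from typing import Any, Callable, Dict, Optional, Sequence


class SketchBaseHandler:
    """Hypothetical stand-in for the base ModelHandler (not Beam's class)."""

    def __init__(
        self,
        *,
        max_batch_size: Optional[int] = None,
        batch_length_fn: Optional[Callable[[Any], int]] = None,
        batch_bucket_boundaries: Optional[Sequence[int]] = None,
    ):
        # Record only the options the caller actually set.
        self._batching_kwargs: Dict[str, Any] = {}
        if max_batch_size is not None:
            self._batching_kwargs["max_batch_size"] = max_batch_size
        if batch_length_fn is not None:
            # Key names are assumptions for this sketch.
            self._batching_kwargs["length_fn"] = batch_length_fn
        if batch_bucket_boundaries is not None:
            self._batching_kwargs["bucket_boundaries"] = list(
                batch_bucket_boundaries)

    def batch_elements_kwargs(self) -> Dict[str, Any]:
        return self._batching_kwargs


class SketchLocalHandler(SketchBaseHandler):
    """Pattern 1: accept the options and pass them to super().__init__()."""

    def __init__(
        self,
        model_uri: str,
        *,
        max_batch_size: Optional[int] = None,
        batch_length_fn: Optional[Callable[[Any], int]] = None,
        batch_bucket_boundaries: Optional[Sequence[int]] = None,
    ):
        super().__init__(
            max_batch_size=max_batch_size,
            batch_length_fn=batch_length_fn,
            batch_bucket_boundaries=batch_bucket_boundaries,
        )
        self._model_uri = model_uri


class SketchRemoteHandler:
    """Pattern 2: a handler that builds _batching_kwargs itself."""

    def __init__(
        self,
        *,
        batch_length_fn: Optional[Callable[[Any], int]] = None,
        batch_bucket_boundaries: Optional[Sequence[int]] = None,
    ):
        self._batching_kwargs: Dict[str, Any] = {}
        if batch_length_fn is not None:
            self._batching_kwargs["length_fn"] = batch_length_fn
        if batch_bucket_boundaries is not None:
            self._batching_kwargs["bucket_boundaries"] = list(
                batch_bucket_boundaries)

    def batch_elements_kwargs(self) -> Dict[str, Any]:
        return self._batching_kwargs


local = SketchLocalHandler(
    "path/to/model", batch_length_fn=len, batch_bucket_boundaries=[16, 64])
remote = SketchRemoteHandler(
    batch_length_fn=len, batch_bucket_boundaries=[16, 64])
```

In both patterns the values end up in whatever batch_elements_kwargs() returns, which is the property the forwarding tests described below check for each concrete handler.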

Testing

Added test coverage in base_test.py for both behavior and wiring:

an end-to-end RunInferenceLengthAwareBatchingTest that verifies short and long string inputs are bucketed into separate batches under FnApiRunner
a HandlerBucketingKwargsForwardingTest that checks each concrete handler forwards batch_length_fn and batch_bucket_boundaries into batch_elements_kwargs()
follow-up fixes to keep the forwarding tests hermetic, especially for HuggingFace pipeline validation and Vertex AI endpoint liveness checks
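
As a rough illustration of the behavior the end-to-end test exercises, the sketch below groups elements into batches by length bucket. It is not Beam's implementation: bucket_by_length is a hypothetical helper, and the choice of bisect_left for assigning a length to a bucket is an assumption about boundary semantics.

```python
import bisect
from collections import defaultdict
from typing import Any, Callable, Dict, List, Sequence


def bucket_by_length(
    elements: Sequence[Any],
    length_fn: Callable[[Any], int],
    boundaries: Sequence[int],
    max_batch_size: int,
) -> List[List[Any]]:
    """Batch elements so that each batch holds items from one length bucket."""
    pending: Dict[int, List[Any]] = defaultdict(list)
    batches: List[List[Any]] = []
    for element in elements:
        # Index of the first boundary >= the element's length (assumed rule).
        bucket = bisect.bisect_left(boundaries, length_fn(element))
        pending[bucket].append(element)
        if len(pending[bucket]) >= max_batch_size:
            # Emit a full batch and start a fresh one for this bucket.
            batches.append(pending.pop(bucket))
    # Flush partially filled buckets, shortest bucket first.
    for bucket in sorted(pending):
        batches.append(pending[bucket])
    return batches


inputs = ["hi", "a" * 40, "yo", "b" * 50]
batches = bucket_by_length(
    inputs, length_fn=len, boundaries=[16], max_batch_size=8)
```

With a single boundary at 16, the two short strings land in one batch and the two long strings in another, so a batch never mixes short and long inputs.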

Context

This is the final integration piece for smart bucketing:

Together, these changes make length-aware batching usable through the public Python inference handlers rather than only at the base implementation layer.


damccorm merged 4 commits into apache:master from Eliaaazzz:users/elia/issue-37531-smart-bucketing-integration


3 participants
