Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
490 commits
Select commit Hold shift + click to select a range
d077395
Remove redundant HTTP->HTTPS redirect in .htaccess (#38522)
liferoad May 27, 2026
d165840
Refactor _SharedCache to handle context vs non-context ownership (#38…
shunping May 27, 2026
bc4df6b
Enforce binary-only constraints during early requirements cache attem…
shunping May 28, 2026
5c565ee
Bump docker/setup-qemu-action from 4.0.0 to 4.1.0 (#38718)
dependabot[bot] May 28, 2026
e2d9f32
Bump github.com/aws/aws-sdk-go-v2/service/s3 in /sdks (#38719)
dependabot[bot] May 28, 2026
8d8ee69
Bump google.golang.org/api from 0.281.0 to 0.282.0 in /sdks (#38720)
dependabot[bot] May 28, 2026
dddd0f3
Bump github.com/aws/smithy-go from 1.25.1 to 1.26.0 in /sdks (#38721)
dependabot[bot] May 28, 2026
caca576
Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#38722)
dependabot[bot] May 28, 2026
4f0583c
Implemented MLTransform generate vocab Dataflow benchmark (#38215)
aIbrahiim May 28, 2026
6dcd79b
[#38708] Add Flink 2.0 to SDK pipeline options validation lists (#38726)
durgaprasadml May 28, 2026
3512bba
up timeout (#38732)
derrickaw May 28, 2026
0096c54
remove playground annotations from BatchElementsExample until next re…
aIbrahiim May 28, 2026
192f2ba
Update CHANGES.md after release cut (#38608)
Abacn May 28, 2026
72c5d7f
Support infer types involving dataclass fields (#38548)
Abacn May 28, 2026
17363b5
[Prism] Fix race condition in element manager (#38734)
shunping May 28, 2026
027964c
Add staged package hashes (#38311)
tarun-google May 28, 2026
5ac4ba6
Updated All Externally Visible Logs/Docs from Runner V2 to Portable R…
TongruiLi May 29, 2026
b017083
Bump github.com/aws/aws-sdk-go-v2 from 1.41.7 to 1.41.8 in /sdks (#38…
dependabot[bot] May 29, 2026
dfedf56
Bump github.com/aws/aws-sdk-go-v2/credentials in /sdks (#38737)
dependabot[bot] May 29, 2026
405f574
fix race condition in sdb-operator (#38698)
aIbrahiim May 29, 2026
21fa423
Disable build isolation for pip by default. (#38700)
shunping May 29, 2026
47d0447
Bump github.com/aws/aws-sdk-go-v2/service/s3 in /sdks (#38741)
dependabot[bot] May 29, 2026
db8232c
Bump github.com/aws/aws-sdk-go-v2/config in /sdks (#38738)
dependabot[bot] May 29, 2026
b52960d
Bump setup gradle to allowlisted v6.1.0 (#38745)
aIbrahiim May 29, 2026
70a3bbc
Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#38739)
dependabot[bot] May 29, 2026
ecab429
Implement MLTransform One-Hot Encoding benchmark pipeline (#38404)
aIbrahiim May 29, 2026
ba0549d
[yaml] - mongodb write normalization (#38376)
derrickaw May 29, 2026
cfea98a
Disable gRPC fork support in PortableRunnerTestWithSubprocesses (#38744)
shunping May 29, 2026
41bfbf7
Fix flaky TestDataSampler/GetSamplesForPCollectionsTooManySamples (#3…
shunping May 29, 2026
28178cd
Re-enable call-args check in yaml_io.py (#38730)
jrmccluskey May 29, 2026
2a658f3
Fix Reshuffle handling in Prism to recursively remove nested sub-tran…
shunping May 29, 2026
857bb10
Hugging face model handler #3 (#38696)
derrickaw May 30, 2026
8c74d84
Handle empty batches in Python worker counters (#38748)
goutamadwant May 30, 2026
5730975
feat(ml): add qdrant ingestion (#38142)
MichaelGruschke May 31, 2026
40c2d7b
[Dataflow Streaming] Enable state tag encoding v2 (#38705)
arunpandianp Jun 1, 2026
00c4188
Bump nanasess/setup-chromedriver from 2 to 3 (#38757)
dependabot[bot] Jun 1, 2026
66589ec
Bump github.com/aws/aws-sdk-go-v2/service/s3 in /sdks (#38758)
dependabot[bot] Jun 1, 2026
b29d7ec
Bump github.com/golang-cz/devslog from 0.0.15 to 0.0.16 in /sdks (#38…
dependabot[bot] Jun 1, 2026
7040646
Bump github.com/aws/aws-sdk-go-v2 from 1.41.8 to 1.41.9 in /sdks (#38…
dependabot[bot] Jun 1, 2026
d9f603a
Bump github.com/tetratelabs/wazero from 1.11.0 to 1.12.0 in /sdks (#3…
dependabot[bot] Jun 1, 2026
2e8717e
Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#38760)
dependabot[bot] Jun 1, 2026
3a473c3
Add Worker hashes for Golang (#38754)
tarun-google Jun 1, 2026
550bcf7
Updates Beam master container tags for Dataflow (#38766)
chamikaramj Jun 2, 2026
627b27f
[Dataflow Streaming] Prepare BoundedQueueExecutor for MultiKey bundle…
arunpandianp Jun 2, 2026
e2cb990
Fix truncating file handle (#38425)
shunping Jun 2, 2026
a3be67f
js2py to quickjs (#38473)
derrickaw Jun 2, 2026
fd7af76
Migrate to Google Cloud Dataflow Client (#37639)
jrmccluskey Jun 2, 2026
a1a1c6f
Adding release-2.74.0-postrelease to protected branches in .asf.yaml
Jun 2, 2026
389bbaf
Update managed-io.md for release 2.74.0-RC1. (#38527)
Abacn Jun 2, 2026
994011b
Update Beam website to release 2.74.0 (#38536)
Amar3tto Jun 2, 2026
00b55c1
Update Python Dependencies (#38775)
github-actions[bot] Jun 2, 2026
4d5bd2c
Fix internal test configuration for Dataflow client (#38792)
jrmccluskey Jun 3, 2026
2b52311
Implements the Delta Lake source with support for splitting (#38706)
chamikaramj Jun 3, 2026
842dcaf
Fix offlineRepositoryRoot relative to Beam repo root (#38790)
Abacn Jun 3, 2026
efa09a5
Update republish released docker workflow defaults to 2.74.0 (#38796)
aIbrahiim Jun 3, 2026
0d2046f
Update beam-master container (#38795)
jrmccluskey Jun 3, 2026
1bd70ca
Bump github.com/aws/aws-sdk-go-v2/service/s3 in /sdks (#38801)
dependabot[bot] Jun 4, 2026
07bd53c
Bump github.com/aws/aws-sdk-go-v2/config in /sdks (#38804)
dependabot[bot] Jun 4, 2026
29606e4
[OpenTelemetry] FNHarness, Dataflow Runner v1 - set open telemetry se…
stankiewicz Jun 4, 2026
a5c75ff
[OpenTelemetry] Default open telemetry configuration behind experimen…
stankiewicz Jun 4, 2026
8cc3668
Bump google.golang.org/api from 0.282.0 to 0.283.0 in /sdks (#38809)
dependabot[bot] Jun 4, 2026
2cfd465
Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#38810)
dependabot[bot] Jun 4, 2026
8916ff0
Bump github.com/nats-io/nats-server/v2 from 2.14.1 to 2.14.2 in /sdks…
dependabot[bot] Jun 4, 2026
daf252c
Upgrade Google Cloud libraries-bom to 26.83.0 (#38799)
aIbrahiim Jun 4, 2026
91cbf37
feat: Add LogElements transform to Java SDK (#38533)
lalitx17 Jun 4, 2026
0ca91be
Move google-auth lower bounds to >= 2.0.0 (#38778)
jrmccluskey Jun 4, 2026
30af259
Normalize types in dataclass field type resolving (#38797)
Abacn Jun 4, 2026
e5d0b0f
[Python] Honor disableCounterMetrics, disableStringSetMetrics, and di…
Anuragp22 Jun 4, 2026
6faf623
[Dataflow] Fix thread safety of HotKey logger (#38816)
arunpandianp Jun 5, 2026
13df084
Fix BigQuery Storage Write API stream count for bounded writes (#38776)
Akshatsharma2205 Jun 5, 2026
e11dbe1
Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#38825)
dependabot[bot] Jun 5, 2026
9e651e9
Bump cloud.google.com/go/storage from 1.62.2 to 1.62.3 in /sdks (#38826)
dependabot[bot] Jun 5, 2026
d947b4f
[OpenTelemetry] Add gcp otel auth extension (#38773)
stankiewicz Jun 5, 2026
e87013e
Revert "remove playground annotations from BatchElementsExample until…
aIbrahiim Jun 5, 2026
bf59744
Automate OpenTelemetry license entries in bomupgrader (#38815)
aIbrahiim Jun 5, 2026
b30fcb3
support generics in from row and to row conversions (#37347)
tilgalas Jun 8, 2026
a7674e4
[Dataflow Streaming] [Multi Key] Introduce KeyGroupWorkQueue and inte…
arunpandianp Jun 8, 2026
e1320d3
Update SpannerIO.java - comment documentation (#38492)
googledrew Jun 8, 2026
d3dece4
Bump github.com/aws/smithy-go from 1.27.1 to 1.27.2 in /sdks (#38841)
dependabot[bot] Jun 8, 2026
58a030e
[Experimental] Fix Yaml Xlang Test Timeout (#38798)
jrmccluskey Jun 8, 2026
450d72e
Fix parsing of dataflow api endpoint URLs to remove trailing slash (#…
jrmccluskey Jun 8, 2026
089ba44
Update Gemini text classification to gemini-2.5-flash (#38839)
aIbrahiim Jun 8, 2026
4581922
fix flaky Arm postcommit test collection on Python 3.14 (#38844)
aIbrahiim Jun 8, 2026
86ce68d
[Dataflow Streaming] Fix nullness supression in StreamingModeExecutio…
arunpandianp Jun 8, 2026
1cee5b4
Fix ensurepip bundled pip upgrade for Python 3.12+ (#38765)
aIbrahiim Jun 8, 2026
12f022d
Fix cloudML TFT install avoiding pip ResolutionTooDeep (#38781)
aIbrahiim Jun 8, 2026
d133f02
use local variable (#38850)
ahmedabu98 Jun 8, 2026
35de131
Fix Dataflow cost benchmark after Dataflow client migration (#38788)
aIbrahiim Jun 8, 2026
169498d
Bump github.com/aws/aws-sdk-go-v2/service/s3 in /sdks (#38856)
dependabot[bot] Jun 9, 2026
0622158
Bump golang.org/x/sys from 0.45.0 to 0.46.0 in /sdks (#38857)
dependabot[bot] Jun 9, 2026
874515c
Unpin google-cloud-bigtable version (fixes apache#37637) (#38861)
kellen Jun 9, 2026
cfd35d9
Bump distlib from 0.3.9 to 0.4.2 in /sdks/python (#38855)
dependabot[bot] Jun 9, 2026
9fa747c
Bump github.com/aws/aws-sdk-go-v2/credentials in /sdks (#38858)
dependabot[bot] Jun 9, 2026
77bda4d
Bump golang.org/x/text from 0.37.0 to 0.38.0 in /sdks (#38860)
dependabot[bot] Jun 9, 2026
529c559
Disable public ip for Dataflow Validates Runner test (#38424)
Abacn Jun 9, 2026
21050a5
increase timeout (#38864)
derrickaw Jun 9, 2026
67bf365
Adds a Github action for running Delta Lake I/O unit tests (#38869)
chamikaramj Jun 9, 2026
ca7a6f2
Enable code scanning alerts (#38817)
derrickaw Jun 9, 2026
efa266c
Side Input improvements (#38363)
reuvenlax Jun 9, 2026
dd45831
Support global sort and improve UDF overload resolution in Beam SQL (…
damccorm Jun 9, 2026
df72065
Update huggingface model handler info (#38789)
derrickaw Jun 9, 2026
b731448
Fix yaml doc generation (#38874)
Abacn Jun 9, 2026
f8240a5
Fix race condition in statesampler_fast.pyx (#38851)
derrickaw Jun 9, 2026
0f84764
Merge pull request #38878: Fix OnWindowExpirationContext.
reuvenlax Jun 9, 2026
da0a6a0
Bump github.com/aws/aws-sdk-go-v2/config in /sdks (#38884)
dependabot[bot] Jun 10, 2026
f4d7946
Bump google.golang.org/api from 0.283.0 to 0.284.0 in /sdks (#38887)
dependabot[bot] Jun 10, 2026
432e0f7
Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#38886)
dependabot[bot] Jun 10, 2026
338927c
Bump actions/checkout from 4 to 6 (#38885)
dependabot[bot] Jun 10, 2026
df32f87
update iceberg add files parameters (#38879)
derrickaw Jun 10, 2026
3f93241
add datadog io normalization and yaml (#38362)
derrickaw Jun 10, 2026
ed261ab
Bump torch (#38896)
dependabot[bot] Jun 10, 2026
df386d6
Revert "[Python] Honor disableCounterMetrics, disableStringSetMetrics…
Abacn Jun 10, 2026
991eb65
Bump torch (#38897)
dependabot[bot] Jun 10, 2026
539a369
Bump torch (#38903)
dependabot[bot] Jun 10, 2026
c7d3db8
Fix Xlang IO Direct installGcpTest uv cache lock (#38862)
aIbrahiim Jun 11, 2026
ad62036
Turn to manual build for Go for CodeQL (#38907)
derrickaw Jun 11, 2026
67038c6
Fix Dataflow legacy worker abort loop thread death issue (#38894)
damccorm Jun 11, 2026
76083e2
Bump cloud.google.com/go/bigtable from 1.48.0 to 1.49.0 in /sdks (#38…
dependabot[bot] Jun 11, 2026
ff80175
Bump github.com/aws/aws-sdk-go-v2/config in /sdks (#38916)
dependabot[bot] Jun 11, 2026
c14503c
Bump golang.org/x/net from 0.55.0 to 0.56.0 in /sdks (#38915)
dependabot[bot] Jun 11, 2026
5fde991
[IcebergIO] Improve TableCache (#38882)
ahmedabu98 Jun 11, 2026
036ed69
Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#38912)
dependabot[bot] Jun 11, 2026
19c166c
Use p310_ml_test (#38890)
aIbrahiim Jun 11, 2026
b6f55d4
Replace ClassLoadingStrategy with custom loading strategy (#38906)
cushon Jun 11, 2026
4baf0fc
Don't preempt snapshot runs (#38904)
damccorm Jun 11, 2026
2e94ead
Bump @grpc/grpc-js from 1.14.3 to 1.14.4 in /sdks/typescript (#38922)
dependabot[bot] Jun 11, 2026
9b27de6
Adds a new agent SKILL for developing new I/O connectors (#38910)
chamikaramj Jun 11, 2026
cd76624
Add Delta Lake source to the Java Managed API (#38902)
chamikaramj Jun 11, 2026
2da27cb
Implement Asynchronous wrapper for DoFn in Java SDK (#38609)
tejasiyer-dev Jun 11, 2026
eaa0042
Updates references to the I/O connector skill (#38923)
chamikaramj Jun 11, 2026
90ecb43
Updates CHANGES.md to mention the Delta Lake source (#38924)
chamikaramj Jun 11, 2026
360d06f
[SQL] Support positional parameters (#38880)
Abacn Jun 11, 2026
e391ce4
Support MAP type in RowJson and fix datetime parsing with spaces (#38…
damccorm Jun 11, 2026
a999b46
SQL DDL documentation (#37539)
ahmedabu98 Jun 11, 2026
2f5e8f8
Add DoFnRunner::finishKey() method (#38454)
arunpandianp Jun 12, 2026
bdd7a85
[Dataflow Streaming] Activate SourceState Finalizers before submittin…
arunpandianp Jun 12, 2026
0b361b8
Fix wordcount_rust requirements.txt and documentation (#38877)
jrmccluskey Jun 12, 2026
8ca1cf6
Exclude testSideInputNotReadyTimer from Spark batch suites (#38939)
aIbrahiim Jun 12, 2026
9c073e7
Add Gemini RunInference example notebook (#38943)
jrmccluskey Jun 12, 2026
c107070
update agent skills table (#38818)
derrickaw Jun 12, 2026
2232e5a
[Infra] Add beam_viewer and beam_writer roles for GSoC 2026 participa…
HansMarcus01 Jun 12, 2026
ca44d49
Add instrumentation for memory profiling in Python SDK (#38853)
tvalentyn Jun 12, 2026
986d1f2
[#38059] Fix GCS glob matching to support ** and / in object names (#…
hilaryRope Jun 12, 2026
619c063
SQL Database DDL & Usability Improvements (#38952)
damccorm Jun 12, 2026
b81fe12
Use add_experiment() instead of experiments.append() in Python
ash6898 Apr 18, 2026
0e3087d
Use ExperimentalOptions.addExperiment() instead of get/set pattern in…
ash6898 Apr 21, 2026
0084835
Fix upload_graph condition regression and formatting issues
ash6898 Apr 22, 2026
ae9b589
Copy experiments list to mutable ArrayList before addExperiment() calls
ash6898 Apr 27, 2026
45699f5
[mqtt] Add portable MqttIO Read/Write transforms for batch and stream…
tkaymak Jun 13, 2026
a3dc1a2
Upgrade expansion jar to java17 (#38931)
derrickaw Jun 13, 2026
0de5ec2
Spanner: Extend SpannerIO to support Spanner Omni mTLS setup (#38928)
sagnghos Jun 15, 2026
4a8bbbc
ClickHouseIO: Add DateTime64 support for sub-second timestamp precisi…
Eliaaazzz Jun 15, 2026
d9dcdaf
Revert "Upgrade expansion jar to java17 (#38931)" (#38965)
Abacn Jun 15, 2026
d2f7e82
Add support for STDDEV_POP and STDDEV_SAMP in VarianceFn (#38871)
damccorm Jun 15, 2026
4f0858a
Update BEAM_DEV_SDK_CONTAINER_TAG version (#38963)
damccorm Jun 15, 2026
f127d9b
Bump protobufjs from 8.4.0 to 8.6.0 in /sdks/typescript (#38968)
dependabot[bot] Jun 15, 2026
1023fd2
Bump form-data from 2.5.5 to 2.5.6 in /sdks/typescript (#38967)
dependabot[bot] Jun 15, 2026
b73bd6f
Use Flink 2.0 for load tests (#38279)
Abacn Jun 16, 2026
70e3759
Bump cloud.google.com/go/spanner from 1.91.0 to 1.92.0 in /sdks (#38976)
dependabot[bot] Jun 16, 2026
210d22d
Fix AsyncWrapperTest timeout and flakiness issues. (#38970)
tejasiyer-dev Jun 16, 2026
e2a741f
fix failure in opentelemetry-gcp-auth-extension jar (#38938)
aIbrahiim Jun 16, 2026
c3bee97
Bump jsonpickle upper bound. (#38769)
claudevdm Jun 16, 2026
2eaa88e
[Dataflow Java] Clarify which portions of DataflowWorkerLoggingOption…
scwhittle Jun 16, 2026
5a8efa4
Fix WriteToPubSub to pass ordering_key to publish() method (#37345)
nikitagrover19 Jun 17, 2026
780d68d
Add drain support for Dataflow and Flink (#38786)
lalitx17 Jun 17, 2026
999fa36
Bump google.golang.org/api from 0.284.0 to 0.285.0 in /sdks (#38997)
dependabot[bot] Jun 17, 2026
78c7815
Bump github.com/aws/aws-sdk-go-v2/service/s3 in /sdks (#38998)
dependabot[bot] Jun 17, 2026
4de175c
Bump github.com/aws/aws-sdk-go-v2/feature/s3/manager in /sdks (#38999)
dependabot[bot] Jun 17, 2026
3991c3a
fix sentence (#38989)
ahmedabu98 Jun 17, 2026
325e67d
address DOM text reinterpreted as HTML alert (#38949)
derrickaw Jun 17, 2026
a4a2028
Align gcpauth extension implementation with ability to turn it off ea…
stankiewicz Jun 8, 2026
3f82c20
Disable public IP for Dataflow Python and Go Validate runner tests (#…
Abacn Jun 17, 2026
2a00558
Add support for Apache Flink 2.1.3 (#38961)
ddebowczyk92 Jun 17, 2026
7d70f41
Make ModelManager import more robust (#38936)
AMOOOMA Jun 17, 2026
65f9558
[Beam SQL Shell] Add Iceberg deps when using icebergio (#38994)
ahmedabu98 Jun 17, 2026
d40bb15
Add publish_time_field to ReadFromPubSub YAML transform (#38985)
lalitx17 Jun 18, 2026
109a420
Bump cloud.google.com/go/bigtable from 1.49.0 to 1.50.0 in /sdks (#39…
dependabot[bot] Jun 18, 2026
bad0d3f
Exclude tests for batch-dataset (#39014)
Amar3tto Jun 18, 2026
46c6923
fix parsing error for typescript (#38898)
derrickaw Jun 18, 2026
0512134
Drop Flink 1.17,1.18 support (#39006)
Abacn Jun 18, 2026
04b493f
cancel user code with interruption after timeout (#39018)
scwhittle Jun 18, 2026
ced44f1
Fix Milvus ML PreCommit xdist, container startup (#39007)
aIbrahiim Jun 18, 2026
1c45c4b
Bump nodemailer from 8.0.5 to 9.0.1 in /scripts/ci/issue-report (#39019)
dependabot[bot] Jun 18, 2026
103cb18
remove shardedKey (#39024)
ahmedabu98 Jun 18, 2026
a56255c
Validate Project of staging bucket (#39008)
tarun-google Jun 18, 2026
1181e0c
Sanitize logging in FanOutStreamingEngineWorkerHarness (#39029)
parveensania Jun 19, 2026
7aa0722
Increase CallTest sleep durations to keep test from being flaky due t…
scwhittle Jun 19, 2026
17384c5
[Dataflow Java] Fix incorrect warning log when specifying logger over…
scwhittle Jun 19, 2026
284d6a0
Bump github.com/moby/moby/api from 1.54.2 to 1.55.0 in /sdks (#39031)
dependabot[bot] Jun 19, 2026
c751eee
[FnApi Java] Add support for separate named data streams to provide b…
scwhittle Jun 19, 2026
bbde342
Bump github.com/moby/moby/client from 0.4.1 to 0.5.0 in /sdks (#39032)
dependabot[bot] Jun 19, 2026
da4416c
[Iceberg AddFiles] Add NameMapping when creating table (#39022)
ahmedabu98 Jun 19, 2026
6335c70
Add Flink 2.2.1 runner support (#38980)
mananmangal Jun 20, 2026
e3e10c0
use proper table naming in iceberg sql (#39035)
ahmedabu98 Jun 20, 2026
b41fd6f
Roll forward io expansion service to Java17 (#38974)
derrickaw Jun 21, 2026
22c7dcb
Bump github.com/golang-cz/devslog from 0.0.16 to 0.0.17 in /sdks (#39…
dependabot[bot] Jun 22, 2026
e882d33
try to fix forward tests for java 17 (#39058)
derrickaw Jun 22, 2026
ba11d77
Update pandas version upper bound to support python 3.14 (#39056)
shunping Jun 22, 2026
5df98af
Fix playground precommit test failure (#39062)
shunping Jun 22, 2026
8430ac1
[yaml] - expand jinja method (#38547)
derrickaw Jun 22, 2026
4a6c433
Bump actions/checkout from 6 to 7 (#39033)
dependabot[bot] Jun 23, 2026
7f1d16e
Bump google.golang.org/api from 0.285.0 to 0.286.0 in /sdks (#39068)
dependabot[bot] Jun 23, 2026
d2d3dd6
Fix Flink 2.1+ dependency resolution (#39061)
Abacn Jun 23, 2026
7b86e25
Update Python Dependencies (#39070)
github-actions[bot] Jun 23, 2026
e7e51ff
Enable precommit yaml xlang test on code push. (#39071)
shunping Jun 23, 2026
623e09f
Release Note: Rename Dataflow Runner to Dataflow Portable Runner (#38…
TongruiLi Jun 23, 2026
44fd4a7
init (#39025)
claudevdm Jun 23, 2026
f688906
[mqtt] Add Python Xlang Messaging PostCommit with MQTT integration te…
tkaymak Jun 24, 2026
0eb094a
[ReduceFnRunner] Fix Prefetches (#39082)
arunpandianp Jun 24, 2026
5b8d530
[Dataflow Streaming] Increase stuck commit invalidation timeout to 1 …
arunpandianp Jun 24, 2026
69768ce
Bump actions/cache from 4 to 6 (#39081)
dependabot[bot] Jun 24, 2026
e02132d
Bump actions/checkout from 6 to 7 (#39080)
dependabot[bot] Jun 24, 2026
a47c2ff
Add UnboundedCountingSource::to for bounded reads from an UnboundedCo…
arunpandianp Jun 24, 2026
ce12a75
[Dataflow Java Streaming] Add a timeout to how long commits will retr…
scwhittle Jun 24, 2026
cb6c409
[mqtt] Fix streaming xlang IT failing the Messaging PostCommit (#39088)
tkaymak Jun 24, 2026
b19c3df
Add a github workflow to publish vllm image. (#39089)
shunping Jun 24, 2026
2397bb2
[Python] Optimize BigQuery copy jobs in file loads using multi-source…
stankiewicz Jun 24, 2026
ad7df11
[OpenTelemetry] Turn off gcp otel auth extension by default. (#38940)
stankiewicz Jun 24, 2026
75da7a4
Fix Flink XVR for Flink 2
Abacn Jun 23, 2026
190c64f
Fix Go Flink VR conf path
Abacn Jun 24, 2026
ab28067
Update CHANGES.md and website
Abacn Jun 24, 2026
bc25c5a
Disable flaky SDF finalization tests
arunpandianp Jun 24, 2026
9265c97
Fix Install xmllint
Amar3tto Jun 24, 2026
9f34821
Moving to 2.76.0-SNAPSHOT on master branch.
Jun 24, 2026
2861365
Update beam-master (#39095)
claudevdm Jun 24, 2026
213b623
[GSOC 2026] Adding logic to validate and generate an error if there a…
HansMarcus01 Jun 24, 2026
2e2091b
update changes log for yaml improvements etc (#39093)
derrickaw Jun 25, 2026
301833f
Update beam-master container
shunping Jun 25, 2026
ea687cb
Trigger failed postcommit tests.
shunping Jun 25, 2026
684ec15
Fix race condition in DirectRunner executor shutdown
shunping Jun 24, 2026
d182037
Apply suggested fix
shunping Jun 24, 2026
400c0bf
Fix JsonToRow swallowing downstream errors when runners fuse transfor…
utkarshparekh Jun 25, 2026
3a8a821
Enable playground tests triggered on cron schedule (#39069)
shunping Jun 25, 2026
f2ee01b
[Iceberg CDC] Import BaseIncrementalChangelogScan (#38823)
ahmedabu98 Jun 25, 2026
d957e89
[Iceberg CDC] Add utils base, delete reader, and serializable files (…
ahmedabu98 Jun 25, 2026
fddd83c
Fix pip ResolutionTooDeep (#39017)
aIbrahiim Jun 25, 2026
75ae1e6
[Python] Add UnboundedSource SDF wrapper (#19137) (#38724)
Eliaaazzz Jun 25, 2026
4483618
remove infra change (#39104)
derrickaw Jun 25, 2026
a6bfea1
Revert "[Python] Optimize BigQuery copy jobs in file loads using mult…
shunping Jun 25, 2026
4e417d6
Write append tables can be asynchronous (#39120)
stankiewicz Jun 26, 2026
0f016af
Fix typo (#39123)
shunping Jun 26, 2026
a617af5
Bump google-cloud-aiplatform (#39113)
dependabot[bot] Jun 26, 2026
15b4a7c
Update cython requirement from <4,>=3.2.5 to >=3.2.6,<4 in /sdks/pyth…
dependabot[bot] Jun 26, 2026
b01c1b7
[Website] Update Python version compatibility table (#38969)
jrmccluskey Jun 26, 2026
0da8aac
Docs: update Python version support in Python quickstart (#39122)
jayjayakumar Jun 26, 2026
54935c3
including GitHub Actions to schedule daily audits and Add documentati…
HansMarcus01 Jun 26, 2026
d757282
Fix a flaky test in ApproximateQuantilesTest (#39132)
shunping Jun 27, 2026
99ddb84
Fix flaky ML RunInference tests by disabling reshuffle on beam.Create…
shunping Jun 27, 2026
32dbe8d
Fix addExperiment() to handle immutable lists and remove redundant de…
ash6898 Jun 28, 2026
39a7a74
Restore upload_graph hasExperiment guard to prevent duplicate log mes…
ash6898 Jun 28, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
The table of contents is too big for display.
Diff view
Diff view
  •  
  •  
  •  
4 changes: 4 additions & 0 deletions .agent/skills/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,15 +25,19 @@ This directory contains skills that help the agent perform specialized tasks in

| Skill | Description |
|-------|-------------|
| [adding-new-metadata](adding-new-metadata/SKILL.md) | Guide on how to add and propagate new metadata fields in WindowedValue to avoid metadata loss |
| [beam-concepts](beam-concepts/SKILL.md) | Core Beam programming model (PCollections, PTransforms, windowing, triggers) |
| [beam-dofn-modernizer](beam-dofn-modernizer/SKILL.md) | Rewrite Apache Beam DoFn methods to remove legacy ProcessContext/OnTimerContext |
| [ci-cd](ci-cd/SKILL.md) | GitHub Actions workflows, debugging CI failures, triggering tests |
| [contributing](contributing/SKILL.md) | PR workflow, issue management, code review, release cycles |
| [gradle-build](gradle-build/SKILL.md) | Build commands, flags, publishing, troubleshooting |
| [io-connectors](io-connectors/SKILL.md) | 51+ I/O connectors, testing patterns, usage examples |
| [developing-new-io-connectors](developing-new-io-connectors/SKILL.md) | A detailed guide on developing new I/O connectors |
| [java-development](java-development/SKILL.md) | Java SDK development, building, testing, project structure |
| [license-compliance](license-compliance/SKILL.md) | Apache 2.0 license headers for all new files |
| [python-development](python-development/SKILL.md) | Python SDK environment setup, testing, building pipelines |
| [runners](runners/SKILL.md) | Direct, Dataflow, Flink, Spark runner configuration |
| [yaml-development](yaml-development/SKILL.md) | YAML SDK development, environment setup, testing, and key concepts |

## How Skills Work

Expand Down
3 changes: 1 addition & 2 deletions .agent/skills/adding-new-metadata/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -88,7 +88,6 @@ You must ensure that when a DoFn processes an element and outputs a new element,
### Timers
If metadata needs to survive timer firings (e.g., knowing an `@OnTimer` fired because of a system drain), it must be added to Timer data structures. This is a bit of uncharted area which was only implemented for CausedByDrain metadata that comes from backend, not from persisted metadata. In order to persist all WindowedValue metadata across timer, more work has to be done, below are some pointers:
* `runners/core-java/src/main/java/org/apache/beam/runners/core/TimerInternals.java` and implementations (e.g., `WindmillTimerInternals.java` in Dataflow).
* `runners/samza/src/test/java/org/apache/beam/runners/samza/runtime/KeyedTimerData.java` (or generic `TimerData`).
* **Action:** Add the field to `TimerData`, next to `CausedByDrain`. Propagate it when setting the timer and expose it when the timer fires so it bubbles up.
* Eventually, metadata from Timer lands in WindowedValue, so it can be exposed to users. Keep field names, types, and getters similar to WindowedValue as much as possible, as common interface may be introduced eventually.

Expand Down Expand Up @@ -116,4 +115,4 @@ User needs to access the metadata in their `DoFn` (e.g., `@ProcessElement public
9. [ ] Update `ReduceFnRunner` and `OutputAndTimeBoundedSplittableProcessElementInvoker` for complex transform propagation.
10. [ ] If required by timers, update `TimerData` and `TimerInternals`.
11. [ ] If exposed to the user, update `DoFnSignatures` and `ByteBuddyDoFnInvokerFactory`.
12. [ ] Update other runners (Flink, Spark, Samza) to ensure they propagate the new `WindowedValue` fields correctly in their specific operators/runners.
12. [ ] Update other runners (Flink, Spark) to ensure they propagate the new `WindowedValue` fields correctly in their specific operators/runners.
347 changes: 347 additions & 0 deletions .agent/skills/developing-new-io-connectors/SKILL.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,347 @@
---
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

name: developing-new-io-connectors
description: End-to-end guide on developing new Apache Beam I/O connectors correctly, including core IO transforms, SchemaTransforms, URN proto definitions, Managed API integration, cross-language expansion service, and testing.
---

# Developing New Apache Beam I/O Connectors

This guide outlines the modern best practices and mandatory steps for building new Apache Beam I/O connectors. Modern Beam connectors are expected to be schema-aware, available in cross-language (Python, Go) pipelines via the Expansion Service, and seamlessly integrable with Beam YAML and the `Managed` I/O API.

## 1. Module Structure and Gradle Setup

New Java I/O connectors should reside under `sdks/java/io/<connector-name>`.

### Directory Layout
```text
sdks/java/io/<connector-name>/
├── build.gradle
└── src/
├── main/java/org/apache/beam/sdk/io/<connector-name>/
│ ├── <Connector>IO.java
│ ├── <Connector>ReadSchemaTransformProvider.java
│ └── <Connector>WriteSchemaTransformProvider.java
└── test/java/org/apache/beam/sdk/io/<connector-name>/
├── <Connector>IOTest.java
└── <Connector>ReadSchemaTransformProviderTest.java
```

### Gradle Configuration (`build.gradle`)
Your `build.gradle` must use standard Beam Java module conventions and explicitly declare necessary dependencies.

```groovy
plugins { id 'org.apache.beam.module' }
applyJavaNature(
// If <connector-name> contains hyphens, convert them to dots or underscores
automaticModuleName: 'org.apache.beam.sdk.io.<connector_name>',
)

description = "Apache Beam :: SDKs :: Java :: IO :: <Connector Name>"
ext.summary = "Integration with <External System>."

dependencies {
implementation project(path: ":sdks:java:core", configuration: "shadow")
implementation project(path: ":model:pipeline", configuration: "shadow") // For URN definitions

// Add external client libraries here
implementation library.java.<client_dependency>

// Handle strict dependency checking if necessary
permitUnusedDeclared library.java.<client_dependency>

// Standard test dependencies
testImplementation project(path: ":sdks:java:core", configuration: "shadowTest")
testImplementation library.java.junit
}
```

---

## 2. Core I/O Transform Implementation (`<Connector>IO.java`)

Follow Beam's canonical AutoValue builder pattern for user-facing API configuration. While core Java I/O connectors can be strongly typed using specific domain classes or Java generics (`<T>`) for idiomatic Java SDK usage, modern sources should also emphasize Beam `Row` and Schema support (e.g., via `.readRows()`). For excellent real-world implementations of this pattern, refer to `IcebergIO` and `DeltaIO`.

### Bounded & Unbounded Sources
Instead of legacy `Source` classes, implement reading via Beam's **Splittable DoFns (SDF)** framework for advanced features such as dynamic rebalancing and watermark support.

A primary read transform (such as `read()` or `readRows()`) typically extends `PTransform<PBegin, PCollection<T>>` (or `PCollection<Row>`). Using `PCollection` as input is meant for "ReadAll" operations (such as reading a collection of file patterns or queries).

An example SDF-based read transform is given below:

```java
public class MyIO {
public static ReadRows readRows() {
return new AutoValue_MyIO_ReadRows.Builder().build();
}

@AutoValue
public abstract static class ReadRows extends PTransform<PBegin, PCollection<Row>> {
public abstract @Nullable String getConfigurationOption();
public abstract @Nullable Schema getSchema();
public abstract Builder toBuilder();

@AutoValue.Builder
public abstract static class Builder {
public abstract Builder setConfigurationOption(String value);
public abstract Builder setSchema(Schema schema);
public abstract ReadRows build();
}

public ReadRows withConfigurationOption(String value) {
return toBuilder().setConfigurationOption(value).build();
}

public ReadRows withSchema(Schema schema) {
return toBuilder().setSchema(schema).build();
}

@Override
public PCollection<Row> expand(PBegin input) {
return input
// `ReaderDoFn` is an SDF or source implementation that reads records and outputs `Row` objects.
.apply(ParDo.of(new ReaderDoFn(getConfigurationOption())))
.setRowSchema(getSchema());
}
}

public static WriteRows writeRows() {
return new AutoValue_MyIO_WriteRows.Builder().build();
}

@AutoValue
public abstract static class WriteRows extends PTransform<PCollection<Row>, PDone> {
public abstract @Nullable String getConfigurationOption();
public abstract Builder toBuilder();

@AutoValue.Builder
public abstract static class Builder {
public abstract Builder setConfigurationOption(String value);
public abstract WriteRows build();
}

public WriteRows withConfigurationOption(String value) {
return toBuilder().setConfigurationOption(value).build();
}

@Override
public PDone expand(PCollection<Row> input) {
input.apply("WriteRecords", ParDo.of(new WriterDoFn(getConfigurationOption())));
return PDone.in(input.getPipeline());
}
}
}
```
* Please make sure that any code you add adheres to the Beam coding standards. These standards are documented here: https://beam.apache.org/contribute/code-guidelines/

* Especially scrutinize any logic that involves splitting the data for parallel processing since it is a common source of errors that can lead to data loss or data duplication related issues.

### Developing Sinks (Write Transforms)
When implementing data egress (Write transforms), avoid creating single-worker bottlenecks. Depending on your target system's transactional requirements, prefer one of the following canonical Beam sink patterns:

1. **DoFn-Based / Batching Sinks:** For external APIs or messaging systems (e.g., Kafka, Pub/Sub, NoSQL databases), use a `DoFn` that manages connections per bundle (`@StartBundle`, `@FinishBundle`) or utilizes `GroupIntoBatches` to perform highly efficient, parallel batched requests.
2. **Two-Phase Commit / Exactly-Once Sinks:** For transactional sinks (e.g., Relational DBs, Apache Iceberg, Delta Lake), implement a multi-stage `PTransform`:
* **Write Shards:** Write records in parallel tasks to staging files or temporary transactions, emitting commit descriptors (`PCollection<CommitMessage>`).
* **Global Commit:** Group the commit messages and execute a single final transaction to commit all staged files/shards.
3. **File-Based Sinks:** If your connector purely writes files, leverage Beam's `FileIO` core infrastructure (`FileIO.write()` / `FileIO.Sink`) rather than implementing custom file rolling and sharding logic.
4. **Error Reporting (Dead-Letter Queues):** Instead of failing the entire pipeline when an invalid element or API error occurs, modern Write transforms should optionally output a `PCollection<Row>` (or custom `WriteResult` / `PCollectionRowTuple`) containing failed records and explicit error metadata.

* Specially scrutinize any logic that can create duplicate data due to worker failures. Assume that any transform in the pipeline can fail and be retried multiple times by the Beam runner. If the sink does not handle this properly, it can lead to duplicate data in the target system.
---

## 3. Extending the Pipeline Model Proto (`external_transforms.proto`)

To standardize your transform identifier across SDKs, define its URN in Beam's protobuf schema.

1. Open `model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/external_transforms.proto`.
2. Add your read/write URNs under the appropriate enum (e.g., `ManagedTransforms.Urns`).

```protobuf
message ManagedTransforms {
enum Urns {
// ... existing entries
MY_SYSTEM_READ = 15 [(org.apache.beam.model.pipeline.v1.beam_urn) =
"beam:schematransform:org.apache.beam:my_system_read:v1"];
MY_SYSTEM_WRITE = 16 [(org.apache.beam.model.pipeline.v1.beam_urn) =
"beam:schematransform:org.apache.beam:my_system_write:v1"];
}
}
```

3. Re-generate and compile the model protos:
```bash
./gradlew :model:pipeline:generateProto :model:pipeline:compileJava
```

---

## 4. Implementing `SchemaTransformProvider`

To expose your connector to cross-language pipelines and Beam YAML, create a typed `SchemaTransformProvider`.

```java
@AutoService(SchemaTransformProvider.class)
public class MyReadSchemaTransformProvider extends TypedSchemaTransformProvider<Configuration> {

@Override
public String identifier() {
return getUrn(ExternalTransforms.ManagedTransforms.Urns.MY_SYSTEM_READ);
}

@Override
public String description() {
return "Reads records from My System and outputs a PCollection of Beam Rows.";
}

@Override
protected SchemaTransform from(Configuration configuration) {
return new MyReadSchemaTransform(configuration);
}

@Override
public List<String> outputCollectionNames() {
return Collections.singletonList("output");
}

@DefaultSchema(AutoValueSchema.class)
@AutoValue
public abstract static class Configuration {
@SchemaFieldDescription("Configuration option description.")
public abstract String getConfigurationOption();

public static Builder builder() {
return new AutoValue_MyReadSchemaTransformProvider_Configuration.Builder();
}

@AutoValue.Builder
public abstract static class Builder {
public abstract Builder setConfigurationOption(String value);
public abstract Configuration build();
}
}

static class MyReadSchemaTransform extends SchemaTransform {
private final Configuration configuration;

MyReadSchemaTransform(Configuration configuration) {
this.configuration = Objects.requireNonNull(configuration, "configuration cannot be null");
}

@Override
public PCollectionRowTuple expand(PCollectionRowTuple input) {
PCollection<Row> output = input.getPipeline().apply(
MyIO.readRows().withConfigurationOption(configuration.getConfigurationOption()));
return PCollectionRowTuple.of("output", output);
}
}
}
```

---

## 5. Integrating with Managed API (`Managed.java`)

Beam's `Managed` I/O transform provides a unified interface for data ingest/egress. To support it:

1. Open `sdks/java/managed/src/main/java/org/apache/beam/sdk/managed/Managed.java`.
2. Define a public constant identifier:
```java
public static final String MY_SYSTEM = "my_system";
```
3. Register your URNs in `READ_TRANSFORMS` or `WRITE_TRANSFORMS`:
```java
public static final Map<String, String> READ_TRANSFORMS =
ImmutableMap.<String, String>builder()
// ... existing transforms
.put(MY_SYSTEM, getUrn(ExternalTransforms.ManagedTransforms.Urns.MY_SYSTEM_READ))
.build();
```
4. Update the Javadoc block in `Managed.java` to list your new connector.

---

## 6. Expansion Service Registration

To enable non-Java SDKs (Python, Go) to discover and expand your new connector, include it in the standard Java Expansion Service.

1. Open `sdks/java/io/expansion-service/build.gradle`.
2. Add your module as a runtime dependency:
```groovy
dependencies {
// ... existing dependencies
runtimeOnly project(":sdks:java:io:<connector-name>")
}
```

---

## 7. Python & Beam YAML Integration

Once registered in the expansion service, your `SchemaTransform` can be utilized in Python and YAML. E.g., for `Managed` support in Python:

1. Open `sdks/python/apache_beam/transforms/managed.py`.
2. Export your identifier in `__all__` and map it to its URN in `Read._READ_TRANSFORMS` or `Write._WRITE_TRANSFORMS`:
```python
MY_SYSTEM = 'my_system'

__all__ = [
# ... existing
"MY_SYSTEM",
]

class Read(PTransform):
_READ_TRANSFORMS = {
# ... existing
MY_SYSTEM: ManagedTransforms.Urns.MY_SYSTEM_READ.urn,
}
```
3. In `sdks/python/apache_beam/transforms/external.py`, map the URN to the appropriate Expansion Service jar target in `MANAGED_TRANSFORM_URN_TO_JAR_TARGET_MAPPING`.

---

## 8. Verification and Testing

Verify your new connector thoroughly across multiple abstraction layers:

### 1. Unit Tests
Test your core builder and `SchemaTransformProvider` translation.
```bash
./gradlew :sdks:java:io:<connector-name>:test
```

### 2. Managed API Translation
In `ManagedSchemaTransformTranslationTest.java` (under `sdks/java/managed`), you can verify the translation structure of your managed transform. Note that `ManagedTest.java` generally uses dummy/test providers (`TestSchemaTransformProvider`) to keep dependencies lightweight.
```bash
./gradlew :sdks:java:managed:test
```

### 3. Integration Tests (IT)
Create integration tests to test end-to-end data processing against real system instances, including `Managed.read(Managed.MY_SYSTEM)` usage.

* Test resources should be managed via `ResourceManager` classes under the `it/` directory.
* Add GitHub Actions to trigger your tests when changes are made to your connector code. Consider adding one for pre-commit and one for post-commit.

### 4. Documentation
Add any necessary documentation for your connector under the `website/www/site/content/en/documentation/io/built-in/` directory.

---

> [!TIP]
> **Canonical Reference Implementations:** When developing a new connector, we highly recommend studying **Apache Iceberg** ([IcebergIO.java](https://github.com/apache/beam/blob/master/sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/IcebergIO.java)) and **Delta Lake** ([DeltaIO.java](https://github.com/apache/beam/blob/master/sdks/java/io/delta/src/main/java/org/apache/beam/sdk/io/delta/DeltaIO.java)) as state-of-the-art reference implementations.

For more details see the [Developing I/O connectors](https://beam.apache.org/documentation/io/developing-io-overview) guide.
3 changes: 2 additions & 1 deletion .agent/skills/io-connectors/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -189,9 +189,10 @@ Beam supports using I/O connectors from one SDK in another via the expansion ser
```

## Creating New Connectors
See [Developing I/O connectors](https://beam.apache.org/documentation/io/developing-io-overview)

Key components:
1. **Source** - Reads data (bounded or unbounded)
2. **Sink** - Writes data
3. **Read/Write transforms** - User-facing API

For more detailed information on developing new I/O connectors see the [Developing new I/O connectors SKILL](../developing-new-io-connectors/SKILL.md).
Loading
Loading