Roll added manifests at commit.manifest.target-size-bytes in fast-append#3304
Closed
rynewang wants to merge 1 commit intoapache:mainfrom
Closed
Roll added manifests at commit.manifest.target-size-bytes in fast-append#3304rynewang wants to merge 1 commit intoapache:mainfrom
rynewang wants to merge 1 commit intoapache:mainfrom
Conversation
_FastAppendFiles previously wrote every added data file into a single manifest, regardless of commit.manifest.target-size-bytes. Java's RollingManifestWriter honours this property on the fast-append path too; this change brings pyiceberg to parity. The first manifest is written inline until it reaches the target, which yields an exact entries-per-manifest count; remaining chunks are then submitted to the existing ExecutorFactory pool so the GIL-bound encode of chunk K overlaps with the compress/upload (both GIL-releasing) of earlier chunks. _write_delete_manifest and _existing_manifests are submitted first so they run concurrently with the inline first chunk. Small appends (under the target in one manifest) take the same code path and produce a single manifest as before. Multiple small manifests let query planners read them in parallel, prune whole manifests via the manifest-list partition bounds without opening them, and keep delete/overwrite rewrites bounded to the affected manifest instead of the full set.
geruh
requested changes
Apr 30, 2026
Member
geruh
left a comment
There was a problem hiding this comment.
Thanks for opening @rynewang, this is definitely a problem as pyiceberg needs manifest rolling support like the other sub-projects. However, there is an open pr that starts that work in #3009 that still needs review. The approach taken there is the ground work that mirrors the logic in Java's RollingManifestWriter. Once this lands, then we can hook it into our manifest writing logic.
Author
|
Thanks! Great to know it's already being handled. closing this PR. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
_FastAppendFilescurrently writes every added data file into a single manifest.commit.manifest.target-size-bytes(default 8 MiB) is already defined and honoured by_MergeAppendFileswhen bin-packing existing manifests, but the fast-append path never reads it — so a large append produces one arbitrarily large manifest. Java Iceberg'sRollingManifestWriterrolls over at the target on the fast-append path too; this brings pyiceberg to parity.Change
_SnapshotProducer._manifests()now:writer.tell()reachescommit.manifest.target-size-bytes, which yields an exact entries-per-manifest count for this append's entry shape._added_data_filesby that count and submits each chunk to the existingExecutorFactorythread pool. Encoding is GIL-bound, but block compression (zlib) and the output-file write both release the GIL, so one chunk's encode overlaps with earlier chunks' compress/upload._write_delete_manifestand_existing_manifeststo the pool before starting the inline first chunk, so they still run concurrently.Small appends (everything fits under the target) take the same code path and produce one manifest, same as before.
Why multiple manifests matter downstream
Testing
test_fast_append_rolls_added_manifests_at_target_size: creates a table withcommit.manifest.target-size-bytes=4096, appends 200 data files, asserts_manifests()returns more than one manifest, all entries accounted for, and each manifest is within a small multiple of the target.test_fast_append_single_manifest_when_under_target: 3 files → one manifest.tests/table/test_snapshots.pyandtests/utils/test_manifest.pypass unchanged.Relationship to #3303
Independent — this PR only touches
snapshot.py, #3303 only touchespyiceberg/avro/. The thread-pool overlap here helps with or without the Cython encoder (pure-Python encode still overlaps with compress/upload); with #3303 the encode floor drops and the overlap ratio improves.