Roll added manifests at commit.manifest.target-size-bytes in fast-append by rynewang · Pull Request #3304 · apache/iceberg-python

rynewang · 2026-04-30T21:47:35Z

Summary

_FastAppendFiles currently writes every added data file into a single manifest. commit.manifest.target-size-bytes (default 8 MiB) is already defined and honoured by _MergeAppendFiles when bin-packing existing manifests, but the fast-append path never reads it — so a large append produces one arbitrarily large manifest. Java Iceberg's RollingManifestWriter rolls over at the target on the fast-append path too; this brings pyiceberg to parity.

Change

_SnapshotProducer._manifests() now:

Writes the first added-manifest inline until writer.tell() reaches commit.manifest.target-size-bytes, which yields an exact entries-per-manifest count for this append's entry shape.
Chunks the remaining _added_data_files by that count and submits each chunk to the existing ExecutorFactory thread pool. Encoding is GIL-bound, but block compression (zlib) and the output-file write both release the GIL, so one chunk's encode overlaps with earlier chunks' compress/upload.
Submits _write_delete_manifest and _existing_manifests to the pool before starting the inline first chunk, so they still run concurrently.

Small appends (everything fits under the target) take the same code path and produce one manifest, same as before.

Why multiple manifests matter downstream

Planners (Trino/Athena/Spark) read manifests in parallel; one multi-GB manifest is a single-threaded decode bottleneck.
The manifest-list carries per-manifest partition bounds, so predicates can skip whole manifests without opening them — which is impossible with one monolith.
Delete/overwrite rewrites any manifest containing an affected file; with one large manifest, a single-row delete rewrites the whole thing.

Testing

test_fast_append_rolls_added_manifests_at_target_size: creates a table with commit.manifest.target-size-bytes=4096, appends 200 data files, asserts _manifests() returns more than one manifest, all entries accounted for, and each manifest is within a small multiple of the target.
test_fast_append_single_manifest_when_under_target: 3 files → one manifest.
Existing tests/table/test_snapshots.py and tests/utils/test_manifest.py pass unchanged.

Relationship to #3303

Independent — this PR only touches snapshot.py, #3303 only touches pyiceberg/avro/. The thread-pool overlap here helps with or without the Cython encoder (pure-Python encode still overlaps with compress/upload); with #3303 the encode floor drops and the overlap ratio improves.

_FastAppendFiles previously wrote every added data file into a single manifest, regardless of commit.manifest.target-size-bytes. Java's RollingManifestWriter honours this property on the fast-append path too; this change brings pyiceberg to parity. The first manifest is written inline until it reaches the target, which yields an exact entries-per-manifest count; remaining chunks are then submitted to the existing ExecutorFactory pool so the GIL-bound encode of chunk K overlaps with the compress/upload (both GIL-releasing) of earlier chunks. _write_delete_manifest and _existing_manifests are submitted first so they run concurrently with the inline first chunk. Small appends (under the target in one manifest) take the same code path and produce a single manifest as before. Multiple small manifests let query planners read them in parallel, prune whole manifests via the manifest-list partition bounds without opening them, and keep delete/overwrite rewrites bounded to the affected manifest instead of the full set.

geruh

Thanks for opening @rynewang, this is definitely a problem as pyiceberg needs manifest rolling support like the other sub-projects. However, there is an open pr that starts that work in #3009 that still needs review. The approach taken there is the ground work that mirrors the logic in Java's RollingManifestWriter. Once this lands, then we can hook it into our manifest writing logic.

rynewang · 2026-05-01T00:53:44Z

Thanks! Great to know it's already being handled. closing this PR.

geruh requested changes Apr 30, 2026

View reviewed changes

rynewang closed this May 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Roll added manifests at commit.manifest.target-size-bytes in fast-append#3304

Roll added manifests at commit.manifest.target-size-bytes in fast-append#3304
rynewang wants to merge 1 commit intoapache:mainfrom
rynewang:perf/rolling-manifest-fast-append

rynewang commented Apr 30, 2026

Uh oh!

geruh left a comment

Uh oh!

rynewang commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rynewang commented Apr 30, 2026

Summary

Change

Why multiple manifests matter downstream

Testing

Relationship to #3303

Uh oh!

geruh left a comment

Choose a reason for hiding this comment

Uh oh!

rynewang commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants