fix: dynamic_partition_overwrite builds per-spec delete predicates after partition spec evolution#3149
Conversation
Fixes apache#3148

When a table has undergone partition spec evolution, its snapshot may contain manifests written under different partition_spec_ids. Previously, dynamic_partition_overwrite built the delete predicate using only the current spec, causing the manifest evaluator to incorrectly skip manifests from older specs, leaving stale data files silently behind.

The fix builds the delete predicate per historical spec present in the snapshot, projecting the new data files' partition values into each spec's coordinate space before evaluating.

Regression tests added covering:
- Mixed-spec snapshot (manifests from both spec-0 and spec-1)
- Overwrite of a partition that only exists in spec-0 manifests (silent data duplication case)
```python
# only the fields that spec knows about, matched against the
# corresponding positions in the new data files' partition records.
snapshot = self.table_metadata.snapshot_by_name(branch or MAIN_BRANCH)
if snapshot is not None:
```
I think this logic is a bit wonky: if the branch doesn't exist, we silently fall back to the main branch.
```python
# corresponding positions in the new data files' partition records.
snapshot = self.table_metadata.snapshot_by_name(branch or MAIN_BRANCH)
if snapshot is not None:
    spec_ids_in_snapshot = {m.partition_spec_id for m in snapshot.manifests(io=self._table.io)}
```
Do we need to collect these upfront? This is pretty expensive, as it will do IO to pull all the manifests.
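One way to address this comment is to skip the upfront scan entirely and build each per-spec predicate lazily, during the single pass that already iterates manifests. A hypothetical sketch under that assumption (the names here are illustrative, not pyiceberg's):

```python
# Hypothetical sketch: build each spec's predicate at most once, on first
# encounter, so no extra IO pass is needed just to enumerate spec ids.
from typing import Callable, Dict, Iterable, Iterator, Tuple

def evaluate_lazily(
    manifests: Iterable,                       # objects with .partition_spec_id
    build_predicate: Callable[[int], object],  # predicate builder per spec id
) -> Iterator[Tuple[object, object]]:
    cache: Dict[int, object] = {}
    for manifest in manifests:
        spec_id = manifest.partition_spec_id
        if spec_id not in cache:  # first time this spec id is seen
            cache[spec_id] = build_predicate(spec_id)
        yield manifest, cache[spec_id]
```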
```
Expected after overwrite:
- Only new A rows: values [888, 999]
- All B rows untouched: values [10, 11, 200]
- Total: 5 rows

Bug (pre-fix): spec-0 A manifests are skipped by the evaluator,
leaving stale A rows (1, 2, 3) in the table -> 8 rows total.
```
Hmm. I'm not sure if we can make this change. Existing users might be used to the existing behavior; when we change this, it will drop many more rows than before.
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.
Rationale

While reviewing PR #3011 (manifest pruning optimization), I identified a correctness gap when tables have undergone partition spec evolution.

When `dynamic_partition_overwrite` is called on a table with mixed `partition_spec_id`s in its snapshot, the delete predicate was built using only the current partition spec. This caused `inclusive_projection` to fail silently when evaluating older manifests: the predicate contained field references (e.g. `region`) that have no corresponding partition field in the old spec, causing the manifest evaluator to skip those manifests entirely. The result is silent data duplication: stale rows from old-spec manifests are never deleted.
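The failure mode described above can be modeled in a few lines. This is a deliberately simplified, hypothetical sketch of the evaluator's behavior, not the actual pyiceberg code:

```python
# Hypothetical model of the bug: a naive evaluator that treats a predicate
# field unknown to a manifest's spec as "cannot match" will silently skip
# manifests written under older specs.

def manifest_may_match(predicate: dict, spec_fields: set) -> bool:
    """Return True if the manifest may contain rows matching the predicate."""
    for field in predicate:
        if field not in spec_fields:
            return False  # BUG: old-spec manifests get skipped here
    return True

spec0_fields = {"day"}            # original spec: partitioned by day only
spec1_fields = {"day", "region"}  # evolved spec: day + region

# Delete predicate built from the *current* spec only:
delete_pred = {"day": "2024-01-01", "region": "us"}

assert manifest_may_match(delete_pred, spec1_fields) is True
assert manifest_may_match(delete_pred, spec0_fields) is False  # stale rows survive
```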
Changes

`pyiceberg/table/__init__.py`: `dynamic_partition_overwrite` now iterates over all `partition_spec_id`s present in the current snapshot and builds a per-spec delete predicate, projecting the new data files' partition values into each historical spec's coordinate space before evaluating.

`tests/integration/test_manifest_pruning_spec_evolution.py`: two regression tests added:
- Mixed-spec snapshot (manifests from both spec-0 and spec-1)
- Overwrite of a partition that only exists in spec-0 manifests (silent data duplication case: no exception raised, wrong rows survive)
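The per-spec projection described above can be sketched as follows. This is a simplified illustration under assumed names (`per_spec_predicates`, plain dict predicates), not the actual pyiceberg implementation:

```python
# Hypothetical sketch of the fix: instead of one predicate from the current
# spec, build one predicate per historical spec, restricted to the partition
# fields that spec actually knows about.

def per_spec_predicates(partition_values: dict, specs: dict) -> dict:
    """Project new files' partition values into each spec's field set.

    specs maps spec_id -> set of partition field names in that spec.
    Returns spec_id -> predicate containing only that spec's fields.
    """
    return {
        spec_id: {f: v for f, v in partition_values.items() if f in fields}
        for spec_id, fields in specs.items()
    }

specs = {0: {"day"}, 1: {"day", "region"}}
new_partition = {"day": "2024-01-01", "region": "us"}

preds = per_spec_predicates(new_partition, specs)
assert preds[0] == {"day": "2024-01-01"}  # spec-0 manifests matched again
assert preds[1] == {"day": "2024-01-01", "region": "us"}
```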
Are these changes tested?
Yes — two new integration tests using the SQLite in-memory catalog, no external
services required.
Are there any user-facing changes?
Yes — `dynamic_partition_overwrite` now correctly deletes all matching rows across all historical partition specs, fixing silent data duplication on evolved tables.
Related