Fix deepcopy for And, Or, and Not expressions#79
Conversation
Pydantic v2's BaseModel.__deepcopy__ calls cls.__new__(cls) with no args, but And, Or, and Not define __new__ with required positional arguments. These tests reproduce the TypeError on deepcopy.
Add __deepcopy__ to BooleanExpression that reconstructs the expression via model_fields, and returns self for Singleton subclasses. Fixes TypeError from Pydantic v2's BaseModel.__deepcopy__ calling cls.__new__(cls) with no args on And, Or, and Not which require positional arguments in __new__.
There was a problem hiding this comment.
Pull request overview
This PR addresses failures when copy.deepcopy() is applied to PyIceberg boolean expression trees under Pydantic v2, particularly for compound expressions (And/Or/Not) with __new__ signatures requiring arguments, and for Singleton-based expressions that must not be reconstructed.
Changes:
- Add a
BooleanExpression.__deepcopy__implementation that returnsselfforSingletonsubclasses and reconstructs other expressions by deep-copying model fields. - Add unit tests covering deepcopy across several expression types, nested/balanced trees, and deepcopy→pickle roundtrips.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
pyiceberg/expressions/__init__.py |
Introduces custom __deepcopy__ for all BooleanExpression subclasses to avoid Pydantic v2 __new__-without-args failures and handle Singleton immutability. |
tests/expressions/test_expressions.py |
Adds targeted tests validating deepcopy behavior for multiple expression shapes and pickle compatibility. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Instead of a generic __deepcopy__ on BooleanExpression with Pydantic internals, add simple __deepcopy__ methods to the three classes that need them. Also add identity assertions for non-Singleton copies.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
ShreyeshArangath
left a comment
There was a problem hiding this comment.
LGTM, but are we also planning to add support for deep copy in Bound* and In expressions later?
Those already work because they don't have custom |
## Summary Update li-pyiceberg dependency version to 0.11.5 for bug fix linkedin/iceberg-python#79 ## Changes - [ ] Client-facing API Changes - [ ] Internal API Changes - [x] Bug Fixes - [ ] New Features - [ ] Performance Improvements - [ ] Code Style - [ ] Refactoring - [ ] Documentation - [ ] Tests ## Testing Done <!--- Check any relevant boxes with "x" --> - [ ] Manually Tested on local docker setup. Please include commands ran, and their output. - [ ] Added new tests for the changes made. - [ ] Updated existing tests to reflect the changes made. - [x] No tests added or updated. Please explain why. If unsure, please feel free to ask for help. - [ ] Some other form of testing like staging or soak time in production. Please explain. Dependency bump. Ran existing tests. # Additional Information - [ ] Breaking Changes - [ ] Deprecations - [ ] Large PR broken into smaller PRs, and PR plan linked in the description. For all the boxes checked, include additional details of the changes made in this pull request.
Rationale for this change
copy.deepcopy()onAnd,Or, andNotexpressions raisesTypeErrorbecause Pydantic v2's default__deepcopy__callscls.__new__(cls)with no args, but these classes require positional arguments in__new__.The fix adds
__deepcopy__toAnd,Or, andNotthat deepcopy their fields and call the constructor directly.Apache change to minimize the divergence here apache/iceberg-python#3295
Are these changes tested?
Yes. 11 new unit tests covering all expression types, balanced trees, nested expressions, and deepcopy followed by pickle.
Are there any user-facing changes?
Yes,
copy.deepcopy()on expressions now works instead of raising.