Reject unsupported identity transform types#3517
Conversation
|
Good one @zhjwpku, I think we missed updating this when adding the V3 types. Should we disallow VariantType as well? |
Hi @Fokko , the spec did mentioned VariantType, but since VariantType is not a primitive type, I think there is no different to add it or not. I've started a thread[1] to discuss whether we should polish the spec, you may want to take a look :-) [1] https://lists.apache.org/thread/bv9461xjhph24s424g7mtczojq60xr7y |
|
|
||
| def can_transform(self, source: IcebergType) -> bool: | ||
| return source.is_primitive | ||
| return source.is_primitive and not isinstance(source, (GeographyType, GeometryType)) |
There was a problem hiding this comment.
I completely understand that VariantType isn't a primitive, but I think we should add it here:
- It doesn't hurt anything, since it won't be called
- It stops a couple LLM-generated PRs come in that "notice" that VariantType is missing here
- It makes this method explicitly match the spec. The code should match the spec. Divergences are where people begin to not trust the spec.
There was a problem hiding this comment.
Yeah, I agree we should follow the spec before a decision is made to change it. However, after searching the codebase, it seems that VariantType is not currently supported in Iceberg Python, and supporting VariantType feels like a much broader effort that should probably be addressed in a separate PR, WDYT?
There was a problem hiding this comment.
Maybe just add a TODO and let's move on?
rambleraptor
left a comment
There was a problem hiding this comment.
I've got my one nit, but broadly this looks great. Thanks for doing this!
| def test_identity_transform_unsupported_type(type_var: PrimitiveType) -> None: | ||
| assert not IdentityTransform().can_transform(type_var) | ||
|
|
||
|
|
There was a problem hiding this comment.
We could improve the test coverage for other methods such as transform.
If we want to test can_transform only, I suggest renaming to test_identity_can_transform_unsupported_type.
There was a problem hiding this comment.
Changed as suggested, thanks
Rationale for this change
Per spec[1], the identity transform source type must be any primitive except for geometry, geography, and variant. This PR rejects GeographyType and GeometryType as source types for identity transforms.(No VariantType yet)
[1] https://iceberg.apache.org/spec/#partition-transforms
Are these changes tested?
Yes
Are there any user-facing changes?
No