Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion pyiceberg/transforms.py
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,8 @@
DateType,
DecimalType,
FixedType,
GeographyType,
GeometryType,
IcebergType,
IntegerType,
LongType,
Expand Down Expand Up @@ -717,7 +719,8 @@ def transform(self, source: IcebergType) -> Callable[[S | None], S | None]:
return lambda v: v

def can_transform(self, source: IcebergType) -> bool:
return source.is_primitive
# TODO: disallow VariantType when PyIceberg supports it.
return source.is_primitive and not isinstance(source, (GeographyType, GeometryType))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely understand that VariantType isn't a primitive, but I think we should add it here:

  • It doesn't hurt anything, since it won't be called
  • It stops a couple LLM-generated PRs come in that "notice" that VariantType is missing here
  • It makes this method explicitly match the spec. The code should match the spec. Divergences are where people begin to not trust the spec.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree we should follow the spec before a decision is made to change it. However, after searching the codebase, it seems that VariantType is not currently supported in Iceberg Python, and supporting VariantType feels like a much broader effort that should probably be addressed in a separate PR, WDYT?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just add a TODO and let's move on?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, thanks.


def result_type(self, source: IcebergType) -> IcebergType:
return source
Expand Down
7 changes: 7 additions & 0 deletions tests/test_transforms.py
Original file line number Diff line number Diff line change
Expand Up @@ -101,6 +101,8 @@
DoubleType,
FixedType,
FloatType,
GeographyType,
GeometryType,
IntegerType,
LongType,
NestedField,
Expand Down Expand Up @@ -255,6 +257,11 @@ def test_identity_transform_unknown_type() -> None:
assert IdentityTransform().to_human_string(UnknownType(), None) == "null"


@pytest.mark.parametrize("type_var", [GeometryType(), GeographyType()])
def test_identity_can_transform_unsupported_type(type_var: PrimitiveType) -> None:
assert not IdentityTransform().can_transform(type_var)


Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could improve the test coverage for other methods such as transform.

If we want to test can_transform only, I suggest renaming to test_identity_can_transform_unsupported_type.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed as suggested, thanks

def test_string_with_surrogate_pair() -> None:
string_with_surrogate_pair = "string with a surrogate pair: 💰"
as_bytes = bytes(string_with_surrogate_pair, UTF8)
Expand Down