Skip to content

test: [DO NOT MERGE] test upstream iceberg-rust fix for #3856#3857

Closed
mbutrovich wants to merge 12 commits intoapache:mainfrom
mbutrovich:int96_test
Closed

test: [DO NOT MERGE] test upstream iceberg-rust fix for #3856#3857
mbutrovich wants to merge 12 commits intoapache:mainfrom
mbutrovich:int96_test

Conversation

@mbutrovich
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #3856.

Rationale for this change

Test apache/iceberg-rust#2301 with the repro test in #3856.

What changes are included in this PR?

How are these changes tested?

  • New test.

@mbutrovich mbutrovich changed the title test: [DO NOT MERGE] test for apache/iceberg-rust/2301 test: [DO NOT MERGE] test fix for #3856 Mar 31, 2026
@mbutrovich mbutrovich changed the title test: [DO NOT MERGE] test fix for #3856 test: [DO NOT MERGE] test upstream iceberg-rust fix for #3856 Mar 31, 2026
@mbutrovich mbutrovich closed this Apr 3, 2026
blackmwk pushed a commit to apache/iceberg-rust that referenced this pull request Apr 16, 2026
## Which issue does this PR close?

- Closes #2299.
## What changes are included in this PR?
- Add `coerce_int96_timestamps()` to patch the Arrow schema before
reading, using arrow-rs's schema hint mechanism
(`ArrowReaderOptions::with_schema`) to read
INT96 columns at the resolution specified by the Iceberg table schema
- `timestamp`/`timestamptz` → microsecond,
`timestamp_ns`/`timestamptz_ns` → nanosecond, per the [Iceberg
spec](https://iceberg.apache.org/spec/#primitive-types)
- Falls back to microsecond when no field ID is available (matching
Iceberg Java's `TimestampInt96Reader` behavior)
- Applied after all three schema resolution branches (with field IDs,
name mapping, positional fallback) so the fix covers both native and
migrated tables
- Handles INT96 inside nested types (structs, lists, maps) via
`ArrowSchemaVisitor` traversal
- Visitor and tests live in a standalone `arrow/int96.rs` module to keep
`reader.rs` manageable
- Made `visit_schema` in `arrow/schema.rs` `pub(crate)` so the coercion
visitor can reuse the existing traversal

## Are these changes tested?

- `test_read_int96_timestamps_with_field_ids` — files with embedded
field IDs (branch 1)
- `test_read_int96_timestamps_without_field_ids` — migrated files
without field IDs (branches 2/3)
- `test_read_int96_timestamps_in_struct` — INT96 inside a struct field
- `test_read_int96_timestamps_in_list` — INT96 inside a list field
(3-level Parquet LIST encoding)
- `test_read_int96_timestamps_in_map` — INT96 as map values
- All tests use dates outside the i64 nanosecond range (~1677-2262) to
confirm the overflow is avoided
- [Apache DataFusion Comet](https://github.com/apache/datafusion-comet)
used the repro test in

[apache/datafusion-comet#3856](apache/datafusion-comet#3856)
and it passes with this change:
apache/datafusion-comet#3857
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: native Iceberg reader can return wrong results for migrated Parquet files with INT96 timestamps

1 participant