test: [DO NOT MERGE] test upstream iceberg-rust fix for #3856 by mbutrovich · Pull Request #3857 · apache/datafusion-comet

mbutrovich · 2026-03-31T17:07:53Z

Which issue does this PR close?

Closes #3856.

Rationale for this change

Test apache/iceberg-rust#2301 with the repro test in #3856.

What changes are included in this PR?

Point the iceberg-rust dependency at a branch on my fork that includes the fix from apache/iceberg-rust#2301 (fix: incorrect Parquet INT96 timestamp values from ArrowReader). I can't point at the exact PR branch because Comet still needs to be based on 0.9.0 until we complete the upgrade to DF53. The fork branch just cherry-picks the fix onto a 0.9.0-based branch.

How are these changes tested?

New test.

# Conflicts: # native/Cargo.lock

## Which issue does this PR close?

- Closes #2299.

## What changes are included in this PR?

- Add `coerce_int96_timestamps()` to patch the Arrow schema before reading, using arrow-rs's schema hint mechanism (`ArrowReaderOptions::with_schema`) to read INT96 columns at the resolution specified by the Iceberg table schema
- `timestamp`/`timestamptz` → microsecond, `timestamp_ns`/`timestamptz_ns` → nanosecond, per the [Iceberg spec](https://iceberg.apache.org/spec/#primitive-types)
- Falls back to microsecond when no field ID is available (matching Iceberg Java's `TimestampInt96Reader` behavior)
- Applied after all three schema resolution branches (with field IDs, name mapping, positional fallback) so the fix covers both native and migrated tables
- Handles INT96 inside nested types (structs, lists, maps) via `ArrowSchemaVisitor` traversal
- Visitor and tests live in a standalone `arrow/int96.rs` module to keep `reader.rs` manageable
- Made `visit_schema` in `arrow/schema.rs` `pub(crate)` so the coercion visitor can reuse the existing traversal

## Are these changes tested?

- `test_read_int96_timestamps_with_field_ids`: files with embedded field IDs (branch 1)
- `test_read_int96_timestamps_without_field_ids`: migrated files without field IDs (branches 2/3)
- `test_read_int96_timestamps_in_struct`: INT96 inside a struct field
- `test_read_int96_timestamps_in_list`: INT96 inside a list field (3-level Parquet LIST encoding)
- `test_read_int96_timestamps_in_map`: INT96 as map values
- All tests use dates outside the i64 nanosecond range (~1677-2262) to confirm the overflow is avoided
- [Apache DataFusion Comet](https://github.com/apache/datafusion-comet) used the repro test in [apache/datafusion-comet#3856](apache/datafusion-comet#3856) and it passes with this change: apache/datafusion-comet#3857
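To make the overflow concrete, here is a minimal, std-only Rust sketch of why the target resolution matters. It is not the code from apache/iceberg-rust#2301: `Unit`, `unit_for`, `encode_int96`, and `decode_int96` are hypothetical names chosen for illustration, and the sketch assumes the conventional INT96 layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day, both little-endian). The type-name strings passed to `unit_for` simply mirror the mapping listed above, with `None` standing in for a column that has no field ID.

```rust
/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_UNIX_EPOCH: i64 = 2_440_588;
const MICROS_PER_DAY: i64 = 86_400_000_000;
const NANOS_PER_DAY: i64 = 86_400_000_000_000;

/// Target resolution for a decoded INT96 value (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Unit {
    Micros,
    Nanos,
}

/// Hypothetical helper mirroring the mapping in the PR description:
/// `timestamp_ns`/`timestamptz_ns` read as nanoseconds, everything else
/// (including `None`, i.e. no field ID available) falls back to microseconds.
pub fn unit_for(iceberg_type: Option<&str>) -> Unit {
    match iceberg_type {
        Some("timestamp_ns") | Some("timestamptz_ns") => Unit::Nanos,
        _ => Unit::Micros,
    }
}

/// Builds a 12-byte INT96 value: nanoseconds of day, then Julian day
/// (assumed layout, little-endian).
pub fn encode_int96(julian_day: u32, nanos_of_day: i64) -> [u8; 12] {
    let mut raw = [0u8; 12];
    raw[0..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    raw[8..12].copy_from_slice(&julian_day.to_le_bytes());
    raw
}

/// Decodes an INT96 value to an i64 timestamp relative to the Unix epoch.
/// Returns `None` if the value does not fit in i64 at the requested unit.
pub fn decode_int96(raw: [u8; 12], unit: Unit) -> Option<i64> {
    let mut nanos_bytes = [0u8; 8];
    nanos_bytes.copy_from_slice(&raw[0..8]);
    let mut day_bytes = [0u8; 4];
    day_bytes.copy_from_slice(&raw[8..12]);

    let nanos_of_day = i64::from_le_bytes(nanos_bytes);
    let julian_day = i64::from(u32::from_le_bytes(day_bytes));
    let days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH;

    match unit {
        // Microseconds: sub-microsecond digits are truncated.
        Unit::Micros => days_since_epoch
            .checked_mul(MICROS_PER_DAY)?
            .checked_add(nanos_of_day / 1_000),
        // Nanoseconds: overflows i64 for dates far from the epoch.
        Unit::Nanos => days_since_epoch
            .checked_mul(NANOS_PER_DAY)?
            .checked_add(nanos_of_day),
    }
}
```

In this sketch, a value 376,200 days after the epoch (Julian day 2,816,788) decodes cleanly at microsecond resolution but returns `None` at nanosecond resolution, which is the kind of out-of-range date the tests above exercise. Using checked arithmetic here is a choice made for the illustration so the overflow is visible rather than silently wrapping.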

add test for int96 migrated table to CometIcebergNativeSuite.

a018d70

mbutrovich changed the title ~~test: [DO NOT MERGE] test for apache/iceberg-rust/2301~~ test: [DO NOT MERGE] test fix for #3856 Mar 31, 2026

mbutrovich changed the title ~~test: [DO NOT MERGE] test fix for #3856~~ test: [DO NOT MERGE] test upstream iceberg-rust fix for #3856 Mar 31, 2026

mbutrovich mentioned this pull request Mar 31, 2026

fix: incorrect Parquet INT96 timestamp values from ArrowReader apache/iceberg-rust#2301

Merged

mbutrovich added 6 commits March 31, 2026 13:57

fix spotless

f4cf07f

bump commit after PR feedback

918808c

Fix test.

7aaf881

Merge branch 'main' into int96_test

7d5b8aa

Bump to new commit based on PR feedback.

460293c

more robust test

3d1d08b

mbutrovich mentioned this pull request Mar 31, 2026

bug: native Iceberg reader errors on residual filter on column after nested type for migrated Parquet files #3860

Open

make format.

081404d

mbutrovich mentioned this pull request Mar 31, 2026

fix: build_fallback_field_id_map produces incorrect column indices for schemas with nested types apache/iceberg-rust#2307

Merged

mbutrovich added 4 commits April 1, 2026 11:18

Bump to new commit based on PR feedback.

b93c607

bump to latest commit to reflect PR feedback upstream.

2e4bb24

Merge branch 'main' into int96_test

94d2c6c

Merge branch 'main' into int96_test

ed48e5e

# Conflicts: # native/Cargo.lock

mbutrovich closed this Apr 3, 2026

mbutrovich wants to merge 12 commits into apache:main from mbutrovich:int96_test
