It panics in a test where we do a complex change-data-capture query. I'll try to find a minimal repro case, but here's the full optimized plan for now:
CopyTo: format=parquet output_url=/tmp/.tmpaPY3Jq/data.parquet options: (single_file_output true)
  Sort: offset ASC NULLS FIRST
    Projection: CAST(CAST(row_number() PARTITION BY [Int32(1)] ORDER BY [city ASC NULLS FIRST, op ASC NULLS FIRST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS Decimal128(20, 0)) + Decimal128(Some(2),20,0) AS Int64) AS offset, op, system_time, event_time, city, population, census_url
      WindowAggr: windowExpr=[[row_number() PARTITION BY [Int32(1)] ORDER BY [city ASC NULLS FIRST, op ASC NULLS FIRST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]]
        Projection: op, assert_not_null(city) AS city, assert_not_null(population) AS population, census_url, CASE WHEN event_time IS NOT NULL THEN event_time ELSE TimestampMillisecond(946728000000, Some("UTC")) END AS event_time, TimestampMillisecond(1262347200000, Some("UTC")) AS system_time
          Union
            Projection: CASE WHEN old.city IS NULL THEN Int32(0) WHEN __common_expr_1 THEN Int32(1) ELSE Int32(3) END AS op, CASE WHEN __common_expr_1 THEN old.city ELSE new.city END AS city, CASE WHEN __common_expr_1 THEN old.population ELSE new.population END AS population, CASE WHEN __common_expr_1 THEN old.census_url ELSE new.census_url END AS census_url, CASE WHEN __common_expr_1 THEN old.event_time ELSE new.event_time END AS event_time
              Projection: new.city IS NULL AS __common_expr_1, old.event_time, old.city, old.population, old.census_url, new.city, new.population, new.census_url, new.event_time
                Filter: old.population IS DISTINCT FROM new.population OR old.census_url IS DISTINCT FROM new.census_url OR new.event_time IS NOT NULL AND old.event_time IS DISTINCT FROM new.event_time
                  Full Join: old.city = new.city
                    SubqueryAlias: old
                      Projection: ?table?.event_time, ?table?.city, ?table?.population, ?table?.census_url
                        Filter: __rank = UInt64(1) AND ?table?.op != Int32(1)
                          Projection: ?table?.op, ?table?.event_time, ?table?.city, ?table?.population, ?table?.census_url, __rank
                            WindowAggr: windowExpr=[[row_number() PARTITION BY [?table?.city] ORDER BY [?table?.offset DESC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS __rank]]
                              TableScan: ?table? projection=[offset, op, event_time, city, population, census_url]
                    SubqueryAlias: new
                      Projection: ?table?.city, ?table?.population, ?table?.census_url, TimestampMillisecond(NULL, Some("UTC")) AS event_time
                        TableScan: ?table? projection=[city, population, census_url]
            Projection: Int32(2) AS op, old.city AS city, old.population AS population, old.census_url AS census_url, old.event_time AS event_time
              Inner Join: old.city = new.city Filter: old.population IS DISTINCT FROM new.population OR old.census_url IS DISTINCT FROM new.census_url OR new.event_time IS NOT NULL AND old.event_time IS DISTINCT FROM new.event_time
                SubqueryAlias: old
                  Projection: ?table?.event_time, ?table?.city, ?table?.population, ?table?.census_url
                    Filter: __rank = UInt64(1) AND ?table?.op != Int32(1)
                      Projection: ?table?.op, ?table?.event_time, ?table?.city, ?table?.population, ?table?.census_url, __rank
                        WindowAggr: windowExpr=[[row_number() PARTITION BY [?table?.city] ORDER BY [?table?.offset DESC NULLS LAST] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS __rank]]
                          TableScan: ?table? projection=[offset, op, event_time, city, population, census_url], partial_filters=[Boolean(true), Boolean(true)]
                SubqueryAlias: new
                  Projection: ?table?.city, ?table?.population, ?table?.census_url, TimestampMillisecond(NULL, Some("UTC")) AS event_time
                    Filter: Boolean(true)
                      TableScan: ?table? projection=[city, population, census_url], partial_filters=[Boolean(true)]
thread panicked at /.../datafusion-physical-expr-54.0.0/src/projection.rs:664:60:
index out of bounds: the len is 6 but the index is 6
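For triage, here is a minimal sketch of the failure class in plain Rust. It is not DataFusion code, and `column_at` is a hypothetical helper introduced only for illustration. Indexing a six-element slice at position 6 is one past the end, which yields the same message as the panic. The column names are borrowed from the six-column TableScan projection in the plan; whether that is the slice actually involved in projection.rs is an assumption, not something confirmed.

```rust
// Hypothetical illustration only: this is NOT the DataFusion implementation.
// It shows how a checked lookup reports the same condition as the panic
// instead of aborting the thread.
fn column_at<'a>(columns: &'a [&'a str], index: usize) -> Result<&'a str, String> {
    columns.get(index).copied().ok_or_else(|| {
        format!(
            "index out of bounds: the len is {} but the index is {}",
            columns.len(),
            index
        )
    })
}

fn main() {
    // Assumption: the six columns of the TableScan projection from the plan.
    let columns = ["offset", "op", "event_time", "city", "population", "census_url"];

    // The last valid index is len - 1 = 5.
    assert_eq!(column_at(&columns, 5), Ok("census_url"));

    // Index 6 is one past the end; `columns[6]` would panic here,
    // while the checked lookup returns an error value.
    let err = column_at(&columns, 6).unwrap_err();
    println!("{err}");
}
```

An index equal to the length usually points at an off-by-one or at a column index computed against a wider schema than the one being indexed, so comparing the schema widths on either side of the failing projection may be a useful first check.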
Describe the bug
Encountered a regression between 53.1.0 and 54.0.0 where this line started to panic with index out of bounds error.
To Reproduce
It panics in a test where we do a complex change-data-capture query. I'll try to find a minimal repro case, but here's the full optimized plan for now:
Results in:
Expected behavior
No panics
Additional context
No response