datafusion-partitioned/run.sh invokes a fresh datafusion-cli -f create.sql ... for each of the 3 tries, not just try 1. Since create.sql does CREATE EXTERNAL TABLE ... LOCATION 'partitioned' and the default is collect_statistics = true, every try re-scans all Parquet footers in a cold process.
Other engines keep one process across tries. For example, duckdb-parquet-partitioned runs all 3 tries in one duckdb hits.db session with parquet_metadata_cache=true. Per the ClickBench rules, tries 2–3 are meant to be hot; our script makes them effectively cold.
Fix: drop OS cache once before try 1, then run all 3 tries in a single datafusion-cli session that has already executed create.sql.
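A rough sketch of what the fixed loop body could look like, not a drop-in patch. It assumes datafusion-cli -f accepts several files and runs them in order in one process, and that each statement prints a line of the form "Elapsed N seconds." The function names, the DF_CLI and TRIES variables, and the temp-file approach are illustrative and not taken from the existing run.sh.

```shell
#!/bin/bash
# Sketch: one cache drop per query, then all tries in the same process.
# DF_CLI and TRIES are illustrative knobs, not part of the existing script.
TRIES="${TRIES:-3}"
DF_CLI="${DF_CLI:-datafusion-cli}"

# Write the query TRIES times into one file so a single session runs them all.
repeat_query() {
    local query="$1" out="$2" i
    : > "$out"
    for i in $(seq 1 "$TRIES"); do
        printf '%s\n' "$query" >> "$out"
    done
}

# Read CLI output on stdin and keep the last TRIES elapsed values,
# which drops any timings printed for the statements in create.sql.
# Assumes the CLI prints "Elapsed N seconds." after each statement.
extract_timings() {
    grep -oE 'Elapsed [0-9.]+' | awk '{print $2}' | tail -n "$TRIES" | paste -sd, -
}

# Run one benchmark query: drop the OS cache once, then execute
# create.sql followed by all tries in a single CLI process.
run_query() {
    local query="$1" tmp
    tmp="$(mktemp)"
    repeat_query "$query" "$tmp"
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null   # once, before try 1
    "$DF_CLI" -f create.sql "$tmp" 2>&1 | extract_timings
    rm -f "$tmp"
}
```

Called as run_query "$query" inside the existing loop over queries.sql, this would print the three timings comma-separated. Try 1 still pays for the footer scan and a cold page cache, while tries 2 and 3 reuse the table registered by create.sql.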