ClickBench run.sh restarts process per try, losing hot-run caches #21696

@Dandandan

datafusion-partitioned/run.sh invokes a fresh datafusion-cli -f create.sql ... for each of the 3 tries, not just try 1. Since create.sql does CREATE EXTERNAL TABLE ... LOCATION 'partitioned' and the default is collect_statistics = true, every try re-scans all Parquet footers in a cold process.

Other engines keep one process across tries. For example, duckdb-parquet-partitioned runs all 3 tries in one duckdb hits.db session with parquet_metadata_cache=true. Per the ClickBench rules, tries 2–3 are meant to be hot; our script restarts the process each time, so they are effectively cold.

Fix: drop OS cache once before try 1, then run all 3 tries in a single datafusion-cli session that has already executed create.sql.
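
A minimal sketch of what that could look like, assuming bash on Linux. The function names and the TRIES variable are hypothetical and not taken from the existing run.sh. The idea is to build one SQL script per query (create.sql once, then the query repeated), drop the page cache once, and pass the script to a single datafusion-cli -f invocation:

```shell
#!/usr/bin/env bash
# Sketch only: function and variable names are hypothetical, not from run.sh.
TRIES=3

# Emit one SQL script: table setup once, then the same query TRIES times.
# Assumes the query text already ends with a semicolon.
build_session_script() {
    local create_sql=$1 query=$2
    cat "$create_sql"
    for _ in $(seq 1 "$TRIES"); do
        printf '%s\n' "$query"
    done
}

# Drop the OS page cache (Linux only, needs root) so that only try 1 is cold.
drop_os_cache() {
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches >/dev/null
}

# Run all tries for one query inside a single datafusion-cli process.
run_query_single_session() {
    local create_sql=$1 query=$2 script
    script=$(mktemp)
    build_session_script "$create_sql" "$query" >"$script"
    drop_os_cache
    datafusion-cli -f "$script"
    rm -f "$script"
}
```

A caller would invoke run_query_single_session create.sql "$query" once per benchmark query. Since create.sql now runs in the same session, any timing extraction would have to skip the output belonging to the setup statements and keep only the last TRIES timings.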
