Merged
16 changes: 16 additions & 0 deletions benchmarks/queries/clickbench/README.md
@@ -228,6 +228,22 @@ Results look like
Elapsed 30.195 seconds.
```


### Q9-Q12: FIRST_VALUE Aggregation Performance

These queries test the performance of the `FIRST_VALUE` aggregation function with different data types and grouping cardinalities.

| Query | `FIRST_VALUE` Column | Column Type | Group By Column | Group By Type | Number of Groups |
|-------|----------------------|-------------|-----------------|---------------|------------------|
| Q9 | `URL` | `Utf8` | `UserID` | `Int64` | 17,630,976 |
| Q10 | `URL` | `Utf8` | `OS` | `Int16` | 91 |
| Q11 | `WatchID` | `Int64` | `UserID` | `Int64` | 17,630,976 |
| Q12 | `WatchID` | `Int64` | `OS` | `Int16` | 91 |

Review comment (Contributor), on the Q10 row: This table formatting looks a bit off
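
The "Number of Groups" column in the table above can be sanity-checked against the data. The following is a minimal sketch, not part of the benchmark suite; it assumes the ClickBench data is registered as the table `hits`, as in the Q9-Q12 query files, and the output aliases `user_id_groups` and `os_groups` are illustrative names only.

```sql
-- Sketch: count the distinct grouping keys used by Q9-Q12.
-- Assumes the ClickBench data is registered as the table `hits`.
SELECT
  COUNT(DISTINCT "UserID") AS user_id_groups, -- table lists 17,630,976 (Q9, Q11)
  COUNT(DISTINCT "OS") AS os_groups           -- table lists 91 (Q10, Q12)
FROM hits;
```

The two counts correspond to the high-cardinality (`UserID`) and low-cardinality (`OS`) grouping cases that the four queries are meant to contrast.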





## Data Notes

Here are some interesting statistics about the data used in the queries
8 changes: 8 additions & 0 deletions benchmarks/queries/clickbench/extended/q10.sql
@@ -0,0 +1,8 @@
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT MAX(len) FROM (
SELECT LENGTH(FIRST_VALUE("URL" ORDER BY "EventTime")) as len
FROM hits
GROUP BY "OS"
);
8 changes: 8 additions & 0 deletions benchmarks/queries/clickbench/extended/q11.sql
@@ -0,0 +1,8 @@
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT MAX(fv) FROM (
SELECT FIRST_VALUE("WatchID" ORDER BY "EventTime") as fv
FROM hits
GROUP BY "UserID"
);
8 changes: 8 additions & 0 deletions benchmarks/queries/clickbench/extended/q12.sql
@@ -0,0 +1,8 @@
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT MAX(fv) FROM (
SELECT FIRST_VALUE("WatchID" ORDER BY "EventTime") as fv
FROM hits
GROUP BY "OS"
);
4 changes: 4 additions & 0 deletions benchmarks/queries/clickbench/extended/q8.sql
@@ -0,0 +1,4 @@
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT "RegionID", "UserAgent", "OS", AVG(to_timestamp("ResponseEndTiming")-to_timestamp("ResponseStartTiming")) as avg_response_time, AVG(to_timestamp("ResponseEndTiming")-to_timestamp("ConnectTiming")) as avg_latency FROM hits GROUP BY "RegionID", "UserAgent", "OS" ORDER BY avg_latency DESC limit 10;
Review comment (Contributor Author): This query was missed in this PR.

8 changes: 8 additions & 0 deletions benchmarks/queries/clickbench/extended/q9.sql
@@ -0,0 +1,8 @@
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT MAX(len) FROM (
SELECT LENGTH(FIRST_VALUE("URL" ORDER BY "EventTime")) as len
FROM hits
GROUP BY "UserID"
);