feat: Optimise convert_to_state for SUM and BIT_OR_XOR #21506
Mark1626 wants to merge 1 commit into apache:main
Conversation
run benchmarks
🤖 Benchmark running (GKE): comparing perf-convert-to-state (23fc979) to e1ad871 (merge-base) using tpcds
🤖 Benchmark running (GKE): comparing perf-convert-to-state (23fc979) to e1ad871 (merge-base) using tpch
🤖 Benchmark running (GKE): comparing perf-convert-to-state (23fc979) to e1ad871 (merge-base) using clickbench_partitioned
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
Not sure why ClickBench Q19 is degrading; it doesn't have an aggregation.
run benchmarks
Some of the queries are slower in some runs (usually in just one of the runs). If you look at the minimum time, they are similar.
run benchmark tpch tpch10
🤖 Benchmark running (GKE): comparing perf-convert-to-state (23fc979) to e1ad871 (merge-base) using tpcds
🤖 Benchmark running (GKE): comparing perf-convert-to-state (23fc979) to e1ad871 (merge-base) using clickbench_partitioned
🤖 Benchmark running (GKE): comparing perf-convert-to-state (23fc979) to e1ad871 (merge-base) using tpch
🤖 Benchmark running (GKE): comparing perf-convert-to-state (23fc979) to e1ad871 (merge-base) using tpch
🤖 Benchmark running (GKE): comparing perf-convert-to-state (23fc979) to e1ad871 (merge-base) using tpch10
🤖 Benchmark completed (GKE): tpch10, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
Which issue does this PR close?
Rationale for this change
I have a query where GroupedHashAggregateStream was switching to SkippingAggregation. I noticed convert_to_state coming up as a bottleneck, particularly the memory allocation and the Arrow kernel.
What changes are included in this PR?
Add a fast path for sum and bit OR/XOR, where convert_to_state returns after applying the filter
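The idea behind such a fast path can be sketched in plain Rust. This is only an illustration, not the code in this PR: `Column` and `convert_to_state_fast_path` are hypothetical names, a `Vec<Option<i64>>` stands in for a nullable Arrow array, and it assumes that a null or false filter entry means the row is excluded. For SUM and bit OR/XOR the partial state of a single row is the value itself (0 + x, 0 | x and 0 ^ x all equal x), so the only remaining work is nulling out the rows the filter rejects.

```rust
/// Hypothetical stand-in for a nullable Arrow column: `None` marks a null slot.
type Column = Vec<Option<i64>>;

/// Sketch of the fast path: every input row becomes its own partial state,
/// so nothing is computed beyond applying the optional filter.
/// Assumption: a filter entry that is `None` or `Some(false)` excludes the row.
fn convert_to_state_fast_path(
    values: &Column,
    opt_filter: Option<&[Option<bool>]>,
) -> Column {
    match opt_filter {
        // No filter: the input column already is the state column.
        None => values.clone(),
        // With a filter: keep rows where the filter is true, null out the rest.
        Some(filter) => values
            .iter()
            .zip(filter.iter())
            .map(|(value, keep)| if *keep == Some(true) { *value } else { None })
            .collect(),
    }
}
```

Calling it with values `[1, null, 3, 4]` and filter `[true, true, false, null]` keeps only the first value and leaves the other three slots null; with no filter the input comes back unchanged.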
Are these changes tested?
Yes
Are there any user-facing changes?
No