Add pg_clickhouse #742
Conversation
It duplicates the ClickHouse benchmark config to load data (except that column names are lowercase) and the Postgres config to run the queries. All queries push down to ClickHouse without error.
Output in my test:
Review comment on `SELECT COUNT(*) FROM hits;`:
Here are the measurements with c6a.4xlarge instances: https://pastila.clickhouse.com/?00012d7f/6abbadde357ce424eac9b0db7b1aa307#sOCA4BXIIk/CnS1ic4Kt8w== (runtimes are at the end)
I couldn't spot any error messages in the log output. What makes me suspicious is that the corresponding "pure" ClickHouse measurements on c6a.4xlarge systems return worse runtimes: https://github.com/ClickHouse/ClickBench/blob/main/clickhouse/results/c6a.4xlarge.json (--> see e.g. the cold runs = the first values in each row).
Do you have an explanation for how that may be?
Maybe we need to add more checks that the data was successfully imported and the queries return non-empty results?
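One way to add such a check, sketched as a small shell function (the expected count 99,997,497 is the row count of the full ClickBench hits dataset; the `psql` invocation shown in the comment is an assumption about the setup):

```shell
#!/bin/sh
# Full ClickBench hits dataset row count.
EXPECTED=99997497

# Succeeds only if the reported row count matches the expected total.
check_import() {
    [ "$1" = "$EXPECTED" ]
}

# In the benchmark the count would come from the database, e.g.:
#   ACTUAL=$(psql -t -A -c "SELECT COUNT(*) FROM hits")
ACTUAL=99997497   # placeholder for illustration
if check_import "$ACTUAL"; then
    echo "import ok: $ACTUAL rows"
else
    echo "import check failed: expected $EXPECTED, got $ACTUAL" >&2
    exit 1
fi
```

A similar non-empty-result check on each query's output would catch the silent-failure case described above.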
I'm pretty sure it exits with an error on load failure; I saw that a lot while I was working out how to make it work. Same for query failures; they mess with the resulting output.
I ran the test in Docker without grepping out the query output and got this. Note that this is just one parquet file; I don't have capacity for all 100.
For comparison, here's what I get in Docker for one parquet file running the base ClickHouse benchmark. Looks fairly consistent at a glance.
I'm working through the queries, comparing outputs. Found that pg_clickhouse isn't pushing down MIN() and MAX(), and that the EventDate column is turning up empty. Going to work on that, then once fixed, go through the rest of the queries and fix any other issues I find.
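One way to spot queries that aren't fully pushed down is to scan each plan's text for a local Aggregate node. A sketch (this assumes pg_clickhouse behaves like a foreign data wrapper whose `EXPLAIN VERBOSE` output shows the remote statement, and that a non-pushed-down MIN()/MAX() surfaces as a local `Aggregate` above the foreign scan; the plan snippets are hypothetical):

```shell
#!/bin/sh
# Returns success if a plan text contains a local Aggregate node,
# i.e. the aggregate was NOT pushed down to ClickHouse.
has_local_agg() {
    printf '%s\n' "$1" | grep -q 'Aggregate'
}

# Hypothetical plan snippets for illustration:
pushed_down='Foreign Scan
  Remote SQL: SELECT min(eventdate), max(eventdate) FROM hits'
not_pushed='Aggregate
  ->  Foreign Scan on hits'

has_local_agg "$not_pushed"  && echo "local aggregation detected"
has_local_agg "$pushed_down" || echo "aggregate pushed down"

# Against a live server one might loop over the benchmark queries:
#   plan=$(psql -t -c "EXPLAIN VERBOSE $query")
```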
Replaces #739 by eliminating a wayward `| head -n1` and making the column names in ClickHouse lowercase.
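For illustration, a stray `head -n1` in a load pipeline silently truncates the input to a single row (toy data below, not the PR's actual command):

```shell
#!/bin/sh
# Three rows of toy input.
printf 'row1\nrow2\nrow3\n' > /tmp/data.tsv

# With the wayward filter, only the first row survives:
with_head=$(head -n1 /tmp/data.tsv | wc -l | tr -d ' ')
# Without it, all rows are loaded:
without_head=$(wc -l < /tmp/data.tsv | tr -d ' ')

echo "rows with head -n1: $with_head, without: $without_head"
```

That kind of truncation would also explain suspiciously fast runtimes without any load-time error.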