fix: fix replace_table_data broken on multiple adapters (Trino, Redshift, etc.) #969
Conversation
…lickHouse, Vertica, Fabric/SQL Server)
- Trino: use edr_create_table_as with drop_first=true instead of adapter.dispatch('create_table_as') with unsupported replace kwarg
- Postgres: split multi-statement transaction into separate run_query calls for post_hook compatibility
- Redshift: add dedicated dispatch with separate statements (no transaction wrapping) for post_hook compatibility
- ClickHouse: add dedicated truncate-and-insert dispatch (was falling through to default CTAS which fails without CREATE OR REPLACE support)
- Vertica: add dedicated truncate-and-insert dispatch
- Fabric/SQL Server: add fabric__ dispatch (covers sqlserver via dispatch chain) using truncate-and-insert
Fixes ELE-5298
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
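The per-adapter strategies above can be summarized in a rough Python sketch (not the package's actual Jinja macros; statement text and function name are illustrative), modeling each adapter's replace plan as a list of SQL statements:

```python
# Rough sketch of the per-adapter statement plans described in the commit
# above, modeled as plain SQL string lists (names here are illustrative).
def replace_statements(adapter: str, table: str, select_sql: str) -> list:
    insert = f"INSERT INTO {table} {select_sql}"
    if adapter == "postgres":
        # Separate statements, but still transaction-wrapped.
        return ["BEGIN", f"DELETE FROM {table}", insert, "COMMIT"]
    if adapter == "redshift":
        # Separate statements with no transaction wrapping.
        return [f"DELETE FROM {table}", insert]
    if adapter in ("clickhouse", "vertica", "fabric", "sqlserver"):
        # Truncate-and-insert instead of CTAS.
        return [f"TRUNCATE TABLE {table}", insert]
    # Default path: CTAS-based replacement.
    return [f"CREATE OR REPLACE TABLE {table} AS {select_sql}"]
```

The key difference is which adapters get transaction wrapping (Postgres), which deliberately drop it (Redshift), and which avoid CTAS entirely (ClickHouse, Vertica, Fabric/SQL Server).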
📝 Walkthrough
Replace the single-transaction replace_table_data with dialect-specific, per-statement replace macros for Postgres, Redshift, Trino, ClickHouse, Vertica, and Fabric; add an integration test and a test macro that inserts a sentinel row to validate full-table replacement.
- ClickHouse: use explicit TRUNCATE TABLE with on_cluster_clause for distributed/replicated table support - Redshift: add data-loss warning note to comment Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Devin - the ticket explicitly requires adding an integration test.
Validates that replace_table_data works correctly by setting cache_artifacts=False to force the replace path, running a dbt model, and asserting the artifact table contains the expected data. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
# The table must still contain exactly one row for the test model.
assert len(replaced) == 1, (
wouldn't that condition also hold for the "diff" method?
should we verify an actual replace happened?
Good point — updated in 28a3fd1. The test now changes the model's owner between two replace runs and asserts the new owner is present, proving an actual replace happened (same pattern as test_artifacts_updating). Validated locally against Postgres.
Change owner between runs and assert the new owner is present, proving the table was actually replaced (not just appended/unchanged). Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/tests/test_dbt_artifacts/test_artifacts.py`:
- Around lines 60-88: add the Dremio skip decorator to the
test_replace_table_data function to match the other dbt artifacts tests:
annotate it with `@pytest.mark.skip_targets(["dremio"])` (ensure pytest is
imported where the tests live) so the test is skipped for Dremio, as in
test_artifacts_updating.
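The suggested decorator usage would look roughly like this (a sketch: `skip_targets` is assumed to be a custom marker defined in the project's conftest.py, and the test body is omitted):

```python
import pytest

# Assumption: skip_targets is a project-defined pytest marker (conftest.py)
# used by other dbt artifacts tests to skip unsupported warehouses.
@pytest.mark.skip_targets(["dremio"])
def test_replace_table_data():
    ...  # placeholder for the sentinel-row test body
```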
📒 Files selected for processing (1)
integration_tests/tests/test_dbt_artifacts/test_artifacts.py
Inserts a dummy row into dbt_models before the replace run, then asserts it was removed — proving full table replacement (not diff). Adds insert_sentinel_row macro for cross-adapter INSERT via run_operation. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
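The sentinel-row idea can be sketched with an in-memory stand-in (plain Python, assumed names; not the actual dbt test code): a true replace removes unrelated rows, whereas a diff-based update would leave them in place.

```python
# In-memory stand-in for the sentinel-row check described above.
def replace_table_data(table: list, new_rows: list) -> None:
    table.clear()          # full-table replacement semantics
    table.extend(new_rows)

table = [{"unique_id": "model.one", "alias": "one", "name": "one"}]
# Insert an unrelated sentinel row before the replace run.
table.append({"unique_id": "test.sentinel", "alias": "s", "name": "sentinel"})
replace_table_data(table, [{"unique_id": "model.one", "alias": "one", "name": "one"}])
# The sentinel must be gone after a genuine replace.
assert all(row["unique_id"] != "test.sentinel" for row in table)
```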
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/dbt_project/macros/insert_sentinel_row.sql`:
- Around line 7-10: The SQL literal currently injects sentinel_alias directly
into the run_query call in insert_sentinel_row.sql, which breaks if the alias
contains single quotes; fix by escaping sentinel_alias before embedding (e.g.,
create a local variable like escaped_alias = sentinel_alias | replace("'", "''")
or use a dbt escape helper) and use escaped_alias in the VALUES string passed to
run_query (reference the run_query call and the sentinel_alias variable in this
macro).
📒 Files selected for processing (2)
integration_tests/dbt_project/macros/insert_sentinel_row.sql
integration_tests/tests/test_dbt_artifacts/test_artifacts.py
{% do run_query(
    "INSERT INTO " ~ relation ~ " (unique_id, alias, name)"
    " VALUES ('test.sentinel', '" ~ sentinel_alias ~ "', 'sentinel')"
) %}
Escape sentinel_alias before embedding it in the SQL literal.
run_operation passes raw strings into the macro, so any alias containing ' will produce invalid SQL here and make this helper flaky for future test cases.
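The standard fix is quote doubling, shown here as a small Python sketch (the helper name and table columns are illustrative, not part of the reviewed macro):

```python
def quote_sql_literal(value: str) -> str:
    """Escape a string for a single-quoted SQL literal by doubling
    embedded single quotes (standard SQL escaping)."""
    return "'" + value.replace("'", "''") + "'"

# An alias containing a quote no longer breaks the generated INSERT:
alias = "o'brien_model"
sql = ("INSERT INTO dbt_models (unique_id, alias, name) VALUES "
       "('test.sentinel', " + quote_sql_literal(alias) + ", 'sentinel')")
```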
Proposed fix

 {% macro insert_sentinel_row(table_name, sentinel_alias) %}
 {#- Insert a sentinel row into the given Elementary table.
     Used by integration tests to verify that replace_table_data
     actually truncates the whole table (sentinel disappears) rather
     than doing a diff-based update (sentinel would survive). -#}
 {% set relation = ref(table_name) %}
+{% set escaped_sentinel_alias = sentinel_alias | replace("'", "''") %}
 {% do run_query(
     "INSERT INTO " ~ relation ~ " (unique_id, alias, name)"
-    " VALUES ('test.sentinel', '" ~ sentinel_alias ~ "', 'sentinel')"
+    " VALUES ('test.sentinel', '" ~ escaped_sentinel_alias ~ "', 'sentinel')"
 ) %}
 {% endmacro %}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
{% macro insert_sentinel_row(table_name, sentinel_alias) %}
{#- Insert a sentinel row into the given Elementary table.
    Used by integration tests to verify that replace_table_data
    actually truncates the whole table (sentinel disappears) rather
    than doing a diff-based update (sentinel would survive). -#}
{% set relation = ref(table_name) %}
{% set escaped_sentinel_alias = sentinel_alias | replace("'", "''") %}
{% do run_query(
    "INSERT INTO " ~ relation ~ " (unique_id, alias, name)"
    " VALUES ('test.sentinel', '" ~ escaped_sentinel_alias ~ "', 'sentinel')"
) %}
{% endmacro %}
Vertica does not auto-commit DML; the INSERT was silently rolled back when the run_operation connection closed. Adding an explicit commit fixes the test on Vertica (and is harmless on other adapters). Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Vertica's dbt adapter raises 'no transaction in progress' from adapter.commit() within run_operation context. A raw SQL COMMIT bypasses dbt's internal transaction tracking and is a harmless no-op on databases that auto-commit. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Most adapters auto-commit run_query DML within run_operation. Vertica is the exception — it needs an explicit COMMIT. Spark (and similar engines) reject bare COMMIT statements. Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
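The conditional-commit logic from these commits can be sketched as follows (a simplified model; the blacklist mirrors the engines named above, and the function name is illustrative):

```python
# Issue a bare COMMIT only on adapters not known to reject it.
NO_BARE_COMMIT = {"spark", "databricks", "bigquery", "athena"}

def maybe_commit(adapter_type: str, executed: list) -> bool:
    """Append 'COMMIT' to the executed-statement log unless the adapter
    rejects bare COMMIT statements; return whether a commit was issued."""
    if adapter_type in NO_BARE_COMMIT:
        return False
    executed.append("COMMIT")
    return True
```

On Vertica this forces the sentinel INSERT to persist; on auto-committing databases the extra COMMIT is a harmless no-op, and blacklisted engines skip it entirely.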
…s all columns) Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
fix: fix replace_table_data broken on multiple adapters
Summary
`replace_table_data` is broken on multiple adapters when called from a post_hook context (triggered when `cache_artifacts=False`). This is currently masked because `cache_artifacts` defaults to `true`, so the diff-based upload path is used and `replace_table_data` is never called in normal operation. Discovered via PR #968.

Fixes per adapter:
- Trino: `adapter.dispatch("create_table_as")` was called with the unsupported `replace=true` kwarg; now uses `elementary.edr_create_table_as(..., drop_first=true)`
- Postgres: the single multi-statement transaction (`begin; delete; insert; commit;`) fails in post_hook prepared statements; now split into separate `run_query` calls, preserving transaction wrapping
- Redshift: fell through to the `postgres__` dispatch with the same multi-statement issue; now has a dedicated dispatch with separate statements
- ClickHouse: fell through to `default__`, which uses CTAS without `CREATE OR REPLACE`; now truncate-and-insert with `on_cluster_clause(relation)`
- Vertica: fell through to `default__` (same CTAS issue); now truncate-and-insert
- Fabric/SQL Server: fell through to `default__` (same CTAS issue); now has a `fabric__` dispatch (covers `sqlserver` via the dispatch chain)

Integration test:
`test_replace_table_data` uses a sentinel row pattern to prove full table replacement. It inserts an unrelated dummy row into `dbt_models`, then runs with `cache_artifacts=False` to trigger `replace_table_data`, and asserts the sentinel row was removed, proving a full replace happened rather than a diff-based update (which would leave unrelated rows intact).

Fixes ELE-5298.
Updates since last revision
- ClickHouse: replaced `dbt.truncate_relation()` with an explicit `TRUNCATE TABLE ... ON CLUSTER` via `run_query` + `on_cluster_clause(relation)`, matching the pattern in `delete_and_insert.sql` and `clean_elementary_test_tables.sql`.
- The test macro (`insert_sentinel_row.sql`) dynamically reads all table columns via `adapter.get_columns_in_relation()` to build a full-column INSERT (required by Spark/Delta Lake, which rejects partial column lists). An explicit `COMMIT` is issued on most SQL adapters, with a blacklist for engines that don't support bare COMMIT (`spark`, `databricks`, `bigquery`, `athena`).

Review & Testing Checklist for Human
- Postgres `begin`/`commit` as separate `run_query` calls: verify that splitting the transaction into individual `run_query` calls actually works within a post_hook context; this relies on dbt using the same connection across calls so the transaction persists. If dbt opens new connections per `run_query`, atomicity is silently lost.
- Redshift: the `redshift__` dispatch deliberately omits transaction wrapping (matching the existing `redshift__get_delete_and_insert_queries` pattern). If the `insert` fails after the `delete`, data is lost until the next run. Confirm this tradeoff is acceptable.
- ClickHouse `on_cluster_clause`: now uses an explicit `TRUNCATE TABLE ... ON CLUSTER` via `run_query`, matching the pattern in other ClickHouse-specific macros. Not tested against an actual ClickHouse cluster; verify this works for both single-node and clustered deployments.
- The `insert_sentinel_row` macro issues a raw `run_query("COMMIT")` on all adapters except `spark`, `databricks`, `bigquery`, and `athena`. If a future adapter also rejects bare COMMIT, the sentinel test will fail on that adapter (the fix itself is unaffected; only the test helper needs updating).
- Fabric/SQL Server: relies on the `sqlserver__ → fabric__ → default__` dispatch order (based on a comment in `create_temp_table.sql`). Confirm this is correct for `dbt-fabric`/`dbt-sqlserver`.
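The dispatch-order concern in the checklist can be illustrated with a toy model of dbt-style dispatch resolution (assumed structure, not dbt's actual implementation): each adapter prefix is tried in order before falling back to `default__`, which is why a `fabric__` macro also covers `sqlserver`.

```python
# Toy model of dispatch resolution: try each adapter prefix in order,
# then fall back to 'default__'.
def resolve(macros: dict, adapter_chain: list) -> str:
    for prefix in list(adapter_chain) + ["default"]:
        name = f"{prefix}__replace_table_data"
        if name in macros:
            return name
    raise KeyError("no dispatch candidate found")

macros = {"fabric__replace_table_data": "...", "default__replace_table_data": "..."}
# sqlserver has no dedicated macro, so it resolves to the fabric__ one:
resolved = resolve(macros, ["sqlserver", "fabric"])
```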
The test macro (`insert_sentinel_row.sql`) lives under `integration_tests/dbt_project/macros/` and is only used by the test suite; it does not ship with the package.

Summary by CodeRabbit
Refactor
New Features
Tests