fix: fix replace_table_data broken on multiple adapters (Trino, Redshift, etc.) #969

Merged
haritamar merged 10 commits into master from devin/ELE-5298-1773337700 on Mar 13, 2026

Conversation

@devin-ai-integration
Contributor

@devin-ai-integration devin-ai-integration bot commented Mar 12, 2026

fix: fix replace_table_data broken on multiple adapters

Summary

replace_table_data is broken on multiple adapters when called from a post_hook context (triggered when cache_artifacts=False). This is currently masked because cache_artifacts defaults to true, so the diff-based upload path is used and replace_table_data is never called in normal operation. Discovered via PR #968.

Fixes per adapter:

| Adapter | Root cause | Fix |
| --- | --- | --- |
| Trino | `adapter.dispatch("create_table_as")` called with unsupported `replace=true` kwarg | Use `elementary.edr_create_table_as(..., drop_first=true)` |
| Postgres | Multi-statement query (`begin; delete; insert; commit;`) fails in post_hook prepared statements | Split into separate `run_query` calls, preserving transaction wrapping |
| Redshift | Inherited broken `postgres__` dispatch with the same multi-statement issue | New dedicated dispatch with separate statements, no transaction wrapping |
| ClickHouse | Fell through to `default__`, which uses CTAS without `CREATE OR REPLACE` | New cluster-aware truncate-and-insert dispatch using `on_cluster_clause(relation)` |
| Vertica | Fell through to `default__` (same CTAS issue) | New truncate-and-insert dispatch |
| Fabric / SQL Server | Fell through to `default__` (same CTAS issue) | New `fabric__` dispatch (covers sqlserver via dispatch chain) |

Integration test: test_replace_table_data uses a sentinel row pattern to prove full table replacement. It inserts an unrelated dummy row into dbt_models, then runs with cache_artifacts=False to trigger replace_table_data, and asserts the sentinel row was removed — proving a full replace happened rather than a diff-based update (which would leave unrelated rows intact).
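The distinction the sentinel proves can be sketched in plain Python. This is a toy model, not the actual dbt code: `replace_table` and `diff_update` stand in for the two upload paths.

```python
def replace_table(table, new_rows):
    """Full replace: drop everything, keep only new_rows (replace_table_data path)."""
    table.clear()
    table.extend(new_rows)

def diff_update(table, new_rows):
    """Diff-based update: upsert new_rows, leave unrelated rows intact."""
    by_id = {row["unique_id"]: row for row in table}
    for row in new_rows:
        by_id[row["unique_id"]] = row
    table[:] = list(by_id.values())

# Simulate the integration test: a sentinel row survives a diff update
# but must vanish after a full replace.
dbt_models = [{"unique_id": "model.one", "owner": "a"}]
dbt_models.append({"unique_id": "test.sentinel", "owner": None})  # sentinel row

run_rows = [{"unique_id": "model.one", "owner": "a"}]

diffed = list(dbt_models)
diff_update(diffed, run_rows)
assert any(r["unique_id"] == "test.sentinel" for r in diffed)  # diff keeps it

replace_table(dbt_models, run_rows)
assert all(r["unique_id"] != "test.sentinel" for r in dbt_models)  # replace removes it
```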

Fixes ELE-5298.

Updates since last revision

  • ClickHouse cluster awareness (CodeRabbit finding): Replaced dbt.truncate_relation() with explicit TRUNCATE TABLE ... ON CLUSTER via run_query + on_cluster_clause(relation), matching the pattern in delete_and_insert.sql and clean_elementary_test_tables.sql.
  • Redshift comment: Added data-loss warning note explaining the non-atomic tradeoff.
  • Integration test — sentinel row pattern: Replaced the initial basic smoke test with a sentinel-based assertion that truly distinguishes replace from diff mode. A helper macro (insert_sentinel_row.sql) dynamically reads all table columns via adapter.get_columns_in_relation() to build a full-column INSERT (required by Spark/Delta Lake which rejects partial column lists). An explicit COMMIT is issued on most SQL adapters, with a blacklist for engines that don't support bare COMMIT (spark, databricks, bigquery, athena).
  • CI green across all 23 checks: Postgres, Redshift, Snowflake, BigQuery, Databricks, ClickHouse, Trino, Vertica, Fabric, SQL Server, DuckDB, Dremio, Spark, Athena (including fusion variants).
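The full-column INSERT the helper builds can be sketched in Python. The function and column names here are illustrative; the real macro reads columns via `adapter.get_columns_in_relation()` and emits Jinja/SQL.

```python
def build_sentinel_insert(relation, columns, sentinel_values):
    """Build a full-column INSERT: every column is listed, unspecified ones get NULL.

    Spark/Delta Lake rejects partial column lists, hence the full enumeration.
    """
    col_list = ", ".join(columns)
    values = ", ".join(
        "'" + sentinel_values[c].replace("'", "''") + "'" if c in sentinel_values else "NULL"
        for c in columns
    )
    return f"INSERT INTO {relation} ({col_list}) VALUES ({values})"

sql = build_sentinel_insert(
    "analytics.dbt_models",  # hypothetical relation name
    ["unique_id", "alias", "name", "owner"],
    {"unique_id": "test.sentinel", "alias": "sentinel_a", "name": "sentinel"},
)
assert sql == (
    "INSERT INTO analytics.dbt_models (unique_id, alias, name, owner) "
    "VALUES ('test.sentinel', 'sentinel_a', 'sentinel', NULL)"
)
```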

Review & Testing Checklist for Human

  • Postgres begin/commit as separate run_query calls: Verify that splitting the transaction into individual run_query calls actually works within a post_hook context — this relies on dbt using the same connection across calls so the transaction persists. If dbt opens new connections per run_query, atomicity is silently lost.
  • Redshift non-atomicity: The new redshift__ dispatch deliberately omits transaction wrapping (matching the existing redshift__get_delete_and_insert_queries pattern). If the insert fails after the delete, data is lost until the next run. Confirm this tradeoff is acceptable.
  • ClickHouse on_cluster_clause: Now uses explicit TRUNCATE TABLE ... ON CLUSTER via run_query, matching the pattern in other ClickHouse-specific macros. Not tested against an actual ClickHouse cluster — verify this works for both single-node and clustered deployments.
  • Sentinel test COMMIT blacklist: The insert_sentinel_row macro issues a raw run_query("COMMIT") on all adapters except spark, databricks, bigquery, and athena. If a future adapter also rejects bare COMMIT, the sentinel test will fail on that adapter (the fix itself is unaffected — only the test helper needs updating).
  • Fabric/SQL Server dispatch chain: The fix assumes sqlserver__ → fabric__ → default__ dispatch order (based on comment in create_temp_table.sql). Confirm this is correct for dbt-fabric / dbt-sqlserver.
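The first checklist item hinges on transaction state surviving across separate statement calls on one connection. SQLite can illustrate the intended behavior; this is an analogy only, and whether dbt actually reuses one connection across run_query calls in a post_hook must be verified against Postgres.

```python
import sqlite3

# One connection with manual transaction control: separate execute() calls
# participate in the same transaction, mirroring the Postgres fix's assumption
# that dbt reuses the connection across run_query calls.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (id INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")

conn.execute("BEGIN")
conn.execute("DELETE FROM t")
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("ROLLBACK")  # a failure midway undoes both statements together
assert conn.execute("SELECT id FROM t").fetchall() == [(1,)]

conn.execute("BEGIN")
conn.execute("DELETE FROM t")
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("COMMIT")  # delete + insert land atomically
assert conn.execute("SELECT id FROM t").fetchall() == [(2,)]
```

If each statement ran on a fresh connection instead, the BEGIN would be orphaned and the delete/insert pair would no longer be atomic, which is exactly the silent failure mode the checklist warns about.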

Notes

  • The existing adapters that were already working (BigQuery, Snowflake, Databricks, Spark, FabricSpark, Dremio, Athena, DuckDB) are untouched.
  • The sentinel test helper (insert_sentinel_row.sql) lives under integration_tests/dbt_project/macros/ and is only used by the test suite — it does not ship with the package.
  • Link to Devin session
  • Requested by: @haritamar

Summary by CodeRabbit

  • Refactor

    • Switched table-replace to per-statement operations to preserve views and align transaction semantics across engines; removed single-transaction replace for some engines.
  • New Features

    • Added dialect-specific replace flows covering Postgres, Redshift, Trino, ClickHouse, Vertica, and Fabric, including non-atomic and cluster-aware handling and explanatory notes per dialect.
  • Tests

    • Added an integration test and a helper to insert a sentinel row to verify full replace (not append) behavior.


…lickHouse, Vertica, Fabric/SQL Server)

- Trino: use edr_create_table_as with drop_first=true instead of adapter.dispatch('create_table_as') with unsupported replace kwarg
- Postgres: split multi-statement transaction into separate run_query calls for post_hook compatibility
- Redshift: add dedicated dispatch with separate statements (no transaction wrapping) for post_hook compatibility
- ClickHouse: add dedicated truncate-and-insert dispatch (was falling through to default CTAS which fails without CREATE OR REPLACE support)
- Vertica: add dedicated truncate-and-insert dispatch
- Fabric/SQL Server: add fabric__ dispatch (covers sqlserver via dispatch chain) using truncate-and-insert

Fixes ELE-5298

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@linear

linear bot commented Mar 12, 2026

@github-actions
Contributor

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai

coderabbitai bot commented Mar 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Replace single-transaction replace_table_data with dialect-specific, per-statement replace macros for Postgres, Redshift, Trino, ClickHouse, Vertica, and Fabric; add an integration test and a test macro to insert a sentinel row validating full-table replacement.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Dialect-specific replace_table_data: `macros/utils/table_operations/replace_table_data.sql` | Adds per-dialect macros: `postgres__replace_table_data`, `redshift__replace_table_data`, `trino__replace_table_data`, `clickhouse__replace_table_data`, `vertica__replace_table_data`, `fabric__replace_table_data`. Moves Postgres/Redshift from atomic single-transaction replace to per-statement delete/insert (drops intermediate relation); Trino uses `edr_create_table_as` with `drop_first`; ClickHouse/Vertica/Fabric use truncate+insert. Documents transaction semantics. |
| Integration test: `integration_tests/tests/test_dbt_artifacts/test_artifacts.py` | Adds `test_replace_table_data`, which inserts a sentinel row, runs with `cache_artifacts=False` to trigger replace behavior, asserts sentinel removal after replace, and verifies the real model row remains. |
| Test helper macro: `integration_tests/dbt_project/macros/insert_sentinel_row.sql` | Adds `insert_sentinel_row(table_name, sentinel_alias)` to insert a sentinel row (`unique_id='test.sentinel'`) into a referenced relation for the integration test. |
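The per-dialect macros above are selected by dbt's `adapter.dispatch`, which tries the most specific adapter prefix first and falls back to `default__`. A rough Python model of that lookup follows; the chains and the macro registry are illustrative stand-ins, and dbt's real resolution also considers package search order.

```python
# Hypothetical fallback chains modeling adapter.dispatch resolution,
# e.g. on dbt-sqlserver: sqlserver__ -> fabric__ -> default__.
DISPATCH_CHAIN = {
    "sqlserver": ["sqlserver", "fabric", "default"],
    "fabric": ["fabric", "default"],
    "clickhouse": ["clickhouse", "default"],
    "snowflake": ["snowflake", "default"],
}

# Illustrative registry of which replace_table_data variants exist after this PR.
MACROS = {
    "fabric__replace_table_data": "truncate-and-insert",
    "clickhouse__replace_table_data": "truncate-and-insert (ON CLUSTER)",
    "default__replace_table_data": "create-or-replace CTAS",
}

def dispatch(adapter_name, macro_name):
    """Return the first implementation found along the adapter's fallback chain."""
    for prefix in DISPATCH_CHAIN.get(adapter_name, ["default"]):
        impl = MACROS.get(f"{prefix}__{macro_name}")
        if impl is not None:
            return impl
    raise LookupError(macro_name)

# sqlserver has no sqlserver__ variant, so it falls through to fabric__.
assert dispatch("sqlserver", "replace_table_data") == "truncate-and-insert"
# snowflake keeps using the default CTAS path (untouched by this PR).
assert dispatch("snowflake", "replace_table_data") == "create-or-replace CTAS"
```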

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hopped in code to clear the slate,

A sentinel vanished—no longer late,
Postgres, Trino, Redshift joined the fun,
ClickHouse, Vertica, Fabric—now one by one,
Fresh rows settle in, the replace is done.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately identifies the primary fix: restoring replace_table_data functionality across multiple broken adapters (Trino, Redshift, etc.), which aligns with the changes across six adapter-specific implementations and integration testing. |
| Description check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |



coderabbitai[bot]

This comment was marked as resolved.

- ClickHouse: use explicit TRUNCATE TABLE with on_cluster_clause for distributed/replicated table support
- Redshift: add data-loss warning note to comment

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
@haritamar
Collaborator

Devin - the ticket explicitly requires to add an integration test.
Please review the requirement in the ticket and add.

Validates that replace_table_data works correctly by setting
cache_artifacts=False to force the replace path, running a dbt model,
and asserting the artifact table contains the expected data.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
)

# The table must still contain exactly one row for the test model.
assert len(replaced) == 1, (
Collaborator

wouldn't that condition also hold for the "diff" method?
should we verify an actual replace happened?

Contributor Author

Good point — updated in 28a3fd1. The test now changes the model's owner between two replace runs and asserts the new owner is present, proving an actual replace happened (same pattern as test_artifacts_updating). Validated locally against Postgres.

Change owner between runs and assert the new owner is present,
proving the table was actually replaced (not just appended/unchanged).

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Contributor Author

@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

Open in Devin Review

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/tests/test_dbt_artifacts/test_artifacts.py`:
- Around lines 60-88: Add the Dremio skip decorator to the
test_replace_table_data function to match other dbt artifacts tests: annotate
test_replace_table_data with `@pytest.mark.skip_targets(["dremio"])`
(ensure pytest is imported where the tests live) so the test is skipped for Dremio,
similarly to test_artifacts_updating.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c80d474f-d416-4549-8cb0-c8b4dd219062

📥 Commits

Reviewing files that changed from the base of the PR and between 6529f15 and 28a3fd1.

📒 Files selected for processing (1)
  • integration_tests/tests/test_dbt_artifacts/test_artifacts.py

Inserts a dummy row into dbt_models before the replace run, then
asserts it was removed — proving full table replacement (not diff).
Adds insert_sentinel_row macro for cross-adapter INSERT via run_operation.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/dbt_project/macros/insert_sentinel_row.sql`:
- Around line 7-10: The SQL literal currently injects sentinel_alias directly
into the run_query call in insert_sentinel_row.sql, which breaks if the alias
contains single quotes; fix by escaping sentinel_alias before embedding (e.g.,
create a local variable like escaped_alias = sentinel_alias | replace("'", "''")
or use a dbt escape helper) and use escaped_alias in the VALUES string passed to
run_query (reference the run_query call and the sentinel_alias variable in this
macro).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 398c1007-b494-4c93-ab35-ab0023dd7004

📥 Commits

Reviewing files that changed from the base of the PR and between 28a3fd1 and acb51ff.

📒 Files selected for processing (2)
  • integration_tests/dbt_project/macros/insert_sentinel_row.sql
  • integration_tests/tests/test_dbt_artifacts/test_artifacts.py

Comment on lines +7 to +10
{% do run_query(
"INSERT INTO " ~ relation ~ " (unique_id, alias, name)"
" VALUES ('test.sentinel', '" ~ sentinel_alias ~ "', 'sentinel')"
) %}

⚠️ Potential issue | 🟡 Minor

Escape sentinel_alias before embedding it in the SQL literal.

run_operation passes raw strings into the macro, so any alias containing ' will produce invalid SQL here and make this helper flaky for future test cases.

Proposed fix

 {% macro insert_sentinel_row(table_name, sentinel_alias) %}
     {#- Insert a sentinel row into the given Elementary table.
         Used by integration tests to verify that replace_table_data
         actually truncates the whole table (sentinel disappears) rather
         than doing a diff-based update (sentinel would survive). -#}
     {% set relation = ref(table_name) %}
+    {% set escaped_sentinel_alias = sentinel_alias | replace("'", "''") %}
     {% do run_query(
         "INSERT INTO " ~ relation ~ " (unique_id, alias, name)"
-        " VALUES ('test.sentinel', '" ~ sentinel_alias ~ "', 'sentinel')"
+        " VALUES ('test.sentinel', '" ~ escaped_sentinel_alias ~ "', 'sentinel')"
     ) %}
 {% endmacro %}
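The quote-doubling the proposed fix relies on is standard SQL string-literal escaping; a minimal Python illustration:

```python
def escape_sql_literal(value: str) -> str:
    """Escape a string for embedding in a single-quoted SQL literal."""
    return value.replace("'", "''")

# An alias containing a single quote would otherwise terminate the literal early.
alias = "o'brien_model"
sql = f"INSERT INTO t (alias) VALUES ('{escape_sql_literal(alias)}')"
assert sql == "INSERT INTO t (alias) VALUES ('o''brien_model')"
```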

devin-ai-integration bot and others added 5 commits March 12, 2026 19:32
Vertica does not auto-commit DML; the INSERT was silently rolled back
when the run_operation connection closed.  Adding an explicit commit
fixes the test on Vertica (and is harmless on other adapters).

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Vertica's dbt adapter raises 'no transaction in progress' from
adapter.commit() within run_operation context.  A raw SQL COMMIT
bypasses dbt's internal transaction tracking and is a harmless no-op
on databases that auto-commit.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Most adapters auto-commit run_query DML within run_operation.
Vertica is the exception — it needs an explicit COMMIT.
Spark (and similar engines) reject bare COMMIT statements.

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
…s all columns)

Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
Co-Authored-By: Itamar Hartstein <haritamar@gmail.com>
@haritamar haritamar merged commit f820e11 into master Mar 13, 2026
27 checks passed
@haritamar haritamar deleted the devin/ELE-5298-1773337700 branch March 13, 2026 00:11