feat: opt-in config to filter artifacts to current project only#968
feat: opt-in config to filter artifacts to current project only#968
Conversation
|
👋 @haritamar |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a new macro Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@macros/edr/dbt_artifacts/upload_dbt_tests.sql`:
- Around line 124-128: The regex used in modules.re.findall for extracting
source relation names from test_model_jinja has an accidental space in the
capture group ("(\ w+)") so source('pkg','table') refs are not matched; update
the pattern to use a proper word-character capture "(\w+)" in the second
modules.re.findall call (the one that searches for source\([...]) so
test_model_name_matches correctly includes source relation names and
primary_test_model_id can resolve for source-backed tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dccbce28-349d-41e8-a1c3-bc05335904c2
📒 Files selected for processing (9)
macros/edr/dbt_artifacts/filter_to_current_project.sqlmacros/edr/dbt_artifacts/upload_dbt_columns.sqlmacros/edr/dbt_artifacts/upload_dbt_exposures.sqlmacros/edr/dbt_artifacts/upload_dbt_models.sqlmacros/edr/dbt_artifacts/upload_dbt_seeds.sqlmacros/edr/dbt_artifacts/upload_dbt_snapshots.sqlmacros/edr/dbt_artifacts/upload_dbt_sources.sqlmacros/edr/dbt_artifacts/upload_dbt_tests.sqlmacros/edr/system/system_utils/get_config_var.sql
| {% set test_model_name_matches = modules.re.findall( | ||
| "ref\(['\"](\w+)['\"]\)", test_model_jinja | ||
| ) + modules.re.findall( | ||
| "source\(['\"]\w+['\"], ['\"](\w+)['\"]\)", test_model_jinja | ||
| "source\(['\"]\w+['\"], ['\"](\ w+)['\"]\)", test_model_jinja | ||
| ) %} |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
python - <<'PY'
import re
broken = r"source\(['\"]\w+['\"], ['\"](\ w+)['\"]\)"
fixed = r"source\(['\"]\w+['\"], ['\"](\w+)['\"]\)"
sample = "source('pkg', 'orders')"
print("broken:", re.findall(broken, sample))
print("fixed :", re.findall(fixed, sample))
PYRepository: elementary-data/dbt-data-reliability
Length of output: 109
🏁 Script executed:
cd elementary-data/dbt-data-reliability && find . -type f -name "upload_dbt_tests.sql" | head -1Repository: elementary-data/dbt-data-reliability
Length of output: 166
🏁 Script executed:
sed -n '120,135p' "$(find . -type f -name "upload_dbt_tests.sql" | head -1)" 2>/dev/null | cat -nRepository: elementary-data/dbt-data-reliability
Length of output: 1068
Fix the source() regex bug on line 127.
The pattern (\ w+) includes a literal space and won't match source relation names. Change it to (\w+) so that source('pkg', 'table') refs are captured correctly and primary_test_model_id resolves properly for source-backed tests.
Proposed fix
{% set test_model_name_matches = modules.re.findall(
"ref\(['\"](\w+)['\"]\)", test_model_jinja
) + modules.re.findall(
- "source\(['\"]\w+['\"], ['\"](\ w+)['\"]\)", test_model_jinja
+ "source\(['\"]\w+['\"], ['\"](\w+)['\"]\)", test_model_jinja
) %}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| {% set test_model_name_matches = modules.re.findall( | |
| "ref\(['\"](\w+)['\"]\)", test_model_jinja | |
| ) + modules.re.findall( | |
| "source\(['\"]\w+['\"], ['\"](\w+)['\"]\)", test_model_jinja | |
| "source\(['\"]\w+['\"], ['\"](\ w+)['\"]\)", test_model_jinja | |
| ) %} | |
| {% set test_model_name_matches = modules.re.findall( | |
| "ref\(['\"](\w+)['\"]\)", test_model_jinja | |
| ) + modules.re.findall( | |
| "source\(['\"]\w+['\"], ['\"](\w+)['\"]\)", test_model_jinja | |
| ) %} |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@macros/edr/dbt_artifacts/upload_dbt_tests.sql` around lines 124 - 128, The
regex used in modules.re.findall for extracting source relation names from
test_model_jinja has an accidental space in the capture group ("(\ w+)") so
source('pkg','table') refs are not matched; update the pattern to use a proper
word-character capture "(\w+)" in the second modules.re.findall call (the one
that searches for source\([...]) so test_model_name_matches correctly includes
source relation names and primary_test_model_id can resolve for source-backed
tests.
When using tools like dbt-loom, graph.nodes.values() returns nodes from multiple projects. Filter by package_name == project_name to avoid uploading unnecessary information from dependency packages. Co-Authored-By: Cédric OLIVIER <76560097+cedric-orange@users.noreply.github.com>
Add `upload_only_current_project_artifacts` config var (default: false) that, when enabled, filters graph entities by `package_name` so only artifacts from the current dbt project are uploaded -- excluding those from dependency packages. A shared `filter_to_current_project` helper macro centralizes the filtering logic and is called by all 7 upload_dbt_*.sql macros. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
11677be to
5061952
Compare
Tests cover: - Default behavior includes dependency package artifacts - Filtering excludes dependency artifacts from dbt_models - Filtering applies to sources - Filtering applies to tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ct_if_needed The macro conditionally filters based on a config flag, so the name should reflect that the filtering is not always applied. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
integration_tests/tests/test_dbt_artifacts/test_project_filtering.py (2)
80-80: Remove unusedtmp_pathparameter.The
tmp_pathfixture is declared but never used in this test function.Proposed fix
-def test_filtering_applies_to_tests(dbt_project: DbtProject, tmp_path): +def test_filtering_applies_to_tests(dbt_project: DbtProject):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py` at line 80, The test function test_filtering_applies_to_tests declares an unused tmp_path fixture parameter; remove tmp_path from the function signature so it becomes def test_filtering_applies_to_tests(dbt_project: DbtProject): to avoid unused parameter warnings and clarify the test's dependencies.
117-119: Add trailing newline at end of file.The file is missing a trailing newline, which is a common convention for text files and helps avoid issues with some tools.
Proposed fix
finally: if dbt_model_path.exists(): dbt_model_path.unlink() +🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py` around lines 117 - 119, Add a trailing newline at the end of the file so the file ends with a newline character; edit the file containing the finally block (where dbt_model_path.unlink() is called) and ensure the last line is followed by a single newline character.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py`:
- Line 80: The test function test_filtering_applies_to_tests declares an unused
tmp_path fixture parameter; remove tmp_path from the function signature so it
becomes def test_filtering_applies_to_tests(dbt_project: DbtProject): to avoid
unused parameter warnings and clarify the test's dependencies.
- Around line 117-119: Add a trailing newline at the end of the file so the file
ends with a newline character; edit the file containing the finally block (where
dbt_model_path.unlink() is called) and ensure the last line is followed by a
single newline character.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 601aacc0-ae31-400a-8dcf-3896803e47bd
📒 Files selected for processing (9)
integration_tests/tests/test_dbt_artifacts/test_project_filtering.pymacros/edr/dbt_artifacts/filter_to_current_project.sqlmacros/edr/dbt_artifacts/upload_dbt_columns.sqlmacros/edr/dbt_artifacts/upload_dbt_exposures.sqlmacros/edr/dbt_artifacts/upload_dbt_models.sqlmacros/edr/dbt_artifacts/upload_dbt_seeds.sqlmacros/edr/dbt_artifacts/upload_dbt_snapshots.sqlmacros/edr/dbt_artifacts/upload_dbt_sources.sqlmacros/edr/dbt_artifacts/upload_dbt_tests.sql
🚧 Files skipped from review as they are similar to previous changes (6)
- macros/edr/dbt_artifacts/upload_dbt_tests.sql
- macros/edr/dbt_artifacts/upload_dbt_exposures.sql
- macros/edr/dbt_artifacts/upload_dbt_models.sql
- macros/edr/dbt_artifacts/upload_dbt_columns.sql
- macros/edr/dbt_artifacts/filter_to_current_project.sql
- macros/edr/dbt_artifacts/upload_dbt_seeds.sql
The on-run-end hook uses diff-based upload when artifact hashes are available, which doesn't remove pre-existing rows from dependency packages. Switch tests to run dbt_models/dbt_tests models directly, triggering their post_hook which does a full table replace, ensuring a clean slate for assertions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
integration_tests/tests/test_dbt_artifacts/test_project_filtering.py (1)
87-106: Consider cleaning up thetmpdirectory or documenting intentional persistence.The
tmpsubdirectory created at line 88 is not cleaned up in the finally block. While this is minor (usingexist_ok=Trueallows reuse), you might want to either:
- Clean it up with
rmdir()after unlinking the file, or- Add a comment indicating the directory is intentionally left for reuse across test runs.
Optional: Clean up the directory if empty
finally: if dbt_model_path.exists(): dbt_model_path.unlink() + try: + dbt_model_path.parent.rmdir() # Remove tmp dir if empty + except OSError: + pass # Directory not empty or already removed🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py` around lines 87 - 106, The tmp directory created via dbt_model_path.parent.mkdir(...) is not removed; after unlinking dbt_model_path in the finally block, attempt to remove the directory (e.g., call dbt_model_path.parent.rmdir() wrapped in a try/except to ignore if not empty) or add a comment in the finally block next to the unlink() explaining the directory is intentionally persisted for reuse; update the finally block around dbt_model_path.unlink() to perform the rmdir() cleanup safely (or document the choice) so the tmp folder is not left behind unintentionally.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py`:
- Line 66: Remove the unused tmp_path fixture from the test function signature:
edit the function definition for test_filtering_applies_to_tests to drop the
tmp_path parameter so it only accepts dbt_project (i.e., change def
test_filtering_applies_to_tests(dbt_project: DbtProject, tmp_path): to def
test_filtering_applies_to_tests(dbt_project: DbtProject):), and run tests to
confirm no other references to tmp_path remain.
---
Nitpick comments:
In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py`:
- Around line 87-106: The tmp directory created via
dbt_model_path.parent.mkdir(...) is not removed; after unlinking dbt_model_path
in the finally block, attempt to remove the directory (e.g., call
dbt_model_path.parent.rmdir() wrapped in a try/except to ignore if not empty) or
add a comment in the finally block next to the unlink() explaining the directory
is intentionally persisted for reuse; update the finally block around
dbt_model_path.unlink() to perform the rmdir() cleanup safely (or document the
choice) so the tmp folder is not left behind unintentionally.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f15d1e3e-0139-49a6-8c62-7ebc3f91af62
📒 Files selected for processing (1)
integration_tests/tests/test_dbt_artifacts/test_project_filtering.py
integration_tests/tests/test_dbt_artifacts/test_project_filtering.py
Outdated
Show resolved
Hide resolved
replace_table_data is broken on multiple adapters when called from a post_hook context (Trino: invalid kwarg, Redshift: multi-statement prepared statement, etc.). These are pre-existing bugs masked by cache_artifacts defaulting to true. Instead, clear the table with DELETE first, then use the on-run-end hook with diff-based upload. The diff sees an empty table and inserts only the filtered (current-project) artifacts. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
integration_tests/tests/test_dbt_artifacts/test_project_filtering.py (1)
66-66:⚠️ Potential issue | 🟡 MinorRemove the unused
tmp_pathfixture.Line 66 still declares
tmp_path, but the test never uses it.Proposed fix
-def test_filtering_applies_to_tests(dbt_project: DbtProject, tmp_path): +def test_filtering_applies_to_tests(dbt_project: DbtProject):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py` at line 66, The test function test_filtering_applies_to_tests has an unused fixture parameter tmp_path; remove tmp_path from the function signature (i.e., change def test_filtering_applies_to_tests(dbt_project: DbtProject):) so the unused fixture is not declared.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py`:
- Around line 21-26: Clear the dbt_models table before running and asserting to
avoid stale rows: add a step that resets/truncates the dbt_models target table
(e.g., via dbt_project.execute_sql("TRUNCATE TABLE dbt_models") or
dbt_project.reset_table("dbt_models") if available) prior to calling
dbt_project.dbt_runner.run(...) and before dbt_project.read_table("dbt_models",
raise_if_empty=True), so the assertion on package_name runs against a clean
table.
---
Duplicate comments:
In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py`:
- Line 66: The test function test_filtering_applies_to_tests has an unused
fixture parameter tmp_path; remove tmp_path from the function signature (i.e.,
change def test_filtering_applies_to_tests(dbt_project: DbtProject):) so the
unused fixture is not declared.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4065f759-16ea-4d0b-8b47-480bc77b7650
📒 Files selected for processing (1)
integration_tests/tests/test_dbt_artifacts/test_project_filtering.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Adds an opt-in
upload_only_current_project_artifactsconfig var (default:false) that, when enabled, filters uploaded dbt artifacts to only include those from the current project — excluding artifacts from dependency packages.This addresses the issue raised in #793 (by @cedric-orange) where tools like dbt-loom inject nodes from multiple projects into the manifest, causing duplicate/unwanted artifacts to be uploaded.
Key design decisions (based on #793 review feedback from @elazarlachkar and @michael-myaskovsky):
false, preserving current behavior where dependency package artifacts are includedfilter_to_current_project(entities)macro centralizes the filtering logic, called by all 7upload_dbt_*.sqlmacroselementary.get_config_var()andelementary.get_project_name()already in the codebaseUsage
Changes
macros/edr/dbt_artifacts/filter_to_current_project.sql— helper macroget_config_var.sql— added config var defaultSupersedes #793.
🤖 Generated with Claude Code
Summary by CodeRabbit