Skip to content

feat: opt-in config to filter artifacts to current project only#968

Merged
haritamar merged 7 commits intomasterfrom
feat/upload-only-current-project-artifacts
Mar 12, 2026
Merged

feat: opt-in config to filter artifacts to current project only#968
haritamar merged 7 commits intomasterfrom
feat/upload-only-current-project-artifacts

Conversation

@haritamar
Copy link
Collaborator

@haritamar haritamar commented Mar 12, 2026

Summary

Adds an opt-in upload_only_current_project_artifacts config var (default: false) that, when enabled, filters uploaded dbt artifacts to only include those from the current project — excluding artifacts from dependency packages.

This addresses the issue raised in #793 (by @cedric-orange) where tools like dbt-loom inject nodes from multiple projects into the manifest, causing duplicate/unwanted artifacts to be uploaded.

Key design decisions (based on #793 review feedback from @elazarlachkar and @michael-myaskovsky):

  • Opt-in, not default — the flag defaults to false, preserving current behavior where dependency package artifacts are included
  • DRY helper macro — a shared filter_to_current_project(entities) macro centralizes the filtering logic, called by all 7 upload_dbt_*.sql macros
  • Uses existing patterns — leverages elementary.get_config_var() and elementary.get_project_name() already in the codebase

Usage

# dbt_project.yml
vars:
  upload_only_current_project_artifacts: true

Changes

  • New: macros/edr/dbt_artifacts/filter_to_current_project.sql — helper macro
  • Modified: All 7 upload macros (models, seeds, snapshots, sources, tests, columns, exposures) to use the helper
  • Modified: get_config_var.sql — added config var default

Supersedes #793.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Added a configurable option (off by default) to restrict artifact uploads to the current project; when enabled, models, sources, seeds, snapshots, tests, columns, and exposures uploaded are limited to the current project.
  • Chores
    • Added the new default configuration entry to keep the feature disabled by default.
  • Tests
    • Added integration tests validating default behavior and filtering when the option is enabled.

@github-actions
Copy link
Contributor

👋 @haritamar
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai
Copy link

coderabbitai bot commented Mar 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new macro filter_to_current_project_if_needed(entities) and a default config key upload_only_current_project_artifacts: false; several dbt artifact upload macros now call the filter to optionally restrict uploaded artifacts to only those from the current project. Integration tests for this filtering behavior were added.

Changes

Cohort / File(s) Summary
New Filter Macro
macros/edr/dbt_artifacts/filter_to_current_project.sql
Adds filter_to_current_project_if_needed(entities) that checks upload_only_current_project_artifacts and returns only entities with package_name equal to the current project when enabled; otherwise returns the original entities.
Configuration Update
macros/edr/system/system_utils/get_config_var.sql
Adds default config key upload_only_current_project_artifacts: false in default__get_default_config().
Artifact Upload Macros
macros/edr/dbt_artifacts/upload_dbt_columns.sql, macros/edr/dbt_artifacts/upload_dbt_exposures.sql, macros/edr/dbt_artifacts/upload_dbt_models.sql, macros/edr/dbt_artifacts/upload_dbt_seeds.sql, macros/edr/dbt_artifacts/upload_dbt_snapshots.sql, macros/edr/dbt_artifacts/upload_dbt_sources.sql, macros/edr/dbt_artifacts/upload_dbt_tests.sql
Each macro now wraps its artifact selections with elementary.filter_to_current_project_if_needed(...), so artifacts passed to uploaders are optionally limited to the current project. No macro signatures changed.
Integration Tests
integration_tests/tests/test_dbt_artifacts/test_project_filtering.py
Adds tests validating default behavior (includes dependency artifacts) and behavior when upload_only_current_project_artifacts is enabled (excludes dependency artifacts) for models, sources, and tests. Adds TEST_MODEL constant and three test functions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nibble through manifests, sorting with care,
A tiny flag tells me which packages to bear.
If set, I keep only what belongs to this nest,
If not, I gather them all on my quest.
A hop, a filter, and then off I go—best.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'feat: opt-in config to filter artifacts to current project only' directly and clearly describes the main change: adding an optional configuration feature to filter dbt artifacts to the current project, which aligns with the primary objective and all file changes in the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/upload-only-current-project-artifacts
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@macros/edr/dbt_artifacts/upload_dbt_tests.sql`:
- Around line 124-128: The regex used in modules.re.findall for extracting
source relation names from test_model_jinja has an accidental space in the
capture group ("(\ w+)") so source('pkg','table') refs are not matched; update
the pattern to use a proper word-character capture "(\w+)" in the second
modules.re.findall call (the one that searches for source\([...]) so
test_model_name_matches correctly includes source relation names and
primary_test_model_id can resolve for source-backed tests.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dccbce28-349d-41e8-a1c3-bc05335904c2

📥 Commits

Reviewing files that changed from the base of the PR and between a4143a5 and 11677be.

📒 Files selected for processing (9)
  • macros/edr/dbt_artifacts/filter_to_current_project.sql
  • macros/edr/dbt_artifacts/upload_dbt_columns.sql
  • macros/edr/dbt_artifacts/upload_dbt_exposures.sql
  • macros/edr/dbt_artifacts/upload_dbt_models.sql
  • macros/edr/dbt_artifacts/upload_dbt_seeds.sql
  • macros/edr/dbt_artifacts/upload_dbt_snapshots.sql
  • macros/edr/dbt_artifacts/upload_dbt_sources.sql
  • macros/edr/dbt_artifacts/upload_dbt_tests.sql
  • macros/edr/system/system_utils/get_config_var.sql

Comment on lines 124 to 128
{% set test_model_name_matches = modules.re.findall(
"ref\(['\"](\w+)['\"]\)", test_model_jinja
) + modules.re.findall(
"source\(['\"]\w+['\"], ['\"](\w+)['\"]\)", test_model_jinja
"source\(['\"]\w+['\"], ['\"](\ w+)['\"]\)", test_model_jinja
) %}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
import re

broken = r"source\(['\"]\w+['\"], ['\"](\ w+)['\"]\)"
fixed = r"source\(['\"]\w+['\"], ['\"](\w+)['\"]\)"
sample = "source('pkg', 'orders')"

print("broken:", re.findall(broken, sample))
print("fixed :", re.findall(fixed, sample))
PY

Repository: elementary-data/dbt-data-reliability

Length of output: 109


🏁 Script executed:

cd elementary-data/dbt-data-reliability && find . -type f -name "upload_dbt_tests.sql" | head -1

Repository: elementary-data/dbt-data-reliability

Length of output: 166


🏁 Script executed:

sed -n '120,135p' "$(find . -type f -name "upload_dbt_tests.sql" | head -1)" 2>/dev/null | cat -n

Repository: elementary-data/dbt-data-reliability

Length of output: 1068


Fix the source() regex bug on line 127.

The pattern (\ w+) includes a literal space and won't match source relation names. Change it to (\w+) so that source('pkg', 'table') refs are captured correctly and primary_test_model_id resolves properly for source-backed tests.

Proposed fix
             {% set test_model_name_matches = modules.re.findall(
                 "ref\(['\"](\w+)['\"]\)", test_model_jinja
             ) + modules.re.findall(
-                "source\(['\"]\w+['\"], ['\"](\ w+)['\"]\)", test_model_jinja
+                "source\(['\"]\w+['\"], ['\"](\w+)['\"]\)", test_model_jinja
             ) %}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{% set test_model_name_matches = modules.re.findall(
"ref\(['\"](\w+)['\"]\)", test_model_jinja
) + modules.re.findall(
"source\(['\"]\w+['\"], ['\"](\w+)['\"]\)", test_model_jinja
"source\(['\"]\w+['\"], ['\"](\ w+)['\"]\)", test_model_jinja
) %}
{% set test_model_name_matches = modules.re.findall(
"ref\(['\"](\w+)['\"]\)", test_model_jinja
) + modules.re.findall(
"source\(['\"]\w+['\"], ['\"](\w+)['\"]\)", test_model_jinja
) %}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@macros/edr/dbt_artifacts/upload_dbt_tests.sql` around lines 124 - 128, The
regex used in modules.re.findall for extracting source relation names from
test_model_jinja has an accidental space in the capture group ("(\ w+)") so
source('pkg','table') refs are not matched; update the pattern to use a proper
word-character capture "(\w+)" in the second modules.re.findall call (the one
that searches for source\([...]) so test_model_name_matches correctly includes
source relation names and primary_test_model_id can resolve for source-backed
tests.

cedric-orange and others added 2 commits March 12, 2026 16:52
When using tools like dbt-loom, graph.nodes.values() returns nodes from
multiple projects. Filter by package_name == project_name to avoid
uploading unnecessary information from dependency packages.

Co-Authored-By: Cédric OLIVIER <76560097+cedric-orange@users.noreply.github.com>
Add `upload_only_current_project_artifacts` config var (default: false)
that, when enabled, filters graph entities by `package_name` so only
artifacts from the current dbt project are uploaded -- excluding those
from dependency packages.

A shared `filter_to_current_project` helper macro centralizes the
filtering logic and is called by all 7 upload_dbt_*.sql macros.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@haritamar haritamar force-pushed the feat/upload-only-current-project-artifacts branch from 11677be to 5061952 Compare March 12, 2026 14:53
haritamar and others added 2 commits March 12, 2026 16:57
Tests cover:
- Default behavior includes dependency package artifacts
- Filtering excludes dependency artifacts from dbt_models
- Filtering applies to sources
- Filtering applies to tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ct_if_needed

The macro conditionally filters based on a config flag, so the name
should reflect that the filtering is not always applied.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
integration_tests/tests/test_dbt_artifacts/test_project_filtering.py (2)

80-80: Remove unused tmp_path parameter.

The tmp_path fixture is declared but never used in this test function.

Proposed fix
-def test_filtering_applies_to_tests(dbt_project: DbtProject, tmp_path):
+def test_filtering_applies_to_tests(dbt_project: DbtProject):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py` at line
80, The test function test_filtering_applies_to_tests declares an unused
tmp_path fixture parameter; remove tmp_path from the function signature so it
becomes def test_filtering_applies_to_tests(dbt_project: DbtProject): to avoid
unused parameter warnings and clarify the test's dependencies.

117-119: Add trailing newline at end of file.

The file is missing a trailing newline, which is a common convention for text files and helps avoid issues with some tools.

Proposed fix
         finally:
             if dbt_model_path.exists():
                 dbt_model_path.unlink()
+
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py` around
lines 117 - 119, Add a trailing newline at the end of the file so the file ends
with a newline character; edit the file containing the finally block (where
dbt_model_path.unlink() is called) and ensure the last line is followed by a
single newline character.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py`:
- Line 80: The test function test_filtering_applies_to_tests declares an unused
tmp_path fixture parameter; remove tmp_path from the function signature so it
becomes def test_filtering_applies_to_tests(dbt_project: DbtProject): to avoid
unused parameter warnings and clarify the test's dependencies.
- Around line 117-119: Add a trailing newline at the end of the file so the file
ends with a newline character; edit the file containing the finally block (where
dbt_model_path.unlink() is called) and ensure the last line is followed by a
single newline character.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 601aacc0-ae31-400a-8dcf-3896803e47bd

📥 Commits

Reviewing files that changed from the base of the PR and between 5061952 and 42102a9.

📒 Files selected for processing (9)
  • integration_tests/tests/test_dbt_artifacts/test_project_filtering.py
  • macros/edr/dbt_artifacts/filter_to_current_project.sql
  • macros/edr/dbt_artifacts/upload_dbt_columns.sql
  • macros/edr/dbt_artifacts/upload_dbt_exposures.sql
  • macros/edr/dbt_artifacts/upload_dbt_models.sql
  • macros/edr/dbt_artifacts/upload_dbt_seeds.sql
  • macros/edr/dbt_artifacts/upload_dbt_snapshots.sql
  • macros/edr/dbt_artifacts/upload_dbt_sources.sql
  • macros/edr/dbt_artifacts/upload_dbt_tests.sql
🚧 Files skipped from review as they are similar to previous changes (6)
  • macros/edr/dbt_artifacts/upload_dbt_tests.sql
  • macros/edr/dbt_artifacts/upload_dbt_exposures.sql
  • macros/edr/dbt_artifacts/upload_dbt_models.sql
  • macros/edr/dbt_artifacts/upload_dbt_columns.sql
  • macros/edr/dbt_artifacts/filter_to_current_project.sql
  • macros/edr/dbt_artifacts/upload_dbt_seeds.sql

The on-run-end hook uses diff-based upload when artifact hashes are
available, which doesn't remove pre-existing rows from dependency
packages. Switch tests to run dbt_models/dbt_tests models directly,
triggering their post_hook which does a full table replace, ensuring
a clean slate for assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
integration_tests/tests/test_dbt_artifacts/test_project_filtering.py (1)

87-106: Consider cleaning up the tmp directory or documenting intentional persistence.

The tmp subdirectory created at line 88 is not cleaned up in the finally block. While this is minor (using exist_ok=True allows reuse), you might want to either:

  1. Clean it up with rmdir() after unlinking the file, or
  2. Add a comment indicating the directory is intentionally left for reuse across test runs.
Optional: Clean up the directory if empty
         finally:
             if dbt_model_path.exists():
                 dbt_model_path.unlink()
+            try:
+                dbt_model_path.parent.rmdir()  # Remove tmp dir if empty
+            except OSError:
+                pass  # Directory not empty or already removed
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py` around
lines 87 - 106, The tmp directory created via dbt_model_path.parent.mkdir(...)
is not removed; after unlinking dbt_model_path in the finally block, attempt to
remove the directory (e.g., call dbt_model_path.parent.rmdir() wrapped in a
try/except to ignore if not empty) or add a comment in the finally block next to
the unlink() explaining the directory is intentionally persisted for reuse;
update the finally block around dbt_model_path.unlink() to perform the rmdir()
cleanup safely (or document the choice) so the tmp folder is not left behind
unintentionally.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py`:
- Line 66: Remove the unused tmp_path fixture from the test function signature:
edit the function definition for test_filtering_applies_to_tests to drop the
tmp_path parameter so it only accepts dbt_project (i.e., change def
test_filtering_applies_to_tests(dbt_project: DbtProject, tmp_path): to def
test_filtering_applies_to_tests(dbt_project: DbtProject):), and run tests to
confirm no other references to tmp_path remain.

---

Nitpick comments:
In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py`:
- Around line 87-106: The tmp directory created via
dbt_model_path.parent.mkdir(...) is not removed; after unlinking dbt_model_path
in the finally block, attempt to remove the directory (e.g., call
dbt_model_path.parent.rmdir() wrapped in a try/except to ignore if not empty) or
add a comment in the finally block next to the unlink() explaining the directory
is intentionally persisted for reuse; update the finally block around
dbt_model_path.unlink() to perform the rmdir() cleanup safely (or document the
choice) so the tmp folder is not left behind unintentionally.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f15d1e3e-0139-49a6-8c62-7ebc3f91af62

📥 Commits

Reviewing files that changed from the base of the PR and between 42102a9 and 5609d64.

📒 Files selected for processing (1)
  • integration_tests/tests/test_dbt_artifacts/test_project_filtering.py

replace_table_data is broken on multiple adapters when called from a
post_hook context (Trino: invalid kwarg, Redshift: multi-statement
prepared statement, etc.). These are pre-existing bugs masked by
cache_artifacts defaulting to true.

Instead, clear the table with DELETE first, then use the on-run-end
hook with diff-based upload. The diff sees an empty table and inserts
only the filtered (current-project) artifacts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
integration_tests/tests/test_dbt_artifacts/test_project_filtering.py (1)

66-66: ⚠️ Potential issue | 🟡 Minor

Remove the unused tmp_path fixture.

Line 66 still declares tmp_path, but the test never uses it.

Proposed fix
-def test_filtering_applies_to_tests(dbt_project: DbtProject, tmp_path):
+def test_filtering_applies_to_tests(dbt_project: DbtProject):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py` at line
66, The test function test_filtering_applies_to_tests has an unused fixture
parameter tmp_path; remove tmp_path from the function signature (i.e., change
def test_filtering_applies_to_tests(dbt_project: DbtProject):) so the unused
fixture is not declared.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py`:
- Around line 21-26: Clear the dbt_models table before running and asserting to
avoid stale rows: add a step that resets/truncates the dbt_models target table
(e.g., via dbt_project.execute_sql("TRUNCATE TABLE dbt_models") or
dbt_project.reset_table("dbt_models") if available) prior to calling
dbt_project.dbt_runner.run(...) and before dbt_project.read_table("dbt_models",
raise_if_empty=True), so the assertion on package_name runs against a clean
table.

---

Duplicate comments:
In `@integration_tests/tests/test_dbt_artifacts/test_project_filtering.py`:
- Line 66: The test function test_filtering_applies_to_tests has an unused
fixture parameter tmp_path; remove tmp_path from the function signature (i.e.,
change def test_filtering_applies_to_tests(dbt_project: DbtProject):) so the
unused fixture is not declared.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4065f759-16ea-4d0b-8b47-480bc77b7650

📥 Commits

Reviewing files that changed from the base of the PR and between 5609d64 and 88e2916.

📒 Files selected for processing (1)
  • integration_tests/tests/test_dbt_artifacts/test_project_filtering.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@haritamar haritamar merged commit 3c6e7bb into master Mar 12, 2026
27 checks passed
@haritamar haritamar deleted the feat/upload-only-current-project-artifacts branch March 12, 2026 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants