
feat: add BigQuery partitioning and clustering to data_monitoring_metrics#972

Merged
haritamar merged 1 commit into master from devin/ELE-5300-1773666092
Mar 16, 2026

Conversation


devin-ai-integration bot commented Mar 16, 2026

Summary

Addresses high BigQuery scan costs on data_monitoring_metrics by adding partitioning (on bucket_end) and clustering (on full_table_name, metric_name). This aligns with the hot-path query filters on this table.

Changes:

  • get_partition_by(column="created_at") — Renamed from get_default_partition_by with a new column parameter. A backward-compatible alias get_default_partition_by is retained so existing top-level references keep working.
  • get_cluster_by(columns) — New macro using the same adapter dispatch pattern. Returns the columns list on BigQuery, none elsewhere. Controlled by a separate bigquery_disable_clustering config var.
  • data_monitoring_metrics — Now configured with partition_by=bucket_end and cluster_by=[full_table_name, metric_name].
  • dbt_run_results / dbt_invocations — Updated to call get_partition_by() (no behavior change; default column is still created_at).
  • Integration tests — Two new BigQuery-only tests verifying partition and clustering columns via INFORMATION_SCHEMA.
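The adapter dispatch pattern described above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual source: the macro names, the `bigquery_disable_clustering` var, and the dispatched `bigquery__`/`default__` variants come from this PR's description, while the exact macro bodies and the shape of the returned partition config (field/data type dict) are assumptions.

```sql
{# Sketch only -- bodies are illustrative, not the real implementation. #}
{% macro get_partition_by(column="created_at") %}
  {{ return(adapter.dispatch("get_partition_by", "elementary")(column)) }}
{% endmacro %}

{% macro bigquery__get_partition_by(column) %}
  {# Returned shape is an assumption; dbt-bigquery accepts a field/data_type dict. #}
  {{ return({"field": column, "data_type": "timestamp"}) }}
{% endmacro %}

{% macro default__get_partition_by(column) %}
  {{ return(none) }}
{% endmacro %}

{# Backward-compatible alias so existing top-level references keep working. #}
{% macro get_default_partition_by() %}
  {{ return(elementary.get_partition_by()) }}
{% endmacro %}

{% macro get_cluster_by(columns) %}
  {{ return(adapter.dispatch("get_cluster_by", "elementary")(columns)) }}
{% endmacro %}

{% macro bigquery__get_cluster_by(columns) %}
  {# Clustering can be switched off via the bigquery_disable_clustering config var. #}
  {% if elementary.get_config_var("bigquery_disable_clustering") %}
    {{ return(none) }}
  {% endif %}
  {{ return(columns) }}
{% endmacro %}

{% macro default__get_cluster_by(columns) %}
  {{ return(none) }}
{% endmacro %}
```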

Review & Testing Checklist for Human

  • Backward compatibility of adapter dispatch: The old dispatched names (bigquery__get_default_partition_by, default__get_default_partition_by) are replaced with bigquery__get_partition_by / default__get_partition_by. Users who have overridden the adapter-specific macros in their own dbt project will need to rename their overrides. Evaluate whether this is acceptable or if dispatched aliases are also needed.
  • --full-refresh migration: Existing data_monitoring_metrics tables will not gain partitioning/clustering until rebuilt with --full-refresh. Confirm this will be documented in release notes.
  • Verify BigQuery CI passes: The new integration tests (test_data_monitoring_metrics_partitioned, test_data_monitoring_metrics_clustered) can only be validated against a real BigQuery instance. Confirm they pass in CI.
  • Partition column choice: bucket_end was chosen over created_at because both hot-path queries filter on bucket_end. Verify this matches the actual query patterns in the anomaly detection macros.
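The INFORMATION_SCHEMA checks mentioned in the test items above can be approximated with queries like these. The project and dataset names are placeholders; the actual tests in `test_artifacts.py` may phrase the queries differently.

```sql
-- Partition column: BigQuery's INFORMATION_SCHEMA.COLUMNS flags it via is_partitioning_column.
select column_name
from `my_project.my_dataset.INFORMATION_SCHEMA.COLUMNS`
where table_name = 'data_monitoring_metrics'
  and is_partitioning_column = 'YES';

-- Clustering columns, in their defined order.
select column_name
from `my_project.my_dataset.INFORMATION_SCHEMA.COLUMNS`
where table_name = 'data_monitoring_metrics'
  and clustering_ordinal_position is not null
order by clustering_ordinal_position;
```

The first query should return `bucket_end`, and the second `full_table_name` followed by `metric_name`, if the model was built with the new config.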

Notes

  • The unique_key='id' merge strategy on data_monitoring_metrics is unaffected — partition key controls scan pruning only, not merge logic.
  • cluster_by is a no-op on non-BigQuery adapters (returns none), and partition_by continues to return none on non-BigQuery adapters as before.
  • Resolves ELE-5300.
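Putting the notes together, the config block on `data_monitoring_metrics` would look roughly like this sketch. The partition and cluster values and the `unique_key='id'` merge key are taken from this PR's description; the rest of the config block is assumed and may differ from the actual model file.

```sql
{{
  config(
    materialized="incremental",
    unique_key="id",
    partition_by=elementary.get_partition_by(column="bucket_end"),
    cluster_by=elementary.get_cluster_by(["full_table_name", "metric_name"]),
  )
}}
```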

Link to Devin session: https://app.devin.ai/sessions/b30f1f8be6ff4ca3b48b98fe0689c60b
Requested by: @haritamar

Summary by CodeRabbit

  • New Features

    • Added configurable table clustering for BigQuery with an option to disable it.
    • Enhanced table partitioning with improved parameterized configuration control.
  • Tests

    • Added tests to verify BigQuery partitioning and clustering behavior on data tables.

feat: add BigQuery partitioning and clustering to data_monitoring_metrics

- Rename get_default_partition_by -> get_partition_by(column='created_at')
  with a column parameter so models can specify their partition column.
  A backward-compatible alias get_default_partition_by is kept.
- Create new get_cluster_by(columns) macro with adapter dispatch pattern.
  Returns columns list on BigQuery, none on other adapters.
  Controlled by a separate bigquery_disable_clustering flag.
- Apply partition_by (on bucket_end) and cluster_by (full_table_name,
  metric_name) to data_monitoring_metrics.
- Update existing callers (dbt_run_results, dbt_invocations) to use
  the renamed get_partition_by macro.
- Add integration tests for data_monitoring_metrics partitioning and
  clustering on BigQuery.

Resolves ELE-5300

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring


linear bot commented Mar 16, 2026

@github-actions

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.


coderabbitai bot commented Mar 16, 2026

📝 Walkthrough

The changes introduce BigQuery clustering functionality through a new get_cluster_by macro with a configurable disable flag, refactor existing partition_by macros to accept explicit column parameters, and update models to use the new partitioning and clustering configurations. Integration tests verify BigQuery partitioning and clustering behavior on the data_monitoring_metrics table.

Changes

  • Partitioning and Clustering Macros (macros/utils/cross_db_utils/cluster_by.sql, macros/utils/cross_db_utils/partition_by.sql, macros/edr/system/system_utils/get_config_var.sql): Introduces the get_cluster_by macro with a BigQuery-specific implementation that respects a bigquery_disable_clustering config flag. Refactors the partition_by macros to accept an explicit column parameter via get_partition_by(column) and maintains backward compatibility through a get_default_partition_by() wrapper.
  • Model Configuration Updates (models/edr/data_monitoring/data_monitoring/data_monitoring_metrics.sql, models/edr/dbt_artifacts/dbt_invocations.sql, models/edr/dbt_artifacts/dbt_run_results.sql): Updates the incremental models to use the new get_partition_by() macro instead of the deprecated get_default_partition_by(). Adds partition_by and cluster_by configuration to data_monitoring_metrics, partitioning on bucket_end and clustering on full_table_name and metric_name.
  • Integration Tests (integration_tests/tests/test_dbt_artifacts/test_artifacts.py): Adds two BigQuery-specific tests, test_data_monitoring_metrics_partitioned and test_data_monitoring_metrics_clustered, that verify partitioning and clustering via INFORMATION_SCHEMA queries. Updates the context comment in an existing partition test.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hop-hop, the clusters align,
Buckets and metrics in rows so fine,
BigQuery now dances with partitions neat,
A toggleable feature makes cleanup complete!
Tests verify the magic we've made,
In the garden of data, our seeds are laid. 🌱

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title accurately summarizes the main change: adding BigQuery partitioning and clustering to data_monitoring_metrics, which is the primary focus of the changeset.



coderabbitai bot left a comment

🧹 Nitpick comments (1)
integration_tests/tests/test_dbt_artifacts/test_artifacts.py (1)

260-293: LGTM!

The new tests correctly verify BigQuery-specific partitioning and clustering behavior using INFORMATION_SCHEMA.COLUMNS. The query for clustering columns properly orders by clustering_ordinal_position, and assertions match the model configuration.

Minor note: Both tests perform full_refresh=True on data_monitoring_metrics. While this ensures test isolation (which is good), you could consider combining them into a single test to reduce CI runtime if BigQuery test runs are slow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/test_dbt_artifacts/test_artifacts.py` around lines
260 - 293, Two tests both call
dbt_project.dbt_runner.run(select="data_monitoring_metrics", full_refresh=True)
causing duplicate full-refresh work; combine
test_data_monitoring_metrics_partitioned and
test_data_monitoring_metrics_clustered into a single test (e.g.,
test_data_monitoring_metrics_partitioning_and_clustering) that runs the
full_refresh once, then queries INFORMATION_SCHEMA.COLUMNS twice (once for
is_partitioning_column and once for clustering_ordinal_position ORDER BY
clustering_ordinal_position) and asserts both expected column lists so CI
runtime is reduced.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 59769bc7-f120-4483-b9f4-34aacffe98c0

📥 Commits

Reviewing files that changed from the base of the PR and between f820e11 and abefe28.

📒 Files selected for processing (7)
  • integration_tests/tests/test_dbt_artifacts/test_artifacts.py
  • macros/edr/system/system_utils/get_config_var.sql
  • macros/utils/cross_db_utils/cluster_by.sql
  • macros/utils/cross_db_utils/partition_by.sql
  • models/edr/data_monitoring/data_monitoring/data_monitoring_metrics.sql
  • models/edr/dbt_artifacts/dbt_invocations.sql
  • models/edr/dbt_artifacts/dbt_run_results.sql

haritamar merged commit 7a2b542 into master on Mar 16, 2026
28 checks passed
haritamar deleted the devin/ELE-5300-1773666092 branch on March 16, 2026 at 14:32