diff --git a/docs/Dockerfile b/docs/Dockerfile index 8503e9921..1ad7cc928 100644 --- a/docs/Dockerfile +++ b/docs/Dockerfile @@ -1,4 +1,4 @@ -FROM node:23 +FROM node:20.3.0 WORKDIR /app RUN npm i -g mintlify diff --git a/docs/_snippets/cloud/integrations/athena.mdx b/docs/_snippets/cloud/integrations/athena.mdx deleted file mode 100644 index 6863c0981..000000000 --- a/docs/_snippets/cloud/integrations/athena.mdx +++ /dev/null @@ -1 +0,0 @@ -Coming soon! diff --git a/docs/_snippets/cloud/integrations/cards-groups/cloud-integrations-cards.mdx b/docs/_snippets/cloud/integrations/cards-groups/cloud-integrations-cards.mdx deleted file mode 100644 index 121020434..000000000 --- a/docs/_snippets/cloud/integrations/cards-groups/cloud-integrations-cards.mdx +++ /dev/null @@ -1,23 +0,0 @@ -### Data warehouses - - - -### Transformation and orchestration - - - -### Data visualization - - - -### Reverse ETL - - - -### Code repositories - - - -### Alerts & incidents - - \ No newline at end of file diff --git a/docs/_snippets/cloud/integrations/cards-groups/connect-dwh-cards.mdx b/docs/_snippets/cloud/integrations/cards-groups/connect-dwh-cards.mdx deleted file mode 100644 index ba75ceccb..000000000 --- a/docs/_snippets/cloud/integrations/cards-groups/connect-dwh-cards.mdx +++ /dev/null @@ -1,245 +0,0 @@ - - - - - - - - - - - } - > - - - - - - - - - - - - - } - > - - - - - - - - - - } - > - - - - } - > - - - - - - - - - - - - } - > - - - - } - > - Click for details - - - - - } - > - Click for details - - - diff --git a/docs/_snippets/cloud/integrations/clickhouse.mdx b/docs/_snippets/cloud/integrations/clickhouse.mdx deleted file mode 100644 index ee7f614fa..000000000 --- a/docs/_snippets/cloud/integrations/clickhouse.mdx +++ /dev/null @@ -1,15 +0,0 @@ -You will connect Elementary Cloud to Clickhouse for syncing the Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). 
- - - -### Fill the connection form - -Provide the following fields: - -- **Host**: The hostname of your Clickhouse account to connect to. This can either be a hostname or an IP address. -- **Port**: The port of your Clickhouse account to connect to. -- **Elementary schema**: The name of your Elementary schema. Usually `[schema name]_elementary`. -- **User**: The name of the for Elementary user. -- **Password**: The password associated with the provided user. - - diff --git a/docs/_snippets/cloud/integrations/databricks.mdx b/docs/_snippets/cloud/integrations/databricks.mdx deleted file mode 100644 index 04dd98f97..000000000 --- a/docs/_snippets/cloud/integrations/databricks.mdx +++ /dev/null @@ -1,17 +0,0 @@ -You will connect Elementary Cloud to Databricks for syncing the Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). - - - - - -### Fill the connection form - -Provide the following fields: - -- **Host**: The hostname of your Databricks account to connect to. -- **Http path**: The path to the Databricks cluster or SQL warehouse. -- **Token**: The token you generated for Elementary. For more information, see [Generate a token](https://docs.databricks.com/dev-tools/api/latest/authentication.html#generate-a-token) in the Databricks docs. -- **Catalog (optional)**: The name of the Databricks Catalog. -- **Elementary schema**: The name of your Elementary schema. Usually `[schema name]_elementary`. 
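For orientation, the same connection details map onto a dbt `profiles.yml` target for Databricks roughly as follows (a sketch only; the host, HTTP path, and schema values here are hypothetical placeholders, not values from this guide):

```yaml
elementary:
  target: prod
  outputs:
    prod:
      type: databricks
      host: adb-1234567890123456.7.azuredatabricks.net  # hypothetical workspace host
      http_path: /sql/1.0/warehouses/abc123def456       # hypothetical SQL warehouse path
      token: "{{ env_var('DATABRICKS_TOKEN') }}"        # the token you generated for Elementary
      catalog: main                                     # optional
      schema: analytics_elementary                      # your Elementary schema
```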
- - diff --git a/docs/_snippets/dwh/bigquery/cli_service_account.mdx b/docs/_snippets/dwh/bigquery/cli_service_account.mdx deleted file mode 100644 index 2d66aba18..000000000 --- a/docs/_snippets/dwh/bigquery/cli_service_account.mdx +++ /dev/null @@ -1,3 +0,0 @@ - - - diff --git a/docs/_snippets/dwh/bigquery/cloud_service_account.mdx b/docs/_snippets/dwh/bigquery/cloud_service_account.mdx deleted file mode 100644 index 703186db4..000000000 --- a/docs/_snippets/dwh/bigquery/cloud_service_account.mdx +++ /dev/null @@ -1,4 +0,0 @@ - - - - diff --git a/docs/_snippets/dwh/databricks/create_service_principal.mdx b/docs/_snippets/dwh/databricks/create_service_principal.mdx deleted file mode 100644 index 094aa060c..000000000 --- a/docs/_snippets/dwh/databricks/create_service_principal.mdx +++ /dev/null @@ -1,37 +0,0 @@ -### Create service principal - -1. In your Databrick console, go to the admin settings by clicking your username in the to right corner -> Admin settings (add photo) - -Admin settings - -2. Go to the Service principals tab, then click Add service principal (add photo) - -Service principal settings - -3. Give the service principal a good name (e.g elementary) and click Add (add photo) - -Add service principal - -4. Then, from the service principal configuration view, copy the Application Id (add photo) - -Service principal ID - -4. 
Finally, run the following query: - -``` -GRANT SELECT ON SCHEMA TO ``; -``` - -Make sure to replace the `` and `` placeholders with the correct values diff --git a/docs/_snippets/faq/question-schema.mdx b/docs/_snippets/faq/question-schema.mdx deleted file mode 100644 index 5a80fc069..000000000 --- a/docs/_snippets/faq/question-schema.mdx +++ /dev/null @@ -1,5 +0,0 @@ - - - - - diff --git a/docs/_snippets/faq/question-test-results-sample.mdx b/docs/_snippets/faq/question-test-results-sample.mdx deleted file mode 100644 index d80e2f918..000000000 --- a/docs/_snippets/faq/question-test-results-sample.mdx +++ /dev/null @@ -1,22 +0,0 @@ - - -Yes you can! - -Elementary saves samples of failed test rows and stores them in the table `test_result_rows`, then displays them in the *Results* tab of the report. - -By default, Elementary saves 5 rows per test, but you can change this number by setting the variable `test_sample_row_count` to the number of rows you want to save. For example, to save 10 rows per test, add the following to your `dbt_project.yml` file: - -```yaml -vars: - test_sample_row_count: 10 -``` - -Or use the `--vars` flag when you run `dbt test`: - -```shell -dbt test --vars '{"test_sample_row_count": 10}' -``` - -***NOTE***: The larger the number of rows you save, the more data you will store in your database. This can affect the performance and cost, depending on your database. - - diff --git a/docs/_snippets/oss/oss-introduction-opening.mdx b/docs/_snippets/oss/oss-introduction-opening.mdx deleted file mode 100644 index 3574e16e5..000000000 --- a/docs/_snippets/oss/oss-introduction-opening.mdx +++ /dev/null @@ -1 +0,0 @@ -Elementary OSS is a CLI tool you can deploy and orchestrate to send Slack alerts and self-host the Elementary report. 
diff --git a/docs/_snippets/oss/oss-introduction.mdx b/docs/_snippets/oss/oss-introduction.mdx deleted file mode 100644 index ae893786b..000000000 --- a/docs/_snippets/oss/oss-introduction.mdx +++ /dev/null @@ -1,40 +0,0 @@ -### CLI Guides - - - - - - - - -
- - - Demo - - -### Supported adapters - - diff --git a/docs/_snippets/products-cards.mdx b/docs/_snippets/products-cards.mdx deleted file mode 100644 index afd9cefcb..000000000 --- a/docs/_snippets/products-cards.mdx +++ /dev/null @@ -1,22 +0,0 @@ -Read about the key features and product offerings: - - - - - - diff --git a/docs/changelog.mdx b/docs/changelog.mdx index 52ff9d81d..7dade9393 100644 --- a/docs/changelog.mdx +++ b/docs/changelog.mdx @@ -1,183 +1,589 @@ --- -title: Changelog -description: "See what's new at Elementary" +title: Elementary Changelog +description: "See what's new on the Elementary Cloud Platform" --- -## May 2024 - -### Test configuration supports ANY dbt TEST OUT THERE! 😱 - -
-
-
- Test configuration -
-
- Our new YAML editor allows you to add ANY dbt test that exists in the dbt ecosystem, in bulk and directly from the UI! - - It supports any dbt expectation or utils test out of the box, and you can use it to add your own custom generic tests to tables and columns. +export const Tag = ({ children = "Text goes here", type }) => ( +
+ {children} +
+); - +export const Tags = ({ date, tags = [] }) => ( +
+ {date} + {tags.map((tag, i) => ( + {tag} + ))} +
+); -
-
- -
- -### Detection of column anomalies by dimensions - -
-
-
- Anomaly detection -
-
- You can now add a new parameter to your column anomaly tests - `dimensions`. - - This will calculate the column metrics for every time bucket and table dimension values, allowing you to detect anomalies in specific segments of your data. - - For example, if you want to detect anomalies in a revenue column and you have multiple apps in different countries - now you can detect anomalies in revenue in a specific country. - - Here is an example of how this can be configured- - - ```yaml - models: - .... - columns: - - name: in_app_purchase_revenue - tests: - - elementary.column_anomalies: - column_anomalies: - - sum - dimensions: - - app_name - - country - ``` - -
-
- -
- -### Column level lineage ENRICHED with TEST RESULTS - -
-
-
- Lineage -
-
- We are excited to be the first (and only) tool in the industry that lets investigate test results right on top of your column level lineage graph. With this new release you can filter the lineage graph on a specific column that has an issue, and see if upstream or downstream columns or BI assets have similar test failures to understand the root cause fast.
-
-
+## Critical Assets
+
+
+
+We’re excited to introduce the **Critical Assets** feature, designed to help you prioritize and protect your most important data assets.
+
+**What is a Critical Asset?**
+
+A critical asset is ***any*** data asset (such as a model, exposure, or report) that plays a crucial role in your ***company's*** data ecosystem. Issues affecting these assets can have a significant impact on business operations, dashboards, and decision-making.
+
+Marking an asset as **critical** ensures it receives **higher priority in monitoring and alerting**, helping you quickly identify and respond to issues that may impact it.
+
+Once an asset is marked as **critical**, you will be able to:
+
+✅ **Identify it in the UI**, where it will be visually highlighted.
+
+✅ **Receive alerts** when upstream issues may impact the critical asset.
+
+✅ **Filter incidents** by their impact on critical assets.
+
+Learn more about how to use Critical Assets in our [docs](https://docs.elementary-data.com/features/data-governance/critical_assets).
+
+![](https://res.cloudinary.com/diuctyblm/image/upload/v1740079005/Changelog/Critical_Assets_3_gjw9nc.gif)
+
+
+## Mute tests
+
+
+
+We’re pleased to introduce the Mute Test feature!
+
+This allows you to run tests without triggering alerts, giving you greater control over notifications while still monitoring your data. It’s perfect for scenarios where you’re testing new data sets, refining thresholds, or adjusting test logic, without unnecessary noise.
+With this feature, you can ensure everything is working as expected before enabling alerts, keeping your team focused and informed only when it truly matters.
+
+Learn more about how to [mute tests](https://docs.elementary-data.com/features/alerts-and-incidents/alerts-and-incidents-overview) in our docs.
+
+![](https://res.cloudinary.com/diuctyblm/image/upload/v1740079310/Changelog/Mute_Tests_gif_omflcw.gif)
+
+
+## Custom metadata
+
+
+
+Custom attributes from [dbt’s meta field](https://docs.getdbt.com/reference/resource-configs/meta) are now visible in the Elementary catalog, enhancing context and improving collaboration by bringing key metadata directly into your observability workflows.
+
+We understand that not all meta attributes are relevant for every team. If there are specific meta attributes you’d like to see in the catalog, please reach out to us at Elementary. Let us know your preferences, and we’ll configure the catalog to display the metadata most valuable to you.
+
+
+![](https://res.cloudinary.com/diuctyblm/image/upload/v1740079225/Changelog/Fields_j6jumx.png)
+
+## Manually Set SLAs for Freshness Tests
+
+
+
+You can now set a manual threshold for Elementary's freshness tests.
+
+While our automated freshness test uses anomaly detection to identify unusual delays in table updates, sometimes you need more precise control. With manual thresholds, you can explicitly define when a freshness test should fail, giving you full control over monitoring your data freshness requirements. Simply set your desired threshold, and you'll be notified whenever a table hasn't refreshed within that time limit.
+
+Learn more about [automated freshness tests](https://docs.elementary-data.com/features/anomaly-detection/automated-freshness) in our docs.
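For comparison, dbt's own explicit freshness thresholds on a source use `warn_after`/`error_after`, expressing the same idea of a hard cutoff (a sketch only; the source and column names below are hypothetical, and thresholds set in the Elementary UI require no YAML at all):

```yaml
sources:
  - name: app_events              # hypothetical source
    loaded_at_field: updated_at   # column used to measure freshness
    freshness:
      warn_after: {count: 12, period: hour}    # warn when data is over 12h old
      error_after: {count: 24, period: hour}   # fail when data is over 24h old
    tables:
      - name: orders
```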
+![](https://res.cloudinary.com/diuctyblm/image/upload/v1740079192/Changelog/image_55_ktvski.png)
+
+## Connect Multiple BI Tools in Lineage
+
+
+
+Elementary now supports connecting multiple BI tools, bringing even more visibility into your data. Once connected, BI metadata will appear in both the **catalog** and the **lineage graph**.
+
+Currently, Elementary integrates with **Looker**, **Tableau**, **Power BI**, and **Sigma**, with more to come!
+
+![](https://res.cloudinary.com/diuctyblm/image/upload/v1740079164/Changelog/image_43_hgu2zu.png)
+
+## Easier Test Creation
+
+
+
+We improved the test creation UI to make it easier to create tests. Given the fantastic response to the [dbt tests hub](https://www.elementary-data.com/dbt-test-hub) we introduced a few months ago, we decided to bring the test hub into the platform. You can now search directly from the Elementary UI, select the desired test, and create it using the YAML format provided with examples and explanations, then add owners, tags, severity, etc., and open a PR.
+
+Learn more about [test creation in Elementary](https://docs.elementary-data.com/features/data-tests/data-tests-overview).
+
+![](https://res.cloudinary.com/diuctyblm/image/upload/v1740079128/Changelog/New_Test_Creation_UI_new_eftsw3.gif)
+
+## New Integrations: Jira & Linear
+
+
+
+We’re excited to announce that you can now integrate Elementary with Jira or Linear to streamline your incident management process.
+
+With this integration, you can create new Jira or Linear tickets directly from Elementary. Once you connect your account, a **‘Create Ticket’** button will appear next to each incident in the incident management screen.
+
+Jira and Linear tickets created through Elementary will automatically include key details like the test name, description, query, results, and more, ensuring all relevant context is captured and shared.
+ +Learn more about connecting Elementary to [Jira](https://docs.elementary-data.com/cloud/integrations/alerts/jira) or [Linear](https://docs.elementary-data.com/cloud/integrations/alerts/linear) in our docs. + +![](https://res.cloudinary.com/diuctyblm/image/upload/v1740079070/Changelog/image_6_dubecv.png) + +![](https://res.cloudinary.com/diuctyblm/image/upload/v1740079094/Changelog/image_51_r2pbsr.png) + + +## Release v0.20.0 + + + +**What's Changed:** + +* Fix test dwh write dbt profiles +* Removed usage of deprecated `-m` flag in dbt +* Disable group registration when tracking is disabled +* Update dbt package revision +* Handle empty result in ClickHouse +* Handle invalid characters when uploading CI artifact +* Use row number instead of rank +* Add attribution block to alert messages in data monitoring +* dbt Fusion support +* CLI stop using deprecated tests +* Update the CLI to use the new package version +* Attempt to remove dbt-databricks restriction +* Update report version +* Feature: make the number of columns that are formatted as a table in Teams alerts a CLI flag so that users have more control over the formatting in Teams + +**Full Changelog**: [v0.19.5...v0.20.0](https://github.com/elementary-data/elementary/compare/v0.19.5...v0.20.0) + +## Release v0.19.5 + + + +**What's Changed:** + +* Slack join channel recursion fix +* Changed dbt package version to a commit with Dremio types mapping +* Enhance SlackWebMessagingIntegration to include a timeout feature +* Use sets for alert filters +* Dimension anomalies visualization + +**Full Changelog**: [v0.19.4...v0.19.5](https://github.com/elementary-data/elementary/compare/v0.19.4...v0.19.5) + +## Release v0.19.4 + + + +**What's Changed:** + +* Use full source name in freshness alerts +* Upgrade the elementary dbt package to v0.19.2 + +**Full Changelog**: [v0.19.3...v0.19.4](https://github.com/elementary-data/elementary/compare/v0.19.3...v0.19.4) + +## Release v0.19.3 + + + +**What's Changed:** + +* Fixed 
package version + +**Full Changelog**: [v0.19.2...v0.19.3](https://github.com/elementary-data/elementary/compare/v0.19.2...v0.19.3) + +## Release v0.19.2 + + + +**What's Changed:** + +* Fixed backwards compatibility issue with pydantic + +**Full Changelog**: [v0.19.1...v0.19.2](https://github.com/elementary-data/elementary/compare/v0.19.1...v0.19.2) + +## Release v0.19.1 + + + +**What's Changed:** + +* Added excludes option to edr monitor +* Text and markdown formats + +**Full Changelog**: [v0.19.0...v0.19.1](https://github.com/elementary-data/elementary/compare/v0.19.0...v0.19.1) + +## Release v0.19.0 + + + +**What's Changed:** + +* Enable support for multiple links and icons in alert messages +* Using elementary 0.19.0 + +**Known Issues:** + +* dbt-databricks must be below 1.10.2 (See issue 1931 for more details) + +**Full Changelog**: [v0.18.3...v0.19.0](https://github.com/elementary-data/elementary/compare/v0.18.3...v0.19.0) + +## Release v0.18.3 + + + +**What's Changed:** + +* Fixed missing metrics in CLI + +**Full Changelog**: [v0.18.2...v0.18.3](https://github.com/elementary-data/elementary/compare/v0.18.2...v0.18.3) + +## Release v0.18.2 + + + +**What's Changed:** + +* Subscribers in the grouped alerts +* ClickHouse CLI integration +* Add FileSystemMessagingIntegration and related tests +* Adds --s3-acl option to the CLI to be able to set S3 report permissions +* Fixed setup of internal dbt project used by Elementary +* Add function for `disable_elementary_logo_print` +* Update report to 1.0.26 + +**Full Changelog**: [v0.18.1...v0.18.2](https://github.com/elementary-data/elementary/compare/v0.18.1...v0.18.2) + +## Release v0.18.1 + + + +**What's Changed:** + +* Athena now works in the CLI +* Allow contributor PRs to run tests +* Add NOT_CONTAINS filter type + +**Full Changelog**: [v0.18.0...v0.18.1](https://github.com/elementary-data/elementary/compare/v0.18.0...v0.18.1) + +## Release v0.18.0 + + + +**What's Changed:** + +* Invocation Filter Fix: Invocation 
filters now apply to both reports and their summaries, ensuring consistent filtering
+* Python Version Support: Official support for Python 3.8 has been discontinued to align with dbt's supported versions
+* Test Description Bug: Fixed an issue where test descriptions were missing in alerts when using dbt version 1.9
+
+Note: We've bumped the minor version to align with the recent minor version update in the dbt package.
+
+**Full Changelog**: [v0.17.0...v0.18.0](https://github.com/elementary-data/elementary/compare/v0.17.0...v0.18.0)
+
+## Release v0.17.0
+
+
+
+_Release notes coming soon._
+
+## Release v0.16.1
+
+
+
+_Release notes coming soon._
+
+## Release v0.16.0
+
+
+
+_Release notes coming soon._
+
+## Compact navbar
+
+
+
+You can now minimize the navigation bar for a better experience on smaller screens.
+
+![](https://res.cloudinary.com/diuctyblm/image/upload/v1740078816/Changelog/Compact_Nav_wcfmif.gif)
+
+## Sigma integration
+
+
+
+Elementary now connects with Sigma!
+
+With this integration, Elementary automatically extends data lineage down to the workbook page and element level. This means you’ll have full end-to-end visibility into your data, making it easier to understand downstream dependencies, also known as exposures. [Learn more about connecting to Sigma](https://docs.elementary-data.com/cloud/integrations/bi/sigma).
+
+![](https://res.cloudinary.com/diuctyblm/image/upload/v1740078782/Changelog/Screenshot_2024-12-09_at_16.31.03_etmoap.png)
+
+## Performance improvements
+
+
+
+In the past few months, we have invested significant effort in improving loading times across our platform. In particular, we created dedicated, efficient GraphQL API endpoints in our backend to speed up the dashboard, which now loads in a couple of seconds, often less.
+ +We have also made considerable efforts to improve our database infrastructure to support additional scale and customers, which has resulted in an overall improvement throughout the platform. + +## Lineage Export + + + +Before making changes to a column or any other asset, you can assess the impact by exporting a textual summary of its lineage as a .csv file. The export includes: + +- Upstream and downstream assets +- Number of hops +- Names, owners, and tags of each dependency + +You can export the lineage for either a column or a table. This feature is accessible from both the lineage screen and the test overview dependencies tab. + +![](https://res.cloudinary.com/diuctyblm/image/upload/v1740078717/Changelog/Lineage_Export_mesth8.gif) + +## Enhanced Test Results + + + +The test overview side panel, previously only accessible from the incidents page, is now available on the test results page as well. This means you can view a full test overview even for tests that aren’t failing, not just for incidents. + +We’ve also introduced several updates to the panel: + +- View asset dependencies directly in a table format, with the option to export them as a .csv file or explore the lineage graph. +- Inspect the asset in the catalog with a single click. +- Disable automated freshness or volume tests directly from the test overview screen. + +![](https://res.cloudinary.com/diuctyblm/image/upload/v1740078639/Changelog/Test_Overview_naed5h.gif) + +## Custom Resource-based Roles + + + +Elementary now allows the creation of custom roles with access controls tailored to specific data. + +These roles are based on criteria such as environments, model path, dbt tags, or dbt owners. This is in addition to the existing access control (View, Edit, Admin). This ensures that users can only access the data they need, helping create focus and security. + +For now, our team will create the custom role for you. In the future, you will be able to do this on your own in the UI. 
To create a custom role, reach out to us in the mutual Slack support channel, and we will create it for you. When the role is created, it will appear in the roles dropdown you see when inviting a new user to Elementary.
+
+## Bitbucket
+
+
+
+You can now connect Elementary to your Bitbucket code repository, where your dbt project code is managed. Once connected, Elementary will open PRs with configuration changes.
+
+https://docs.elementary-data.com/cloud/integrations/code-repo/bitbucket
+
+## Power BI
+
+
+
+You can now connect Power BI to Elementary!
+
+This will provide you with end-to-end data lineage to understand your downstream dependencies.
+Once connected, Elementary will automatically and continuously extend the lineage to the report/dashboard level.
+
+https://docs.elementary-data.com/cloud/integrations/bi/power-bi
+
+## Elementary + Atlan: See your Elementary data quality insights in Atlan!
+
+
+
+## A better way to triage test and model failures
+
+
+
+We introduced a new test overview side panel that will make it much easier to investigate incidents.
+This overview is available directly from the Incidents Management screen and will soon be available from additional screens (test results, test coverage).
+
+The new view includes the complete test configuration and execution history, and for each result, it includes the result description, test query, and a row sample/anomaly chart (depending on the test type).
+
+This new side panel is also available for model build error incidents, allowing you to view dbt model build error messages right in the Elementary UI for the first time!
+
+![](https://res.cloudinary.com/diuctyblm/image/upload/v1740078239/Changelog/Screenshot_2024-09-29_at_17.17.16_mpw6ni.png)
+
+## Introducing: Incident Management
+
+
+
+Managing alerts is a common challenge for our users. Daily test failures generate numerous alerts, making tracking each issue's status difficult. Alerts are just the starting point: users need a tool to manage the entire process.
+
+We’ve been working hard to solve these issues and are excited to introduce incidents in Elementary.
+
+**What are Incidents?**
+An incident consists of one or more failure events. Each failure or warning opens a new incident or is added to an existing one. Different failures are grouped based on automated linking rules. Each incident has a start time, status, severity, assignee, and end time. Read more about [Incidents](https://docs.elementary-data.com/features/alerts-and-incidents/incidents).
+
+**New Incident Management Page**
+Our new [Incident Management Page](https://docs.elementary-data.com/features/alerts-and-incidents/incident-management) helps your team stay on top of open incidents and collaborate to resolve them. It provides a comprehensive overview of all incidents, allowing users to view statuses, prioritize issues, assign responsibilities, and resolve incidents efficiently.
+
+
+
+## New Alert Integrations!
+
+
+
+Introducing three new integrations with communication and incident management tools:
+
+- [Microsoft Teams](https://docs.elementary-data.com/cloud/integrations/alerts/ms-teams)
+- [PagerDuty](https://docs.elementary-data.com/cloud/integrations/alerts/pagerduty)
+- [Opsgenie](https://docs.elementary-data.com/cloud/integrations/alerts/opsgenie)
+
+## Add any dbt test via UI
+
+
+
+Elementary now supports adding any dbt test in bulk and directly from the UI!
+We added support for the dbt-expectations and dbt-utils packages, and you can use it to add your own custom generic tests to tables and columns.
+
+
+
+## Column anomalies by dimensions
+
+
+
+You can now add a new parameter to your column anomaly tests - `dimensions`.
+
+This will calculate the column metrics for every time bucket and table dimension values, allowing you to detect anomalies in specific segments of your data.
+
+For example, if you want to detect anomalies in a revenue column and you have multiple apps in different countries - now you can detect anomalies in revenue in a specific country.
+
+Here is an example of how this can be configured -
+
+```yaml
+  columns:
+    - name: in_app_purchase_revenue
+      tests:
+        - elementary.column_anomalies:
+            column_anomalies:
+              - sum
+            dimensions:
+              - app_name
+              - country
+```
+
+## Column level test results in lineage
+
+
+
+You can now investigate test results right on top of your column level lineage graph.
+With this new release you can filter the lineage graph on a specific column that has an issue, and see if upstream or downstream columns
+have similar test failures to understand the root cause fast.
+
+
+
+## Monitor test durations
+
+
-
- -
- -### Test Execution History is now showing test durations - -
-
-
- Test execution history -
-
We added test durations to the Test Execution History (the command and control center for your tests). You can now monitor your tests' performance and see which tests are taking the longest, or whether there has been any degradation in the performance of specific tests. Easily sort your tests by execution duration and choose the most promising candidates for optimization. This can also be used for cost analysis, as slow tests tend to be more expensive. @@ -193,257 +599,93 @@ description: "See what's new at Elementary" src="https://res.cloudinary.com/diuctyblm/video/upload/v1719320330/Changelog/Test_duration_waxg1j.mp4" >
-
- -
- -### Sync now - -
-
-
- Environments -
-
- We are excited to launch a new `Sync Now` button right in your environments page. You can think of is as a "refresh now" button for your environment.
- If you introduced a change in your environment and you want to see it in Elementary immediately - just click ‘sync now’.
-
- Here is what it looks like -
-
-
+## Sync now
+
+
+
+We are excited to launch a new `Sync Now` button right on your environments page. You can think of it as a "refresh now" button for your environment.
+If you introduced a change in your environment and you want to see it in Elementary immediately - just click ‘sync now’.
+
+Here is what it looks like -
+
+
+
+## Status and assignee for alerts
+
+
+
+An alert is just the start of a triage and response process. We have big plans for making this process much more effective.
+The first step was alert rules, and now we are introducing status and assignee selection in our alerts.
+
+This should help you manage incidents and collaborate more effectively with your team.
+
+
+
+
+
+## Custom SQL tests in UI
+
+
+
+Custom query testing is a must-have to validate business logic specific to your data team. You can now create custom SQL tests with Elementary, which will be translated into singular tests in your dbt project.
+
+
+
+
+
+## Model runs in dashboard
+
+
+
+The results of your jobs are critical to your data health. These are now included in the Elementary dashboard, for you to get a complete overview of the status of your pipelines. We added a breakdown of the latest result of each model, and the aggregated failures over time.
+
+
+
+
+
+## DAG subset in Lineage
+
+
+
+To improve UX and performance, we changed the default behavior in the lineage.
+When a node or group of nodes is chosen, only the first hop in both directions is loaded. To see the rest of the hops you can use the +/- buttons (see video).
+
+
+
+
+
+## Role-based access control
+
+
+
+Elementary now allows creating users with different roles.
+This will allow you to add more users from your team to Elementary, letting them view results without giving them the ability to change environment settings.
-
- -
- -## April 2024 - -### Status and assignee for alerts 🔔 - -
-
-
- Alerts -
-
- An alert is just the start of a triage and response process. We have big plans for making this process much more effective. The first step was alert rules, and now we are introducing status and assignee selection in our alerts. - - This should help you manage incidents and collaborate more effectively with your team. - - *Note: at this time the status and assignee are not reflected in the Elementary UI, but this is expected to change in the near future ​:smile:* - - - ![Status and assignee for alerts](pics/changelog/status-and-assignee-for-alerts.png) - - -
- -
-
- -### Custom SQL tests in UI 🧪 - -
-
-
- Test configuration -
-
- Custom query testing is a must-have to validate business logic specific to your data team. You can now create custom SQL tests with Elementary, which will be translated into singular tests in your dbt project. - - - ![Custom SQL tests in UI](pics/changelog/custom-sql-tests-in-ui.gif) - - -
-
- -
- -### Model runs in dashboard 📈 - -
-
-
- Dashboard -
-
- The results of your jobs are critical to the data health. These are now included in the Elementary dashboard, for you to get a complete overview of the status of your pipelines. We added a breakdown of the latest result of each model, and the aggregated failures over time. - - - ![Model runs in dashboard](pics/changelog/model-runs-in-dashboard.gif) - - -
-
- -
- -### Lineage improvements - Load DAG subset ⬆️ - -
-
-
- Lineage -
-
- To improve UX and performance, we changed the default behavior in the lineage. When a node or group of nodes is chosen, only the first hop in both directions is loaded. To see the rest of the hops you can use the +/- buttons (see video). - - - ![Load DAG subset](pics/changelog/lineage-load-dag-subset.gif) - - -
-
- -
- -### Role-based access control 🚦 - -
-
-
- Team -
-
- Elementary now allows creating users with different roles. This will allow you to add more users from your team to Elementary, allowing them to view results without giving them the ability to change environment settings. - - *Note: All the existing users now have "Admin" roles but this can be modified using the team management screen.* - - - ![Load DAG subset](pics/changelog/role-based-access-control.gif) - - -
-
- -
+ + + diff --git a/docs/cloud/ai-agents/catalog-agent.mdx b/docs/cloud/ai-agents/catalog-agent.mdx new file mode 100644 index 000000000..98d1275c8 --- /dev/null +++ b/docs/cloud/ai-agents/catalog-agent.mdx @@ -0,0 +1,54 @@ +--- +title: "Catalog AI Agent" +sidebarTitle: "Catalog agent" +icon: "book-open" +--- + + + +**The Catalog Agent helps data analysts, business users, and AI tools find the right data on their own so engineers don't have to be the bottleneck.** By interpreting natural language questions and surfacing the right context—including data health, lineage, and metric definitions—it enables faster, more confident access to the data people need. The Catalog Agent lays the groundwork for more accurate and effective AI-powered workflows across your organization. + +### Using the Catalog Agent + +Open the Catalog Agent to start a conversation. Just ask a question like **“Which table should I use to analyze user retention?”** or **“Can I trust the orders table?”** and the agent will guide you to the right assets or insights. +The agent will interpret your request using metadata, lineage, and semantic context, and suggest relevant datasets, tables, or metrics. It will also surface freshness, incident status, and test results, and explain how the data is used and whether it can be trusted. + + +
+ +
+
+
+### Key Capabilities
+
+### Intelligent Discovery
+
+Helps users find the right data using metadata (tags, descriptions, owners), lineage to understand dependencies and impact, and semantic layers from dbt, BI tools, or external catalogs for metric-level context.
+
+### Trust and Quality Signals
+
+For every asset, the agent provides real-time indicators of reliability:
+
+- Freshness metrics
+- Open incidents and recent test results
+- A high-level trust signal based on metadata coverage and data health
+
+### Who It’s For
+
+The Catalog Agent is built for **data analysts**, **business users**, **AI copilots**, and **intelligent applications** that need to find and understand the right data without making the data team a bottleneck.
+
+We take data privacy seriously. Read our [**AI Privacy Policy**](/cloud/general/ai-privacy-policy) to understand how AI features are secured.
+
diff --git a/docs/cloud/ai-agents/governance-agent.mdx b/docs/cloud/ai-agents/governance-agent.mdx
new file mode 100644
index 000000000..087d06fbd
--- /dev/null
+++ b/docs/cloud/ai-agents/governance-agent.mdx
@@ -0,0 +1,83 @@
+---
+title: "Governance AI Agent"
+sidebarTitle: "Governance agent"
+icon: "tags"
+---
+
+
+
+
+**Governing your data isn't just about internal organization; it's the key to making your data reliably usable by both humans and AI.**
+
+The Governance Agent helps you scale and maintain trusted, well-documented data assets without manual effort. It evaluates your metadata, identifies gaps, and suggests improvements, all based on how your organization already tags, documents, and assigns ownership.
+
+By learning from your environment’s existing conventions, the agent brings consistency, policy enforcement, and AI-readiness across your pipelines.
+
+### Using the Governance Agent
+
+You can trigger the Governance Agent from the Catalog screen, and ask it a question like **"Which tables in the marketing domain are missing ownership?"**.
It will analyze the asset’s metadata and context, validate it against both best practices and your custom policies, and generate a list of actionable suggestions. When you’re ready, the agent can open a pull request with the proposed metadata changes. + + +
+ +
+ + +### Key Capabilities + +### Learns from Your Environment + +The agent understands and scales your internal standards by analyzing: + +- Tag structures and naming conventions +- Ownership assignments +- Folder hierarchies and asset types +- Custom metadata fields already in use + +This ensures all recommendations align with your existing governance style — not just a static set of rules. + +### Validates Governance Coverage + +The agent identifies: + +- Missing or inconsistent documentation for tables, columns, and BI assets +- Unassigned or conflicting owners +- Gaps in classification, tags, or custom metadata +- Sensitive data fields that aren’t properly flagged or controlled + +### Metadata Enrichment + +The agent suggests or autofills: + +- Table and column descriptions +- Tags and classifications +- Ownership fields +- Custom metadata based on known structures + +### Sensitive Data Enforcement + +Sensitive fields are automatically detected and checked to ensure appropriate tagging is applied, and that there's no propagation to unauthorized downstream assets + +### Pull Request Automation + +Once suggested changes are ready, the agent will open a **pull request** for you to review and apply. + +### Who It’s For + +Built for **data platform teams, analytics engineers, and governance leads** who want to increase metadata coverage, enforce governance policies, and ensure their data is ready for AI, without adding manual overhead. + + +We take data privacy seriously. Read our [**AI Privacy Policy**](/cloud/general/ai-privacy-policy) to understand how AI features are secured. 
diff --git a/docs/cloud/ai-agents/overview.mdx b/docs/cloud/ai-agents/overview.mdx new file mode 100644 index 000000000..21876550a --- /dev/null +++ b/docs/cloud/ai-agents/overview.mdx @@ -0,0 +1,31 @@ +--- +title: "Ella: Elementary's AI Agents" +sidebarTitle: "Ella: AI agents" +icon: "sparkles" +--- + +**Ella is the intelligence layer that eliminates manual workflows and accelerates how data teams work.** + +She detects coverage gaps, recommends and generates tests, traces root causes, fixes issues, optimizes queries, and surfaces missing metadata—making your data AI-ready and compliant by default. + +Ella runs on self-hosted models through Amazon Bedrock. No sensitive data is stored or shared, and all features are opt-in by default. Every action flows through a pull request, keeping your team fully in control. + +Her behavior is guided by clear, reviewable AI policies that align with your governance standards. Ella’s agents handle the busywork so your team can stay focused on what moves the business forward. + + + + + + + + + + + + + + + + + +Want to learn more? [**Our experts will be happy to chat.**](https://meetings-eu1.hubspot.com/joost-boonzajer-flaes/intro-call-docs?uuid=64e4c20d-0ff3-4454-9387-c93b47afcee6) diff --git a/docs/cloud/ai-agents/performance-cost-agent.mdx b/docs/cloud/ai-agents/performance-cost-agent.mdx new file mode 100644 index 000000000..ef6874fee --- /dev/null +++ b/docs/cloud/ai-agents/performance-cost-agent.mdx @@ -0,0 +1,73 @@ +--- +title: "Performance & Cost AI Agent" +sidebarTitle: "Performance & cost agent" +icon: "comments-dollar" +--- + + + + +**Optimize long-running queries, eliminate bottlenecks, and keep compute costs under control without digging through logs or rewriting SQL by hand.** The Performance and Cost Agent helps you identify inefficiencies in your pipelines and resolve them quickly, so your data stays fresh, fast, and within budget. 
+ +It improves performance by analyzing query logic, historical execution patterns, and resource usage. This makes it easier to meet data SLAs and ensure products, dashboards, and AI models get the data they need, when they need it. + +### Using the Performance and Cost Agent +Trigger the agent from any asset flagged as underperforming or from a query known to be slow or expensive. It reviews the SQL, surrounding metadata, and historical performance, then recommends code changes and resource configuration tweaks to improve efficiency. + +You’ll see an estimate of time and cost savings before applying changes. Once approved, the agent opens a pull request with its suggested improvements and can continue iterating until performance targets are met. + + +
+ +
+ + +### Key Capabilities + +### Performance Diagnostics and Optimization + +The agent evaluates: + +- Query logic and structure +- Historical execution time and resource usage +- Metadata context and lineage + +It uses that analysis to identify performance bottlenecks and propose: + +- Code-level SQL optimizations +- Resource allocation and configuration changes +- Structural improvements that reduce load or complexity + +### Cost and SLA Awareness + +Each recommendation includes an estimate of: + +- Time saved per run +- Cost reduction per run or per day +- Impact on data freshness and downstream data availability + +This makes it easier to prioritize which performance issues are worth fixing and why. + +### Automated Implementation and Iteration + +Once recommendations are approved, the agent opens a pull request with the proposed changes. It can continue to iterate based on updated performance feedback until the asset meets defined SLAs or cost targets. + +### Who It’s For + +Built for **analytics engineers and data engineers** who need to improve pipeline speed, reduce costs, and ensure data products stay fresh and reliable without getting buried in manual query tuning. + +We take data privacy seriously. Read our [**AI Privacy Policy**](/cloud/general/ai-privacy-policy) to understand how AI features are secured. + diff --git a/docs/cloud/ai-agents/test-recommendation-agent.mdx b/docs/cloud/ai-agents/test-recommendation-agent.mdx new file mode 100644 index 000000000..de58322c5 --- /dev/null +++ b/docs/cloud/ai-agents/test-recommendation-agent.mdx @@ -0,0 +1,55 @@ +--- +title: "Test Recommendation AI Agent" +sidebarTitle: "Test recommendation agent" +icon: "vial-circle-check" +--- + + + + +**Make sure you’re covered—not by how many tests you write, but by how many problems you catch.** The Test Recommendation Agent analyzes your data asset, metadata, lineage, and existing test suite to suggest high-impact tests that match your team’s style and priorities. 
+ +It helps you expand meaningful coverage, reduce alert fatigue, and maintain a reliable test strategy as your pipelines evolve. + +### Using the Test Recommendation Agent +Open the agent from the test configuration screen and ask a question like **"What tests can ensure all of my customer data is up to date?"** or **"Which upstream tables need tests to protect my revenue dashboard?"**. The agent generates recommendations based on asset type, criticality, lineage, and your existing test configurations. You can approve, adjust, or skip suggestions as needed. Once approved, the agent opens a pull request with changes that follow your team’s current test structure. + + + +
+ +
+ + +### Key Capabilities + +### Coverage Analysis and Test Recommendations + +The agent analyzes assets individually and in context, taking into account lineage, metadata, SQL logic, and usage, to identify where coverage is missing or misaligned. It suggests new tests based on best practices and your team's existing style, and your own test policy configuration. + +### Fatigue Reduction + +The agent flags tests that are overly noisy, ineffective, or no longer relevant, and helps tune configurations to make your test suite more reliable and actionable. + +### Automated Implementation + +Once you're satisfied with the recommendations, the agent opens a pull request with all changes. You stay in control, but no longer have to do everything manually. + +### Who It’s For + +Built for **analytics engineers and data engineers** who want to improve test coverage intelligently, avoid alert fatigue, and maintain high data quality without slowing down development. + +We take data privacy seriously. Read our [**AI Privacy Policy**](/cloud/general/ai-privacy-policy) to understand how AI features are secured. diff --git a/docs/cloud/ai-agents/triage-resolution-agent.mdx b/docs/cloud/ai-agents/triage-resolution-agent.mdx new file mode 100644 index 000000000..d9620fca4 --- /dev/null +++ b/docs/cloud/ai-agents/triage-resolution-agent.mdx @@ -0,0 +1,74 @@ +--- +title: "Triage & Resolution AI Agent" +sidebarTitle: "Triage & resolution agent" +icon: "toolbox" +--- + + + + +**Automates root cause analysis, clarifies the impact, and suggests actionable fixes so you spend less time triaging and more time solving what matters.** The result is a shift from manual investigation to a faster, more focused workflow, allowing you to focus on delivering business value rather than fighting fires. + +Whether it's a failed test, a model issue, or an unexpected anomaly, it inspects recent code changes, execution history, lineage, and, when allowed, the data itself. 
It prioritizes the incidents that matter, proposes a fix, and can open a pull request for you. That way, incidents stop slowing you down, and pipelines begin to fix themselves. + +## Using the Triage & Resolution agent + +From the Incident Management screen click **“Investigate with AI”** to trigger the agent. It will begin analyzing the incident and surface insights. You can follow along, ask clarifying questions, and when you're ready, allow the agent to open a pull request with its suggested fix - all within your review flow. + + +
+ +
+ + +## Key Capabilities + +### Automated Root Cause Investigation + +The agent dives deep into each incident to uncover why it happened. It inspects: + +- **Recent code changes**, including relevant commits and pull requests. +- **Historical executions** of the pipeline or asset. +- **Upstream lineage**, to detect whether the issue originated from a dependency. +- **The data itself** (when allowed), including the affected asset and its related tables. + +### Impact Clarification + +It doesn’t stop at what’s broken, it tells you **what’s affected**: + +- Analyzes **downstream dependencies** to understand the blast radius. +- Validates whether **data products, dashboards, metrics**, or **AI applications** are impacted. + +### Incident Prioritization + +Each issue is automatically ranked by the downstream dependencies impact, and the criticality of the affected asset. + +This helps your team know what to focus on, and what can wait. + +### Automated Remediation (Where Possible) + +Once the problem is identified, the agent can: + +- Recommend **concrete steps to fix** the issue. +- Automatically **open a pull request** with proposed changes. +- **Notify stakeholders** affected by the issue. + +## Who It’s For + +Designed for **hands-on data professionals**, including data engineers, analytics engineers, and platform teams, who need a faster, more reliable way to stay on top of incidents without getting buried in triage work. + +We take data privacy seriously. Read our [**AI Privacy Policy**](/cloud/general/ai-privacy-policy) to understand how AI features are secured. 
+
diff --git a/docs/cloud/best-practices/detection-and-coverage.mdx b/docs/cloud/best-practices/detection-and-coverage.mdx
new file mode 100644
index 000000000..48279cfd8
--- /dev/null
+++ b/docs/cloud/best-practices/detection-and-coverage.mdx
@@ -0,0 +1,150 @@
+---
+title: "Detection and coverage"
+---
+
+In Elementary you can detect data issues by combining data validations (such as dbt tests and custom SQL) and anomaly detection monitors.
+
+As you expand your coverage, it's crucial to balance coverage with meaningful detections. While it may seem attractive to implement extensive monitoring throughout your data infrastructure, this approach is often suboptimal. Excessive failures can lead to alert fatigue, potentially causing teams to overlook significant issues. Additionally, such an approach will incur unnecessary compute costs.
+
+In this section we will cover the available tests in Elementary, recommended tests for common use cases, and how to use the data quality dimensions framework to improve coverage.
+
+## Supported data tests and monitors
+
+Elementary detection includes:
+
+- Data tests - Validate an explicit expectation, and fail if it is not met.
+  - Example: validate there are no null values in a column.
+- Anomaly detection monitors - Track a data quality metric over time, and fail if there is an anomaly compared to previous values and trends.
+  - Example: track the rate of null values in a column over time, fail if there is a spike.
+
+### Data tests
+
+- dbt tests - Built-in dbt tests (`not_null`, `unique`, `accepted_values`, `relationship`)
+- dbt packages - Any dbt package test; we recommend installing `dbt-utils` and `dbt-expectations`.
+- Custom SQL tests - A custom query that passes if no rows are returned and fails if any rows are returned.
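These tests are configured in a model's properties file. A minimal sketch, assuming a hypothetical `orders` model and the `dbt-expectations` package installed:

```yaml
# Hypothetical example - models/schema.yml
# Assumes an `orders` model and the dbt-expectations package.
models:
  - name: orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ["placed", "shipped", "returned"]
      - name: amount
        tests:
          # Package test from dbt-expectations
          - dbt_expectations.expect_column_values_to_be_between:
              min_value: 0
```

A custom SQL test lives as its own `.sql` file under `tests/`; the query should select the rows that violate the expectation, so returning zero rows means the test passes.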
+
+### Anomaly detection monitors
+
+Elementary offers two types of anomaly detection monitors:
+
+- **Automated Monitors** - Out-of-the-box volume and freshness monitors that are activated automatically and query metadata only.
+- **Opt-in anomaly detection tests** - Monitors that query raw data and require configuration.
+
+
+### Recommendations
+
+- Deploy the packages dbt-utils and dbt-expectations in your dbt projects, to enrich your available tests
+- Refer to the [dbt test hub](https://www.elementary-data.com/dbt-test-hub) by Elementary, to explore available tests by use case
+
+
+## Fine-tuning automated monitors
+
+As soon as you connect Elementary Cloud Platform to your data warehouse, a backfill process will begin to collect historical metadata. Within a few hours on average, your automated monitors will be operational. By default, Elementary collects at least 21 days of historical metadata.
+
+You can fine-tune the [**configuration**](https://docs.elementary-data.com/features/anomaly-detection/monitors-configuration) and [**provide feedback**](https://docs.elementary-data.com/features/anomaly-detection/monitors-feedback) to adjust the detection to your needs.
+
+You can read here about how to interpret the results, and what the available settings of each monitor are:
+
+- [Automated Freshness](https://docs.elementary-data.com/features/anomaly-detection/automated-freshness)
+- [Automated Volume](https://docs.elementary-data.com/features/anomaly-detection/automated-volume)
+
+## Common testing use cases
+
+We have the following recommendations for testing different data assets:
+
+### Data sources
+
+To detect issues in source updates, you should monitor volume, freshness and schema:
+
+- Volume and freshness
+  - Data updates - Elementary Cloud provides automated monitors for freshness and volume. **These are metadata monitors.**
+  - Updates freshness vs. data freshness - The automated freshness will detect delays in **updates**.
However, sometimes the update will be on time, but the data itself will be outdated.
+  - Data freshness (advanced) - If you want to validate the freshness of the raw data by relying on the actual timestamp, you can use:
+    - Explicit threshold [freshness dbt tests](https://www.elementary-data.com/dbt-test-hub) such as `dbt_utils.recency`, or [dbt source freshness](https://docs.getdbt.com/docs/deploy/source-freshness).
+    - Elementary `event_freshness_anomalies` to detect anomalies.
+  - Data volume (advanced) - Although a table can be updated as expected, the data itself might still be imbalanced in terms of volume per specific segment. There are several tests available to monitor that:
+    - Explicit [volume expectations](https://www.elementary-data.com/dbt-test-hub) such as `expect_table_row_count_to_be_between`.
+    - Elementary `dimension_anomalies`, which counts rows grouped by a column or combination of columns and can detect drops or spikes in volume in specific subsets of the data.
+- Schema changes
+
+  - Automated schema monitors are coming soon:
+    - These monitors will detect breaking changes to the schema, only for columns that are consumed downstream, based on lineage.
+    - For now, we recommend defining schema tests on the sources consumed by downstream staging models.
+
+  Some validations on the data itself should be added in the source tables, to test early in the pipeline and detect when data is arriving with an issue from the source.
+
+  - Low cardinality columns / strict set of values - If there are fields with a specific set of values you expect, use `accepted_values`. If you also expect consistency in the ratio of these values, use `dimension_anomalies` and group by this column.
+  - Business requirements - If you are aware of expectations specific to your business, try to enforce them early to detect when issues are at the source.
Some examples: `expect_column_values_to_be_between`, `expect_column_values_to_be_increasing`, `expect_column_values_to_have_consistent_casing`
+
+
+### Recommendations
+
+- Add data freshness and volume validations for relevant source tables, on top of the automated monitors (advanced)
+- Add schema tests for source tables
+
+
+### Primary / foreign key columns in your transformation models
+
+Tables should be covered with:
+
+- Unique checks on primary / foreign key columns to detect unnecessary duplications during data transformations.
+- Not null checks on primary / foreign key columns to detect missing values during data transformations.
+
+For incremental tables, it’s recommended to use a `where` clause in the tests, and only validate recent data. This will prevent running the tests on large data sets, which is costly and slow.
+
+
+#### Recommendations
+
+- Add `unique` and `not_null` tests to key columns
+
+
+### Public tables
+
+As these are your data products, coverage here is highly important.
+
+- Consistency with sources (based on aggregation/primary keys)
+- Volume and freshness
+- Unique and not null checks on primary keys
+- Schema to ensure the "API" to data consumers is not broken
+- Business metrics / KPIs
+  - Sum / max anomalies grouped by your critical dimensions / segments (for example - country, platform…)
+
+### Data quality dimensions framework
+
+To ensure your detection and coverage have a solid baseline, we recommend leveraging the quality dimensions framework for your critical and public assets.
+
+The quality dimensions framework divides data validation into six common dimensions:
+
+- **Completeness**: No missing values, empty values, nulls, etc.
+- **Uniqueness**: The data is unique, with no duplicates.
+- **Freshness**: The data is up to date and within the expected SLAs.
+- **Validity**: The data is in the correct format and structure.
+- **Accuracy**: The data adheres to our business requirements and constraints.
+- **Consistency**: The data is consistent from sources to targets, and from sources to where it is consumed. + +Elementary has already categorized all the existing tests in the dbt ecosystem, including all elementary anomaly detection monitors, into these quality dimensions and provides health scores per dimension automatically. It also shows if there are coverage gaps per dimension. + +We highly recommend going to the relevant quality dimension, then filtering by a business domain tag to see your coverage gaps in that domain. + +Example - + +![Data health dashboard](https://res.cloudinary.com/diuctyblm/image/upload/v1738149955/Docs/data-health-dashboard_czfhhp.webp) + +In this example, you can see that accuracy tests are missing for our sales domain. This means we don't know if the data in our public-facing "sales" tables adheres to our business constraints. For example, if we have an e-commerce shop where no product has a price below $100 or above $1000, we can easily add a test to validate this. Implementing validations for the main constraints in this domain will allow us to get a quality score for the accuracy level of our data. + +NOTE: The `Test Coverage` page in Elementary allows adding any dbt test from the ecosystem, Elementary anomaly detection monitors, and custom SQL tests. We are working on making it easier to add tests by creating a test catalog organized by quality dimensions and common use cases. + +Example for tests in each quality dimension - + +- **Completeness**: + - not_null, null count, null percent, missing values, empty values, column anomalies on null count, null percent, etc +- **Uniqueness**: + - unique, expect_column_values_to_be_unique, expect_column_unique_value_count_to_be_between, expect_compound_columns_to_be_unique +- **Freshness**: The data is up to date and within the expected SLAs. 
+ - Elementary automated freshness monitor, dbt source freshness, dbt_utils.recency, expect_grouped_row_values_to_have_recent_data +- **Validity**: The data is in the correct format and structure. + - expect_column_values_to_match_regex, expect_column_min_to_be_between, expect_column_max_to_be_between, expect_column_value_lengths_to_be_between, column anomalies on min, max, string lengths +- **Accuracy**: The data adheres to our business requirements and constraints. + - expression_is_true, custom SQL +- **Consistency**: The data is consistent from sources to targets, and from sources to where it is consumed. + - relationship, expect_table_row_count_to_equal_other_table, expect_table_aggregation_to_equal_other_table \ No newline at end of file diff --git a/docs/cloud/best-practices/governance-for-observability.mdx b/docs/cloud/best-practices/governance-for-observability.mdx new file mode 100644 index 000000000..66ba2d95b --- /dev/null +++ b/docs/cloud/best-practices/governance-for-observability.mdx @@ -0,0 +1,136 @@ +--- +title: "Governance for observability" +--- + +For an effective data observability process, it’s recommended to establish clear ownership, priorities and segmentation of data assets. This structure enhances governance, speeds up issue resolution, and improves data health tracking. + +Segmenting assets organizes data into manageable units, making monitoring and triage easier. Ownership ensures accountability, with specific individuals responsible for quality and response to incidents. + +## Introduction to tags, owners and subscribers + +### Tags + +As your data platform evolves and more people are maintaining it, structure and context become significantly more important. Tags are a great tool to create that context, and segment your data assets by business domains, data products, priority, etc. + +In Elementary tags are automatically included in alerts, and you can create rules to distribute alerts to different channels by tag. 
Additionally, different views in the platform can be filtered by tag, and provide a view for a subset of your data assets. + +- Tags for tables can be added in code at the model or folder level, and the `tags` key. +- It’s recommended to leverage dbt directories hierarchy to set tags to entire directories (in the dbt_project.yml). Tags are aggregated, so if a specific model under the directory has a different tag, the model will have both tags. + +```yaml +models: + analytics: + marketing: + +tags: marketing + public: + +tags: marketing-public +``` + +- Tags for tests can be added in code or in the Elementary UI when adding a test. + +### Owners and subscribers + +The best method to reduce time to response when there is a data issue is having a clear owner that is in charge of initial triage and accountable for the asset health. In Elementary owners are automatically tagged in alerts. Additionally, different views in the platform can be filtered by owner. + +A data asset or test should have only one owner, but other people might want to be notified on issues. These people can be listed as subscribers, and will be automatically tagged in alerts. + +- If you use a valid Slack / MS teams user as owner / subscriber, they will be tagged in alerts. +- The owner of an asset should be the person / team that is expected to respond to an issue in that asset. +- If there are specific tests or monitors that are relevant to other people, they can be the owners of these tests. + For example: A data engineer is the owner of a model and will be notified on freshness, volume, and data validations issues. A data analyst added some custom SQL tests to validate business logic on this model, and he owns these tests. +- It’s recommended to leverage dbt directories hierarchy to set owners to entire directories (in the dbt_project.yml). Owners are unique, so an owner that is defined on a model overrides the directory configuration. (Subscribers are aggregated). 
+
+```yaml
+models:
+  - name: return_on_ad_spend
+    config:
+      tags:
+        - marketing-public
+        - marketing
+      meta:
+        owner: "@analytics.engineer"
+        subscribers:
+          - "@marketing.data.analyst"
+          - "@another.marketing.data.analyst"
+```
+
+## Business domains & Data products
+
+- We recommend configuring the following tags for models:
+  - **Business domains** - These tags should convey the business context of the asset, and let stakeholders filter and view the status of assets relevant to their business unit. Relevant examples are tags such as: `product-analytics`, `marketing`, `finance`, etc.
+  - **Data products** - Public tables that are exposed as “data products” to data consumers. These are the most important tables within a specific domain, similar to an API for an application. Public tables are usually the interface and focal point between analytics engineers and data analysts. It's crucial for both to be aware of any data issues in these tables. Relevant examples are tags such as: `product-analytics-public`, `marketing-public`, `data-science-public`, etc.
+  - Another possible implementation is using three types of tags:
+    - `marketing-internal` for all internal transformations on marketing data.
+    - `marketing-public` for all public-facing marketing data.
+    - `marketing` for all marketing-related data assets.
+- **Owners and subscribers -**
+
+  - Make sure to have clear ownership defined for all your public-facing tables. We also recommend adding subscribers to the relevant public tables.
+  - Usually, the owners of these public tables are the analytics engineering team, and the subscribers are the relevant data analysts who rely on the data from these tables.
+ + +### Recommendations + +- Add business domain tags to public tables +- Define owners for public facing tables +- Add data consumers as subscribers to relevant public facing tables + + + + +## Priorities (optional) + +Another useful tagging convention can be to set a tag that filters a subset of assets by their priority, so you could establish a process of response to issues with higher criticality. + +Decide how many levels of priority you wish to maintain, and implement by adding a `critical` tag to your critical assets, or create a `P0`, `P1` , `P2` tags for several priority levels. + +This will enable you to filter the results in Elementary by priority, and establish workflows such as sending `critical` alerts to Pagerduty, and the rest to Slack. + + +### Recommendations + +- Add priorities / critical tags to tables / tests (Optional) +- Add owners to all top priority tables / tests (Optional) + + +## Data sources + +Many data issues are a result of a problem in the source data, so effectively monitoring source tables is significant to your pipeline health. + +Use tags to segment your source tables: + +- If multiple source tables are loaded from the same source, we recommend grouping them by tags, such as: `mongo-db-replica`, `salesforce`, `prod-postgres`, etc. +- To make triage easier, you can also add tags of the ingestion system, such as: `fivetran`, `airflow` , `airbyte` , `kafka` , etc. + +Ownership and subscribers: + +- Usually, sources are managed by data engineers and analytics engineers are their consumers. One common way to manage this is to set data engineers as the owners and analytics engineering team members as the subscribers. 
+
+```yaml
+sources:
+  - name: fivetran_salesforce_sync
+    config:
+      tags:
+        - fivetran
+        - salesforce
+      meta:
+        owner: "@data.engineer"
+        subscribers: "@analytics.engineer"
+```
+
+
+### Recommendations
+
+- Add tags to source tables that describe the source system and / or ingestion method
+- Add owners and subscribers to source tables
+
+
+## Recommendations
+
+- Add business domain tags to public tables
+- Define owners for public facing tables
+- Add data consumers as subscribers to relevant public facing tables
+- (Optional) Add priorities / critical tags to tables / tests
+- (Optional) Add owners to all top priority tables / tests
+- Add tags to source tables that describe the source system and / or ingestion method
+- Add owners and subscribers to source tables
\ No newline at end of file
diff --git a/docs/cloud/best-practices/introduction.mdx b/docs/cloud/best-practices/introduction.mdx
new file mode 100644
index 000000000..02e3ccee1
--- /dev/null
+++ b/docs/cloud/best-practices/introduction.mdx
@@ -0,0 +1,31 @@
+---
+title: "Elementary Best Practices"
+sidebarTitle: "Introduction"
+---
+
+The goal of this collection of guides is to help you effectively implement and use Elementary. We'll cover best practices and provide practical tips to enhance your governance, detection,
+coverage, response and collaboration.
+
+Whether you're new to Elementary or looking to optimize your current usage, these guides will help you leverage its full potential to improve your data
+reliability.
+
+
+
+
+
\ No newline at end of file
diff --git a/docs/cloud/best-practices/triage-and-response.mdx b/docs/cloud/best-practices/triage-and-response.mdx
new file mode 100644
index 000000000..c80d3b267
--- /dev/null
+++ b/docs/cloud/best-practices/triage-and-response.mdx
@@ -0,0 +1,195 @@
+---
+title: "Triage & response"
+---
+
+Maintaining high data quality is much more than adding tests - it's about creating processes.
+
+The processes that will improve your data quality, reduce response times, and prevent repeating incidents have to do with:
+
+- Clear ownership and a response plan
+- Incident management
+- Effective triage and resolution
+- Ending incidents with improvements, not just resolution
+
+Elementary has tools in place to support these processes, and this guide will help you get as much value as possible from Elementary when handling data incidents.
+
+## Plan the response in advance
+
+Your response to a data incident doesn't actually start when the failure happens. An effective response starts when you add a test / monitor / dataset.
+
+For every test or monitor you add, think about the following:
+
+- Who should look into a failure?
+- Who should be notified of the failure?
+- What is the potential impact and severity of a failure?
+- What information should the notification include?
+- How to go about resolving the issue? What are the steps?
+
+Based on your answers, add configuration that will impact the alert, alert distribution, and triage:
+
+### Recommendations
+
+- Add a test description that details what it means if this test fails, and context on resolving it. Descriptions can be added in the UI or in code.
+- Each failure should have an owner who is expected to look into it. This can be the owner of the data set or an owner of a specific test.
+- If others need to be notified, add subscribers.
+- Use the [severity of failures](https://docs.getdbt.com/reference/resource-configs/severity) intentionally, and even leverage conditional expressions (`error_if`, `warn_if`).
+- Test failures and alerts include a sample of the failed results and the test query. You can change the test query and/or add comments to it to provide triage context.
+
+
+```yaml
+data_tests:
+  - unique:
+      config:
+        error_if: ">10"
+        meta:
+          description: "More than 10 duplicate records arriving from the source, as this is a staging table"
+          owner: "@data.provider"
+          tags: ["critical", "mongo-db", "raw-production-replica"]
+```
+
+## Alert distribution
+
+As far as alerts are concerned, the desired situation is that team members only get alerts they need to act on - fix the issue, wait for resolution to refresh a dashboard, etc.
+Alert distribution can be configured in the [Alert rules](https://docs.elementary-data.com/features/alerts-and-incidents/alert-rules) feature.
+The alerts can be distributed to different channels (within Slack / MS Teams) and to different tools (Pagerduty, Ops Genie, etc.).
+Elementary users usually distribute alerts by:
+
+1. Business domain tags - In teams where each domain has its own data team, it's recommended to have a separate Slack channel for alerts on that domain's models. The domain alert rules are usually defined by tags.
+2. Responsible team - For example, if there is a problem with null values in a Salesforce source, it makes sense to send the alert straight to the Salesforce team. These alert rules can be defined by model / source name, tag or owner.
+3. Criticality - The most critical alerts are usually model error alerts, and handling them is urgent because they block the pipeline. Since those issues are sometimes time sensitive, some teams choose to send them to Pager Duty or Ops Genie, or at least a dedicated Slack channel with different notification settings.
+4. Low priority alerts / warnings - We generally recommend refraining from sending Slack alerts for failures that don't have a clear response plan yet. These failures can be left unsent, or sent to a muted channel that operates as a "feed".
+   Such failures can be: 1. Newly configured anomaly detection tests or explicit tests where you have low certainty about the threshold / expectation. 2.
Anomaly detection tests that you consider a safety measure rather than a clear failure.
+   This is not to say these failures are uninteresting - they can be investigated within the Elementary UI, using the incidents page, at a convenient time. We believe alerts are an interruption to the daily schedule, and such an interruption should only occur if it's justified. To avoid getting such alerts, we recommend filtering your alert rules on "Failure" or "Error" statuses.
+
+## Notifying stakeholders
+
+There are several ways to notify data consumers and stakeholders about ongoing problems.
+While some customers prefer to do it personally after triaging the incidents, others prefer saving this time and going with automated notifications.
+For models intended for public consumption (by BI dashboards, ML models, etc.) we recommend setting up [subscribers](https://docs.elementary-data.com/oss/guides/alerts/alerts-configuration#subscribers). Those subscribers will be tagged in Slack on every alert that is sent on those tables. Unlike owners, there can be many subscribers to an alert.
+Tagging subscribers is of course optional, and simply adding them to the relevant channels can also suffice.
+Coming soon:
+As part of the data health scores release, we will support a new type of alert that notifies about a drop in the health score of an asset. This type of alert is intended for data consumers, who don't need the details and just want a high-level notification in case the data asset shouldn't be used. We will also support sending daily digests on all assets' health scores.
+
+## Incident management
+
+Elementary has an incidents page; new failures will either create an incident or be attached to an open incident.
+This page is designed to enable your team to stay on top of open incidents and collaborate on resolving them. The page gives a comprehensive overview of all current and previous incidents, where users can view the status, prioritize, assign and resolve incidents.
+![Incident management dashboard](https://res.cloudinary.com/diuctyblm/image/upload/v1738149956/Docs/incident-management_up6jzx.png)
+
+### Incident management usability
+
+- Each incident has three settings: assignee, status and severity.
+  - These can be changed directly from the Slack notification.
+  - The severity is set to `high` for failures and `normal` for warnings. You can manually change it to `critical` or `low`.
+  - You can select several incidents and make changes to the settings in bulk.
+- Failures of the same test / model that has an open incident will not open a new incident; they will be added to the ongoing incident.
+
+### Incident management best practices
+
+- Your goal should be to lower the time to resolution of incidents.
+  - Incidents should have a clear assignee.
+    - Use the quick view of open and unassigned incidents to monitor this.
+    - The best implementation is to pre-define the assignee as the owner, so they get tagged on the failure.
+  - Set clear expectations with assignees.
+    - These can be based on incident severity. For example:
+      - Critical - Should be handled immediately.
+      - High - Should be resolved by end of day.
+      - Normal - Should be resolved by end of week.
+      - Low - Should be evaluated weekly, and might trigger a change in coverage.
+- If no one cares about an incident, this should impact coverage.
+
+### Coming soon
+
+Incidents is a beta feature, and we are working on adding functionality.
The immediate roadmap includes:
+
+- Notifications to assignees
+- Mute / Snooze
+- Advanced grouping of failures into incidents according to lineage (example: a model failure plus all downstream freshness and volume failures)
+- Initiating triage from incident management (see picture)
+
+![An interface showing initiating triage from incident management](https://res.cloudinary.com/diuctyblm/image/upload/v1738149956/Docs/triage-response-via-incident-management_acjqow.png)
+
+## Triage incidents
+
+When triaging incidents, there are four steps to go through:
+
+1. Impact analysis - Although root cause analysis will lead you to resolve the issue, impact analysis should be done first, because the impact determines the criticality of the incident, and therefore the priority and response time.
+2. Root cause analysis
+3. Resolution
+4. Post mortem - Learning from incidents is how you improve over time, and reduce the time to resolution and the frequency of future incidents.
+
+### Impact analysis
+
+The goal of an impact analysis is to determine the severity and urgency of the incident, and understand if you need to communicate the incident to consumers (if there isn't a relevant alert rule).
+These are the questions that should be asked, and product tips on how to answer them with Elementary:
+
+- Was this a failure or just a warning?
+  - As long as you and your team are intentional in determining severities, this can focus you on failures first.
+- Does the incident break the pipeline / create delay?
+
+  - Is the failure a model failure, or a freshness issue?
+  - Do we run `dbt build`, and did this failure stop the pipeline?
+
+    - Check the **Model runs** section of the dashboard to see if there are skipped models, as failures in build cause the downstream models to be skipped.
+
+    ![Model runs portion of the dashboard](https://res.cloudinary.com/diuctyblm/image/upload/v1738149955/Docs/dashboard-model-runs_zzgnd2.png)
+
+
+- How important is the data asset?
+  - Check in the catalog or node info section in the lineage whether it has a tag like `critical`, `public` or a data product tag. You can also look at the description of the data asset, whether it's a table or a column.
+- Does the failure impact important downstream assets? Did the issue propagate to downstream assets?
+
+  - A table might not be critical itself, but if it's upstream from a critical one, it is part of a critical path.
+
+  - Check in the lineage if there are important downstream BI assets / public tables. To see the downstream assets, you can navigate to the lineage directly from the test results by clicking `view in lineage`. If the incident is a failed column test, you can filter only the downstream lineage of the specific column by clicking on `filter column`.
+
+    ![Lineage filters](https://res.cloudinary.com/diuctyblm/image/upload/v1738149955/Docs/lineage-filters_ipjze3.png)
+
+  - Use the lineage filters to color and highlight all the tables in a path that match your filtering criteria.
+
+    ![Lineage filters showing the add filter interface](https://res.cloudinary.com/diuctyblm/image/upload/v1738149955/Docs/lineage-filters-2_cda4on.webp)
+
+  - If there are downstream critical tables, you might want to check if the issue actually propagated to them. A quick way to do this is to copy the test query and run it on the downstream assets (changing the referenced table and column).
+
+- What is the magnitude of the failure? How many failed results out of the total volume?
+
+  - Most tests return the number of failed results. A failure in a `unique` test can be dramatic if it impacts many rows, but insignificant if there is just one case of duplicates.
+
+  - You can see the total number of failures as part of the test result / alert.
+  - On the `Test performance` page, you can compare this number to previous failures of the same tests.
+
+    ![Graph showing test performance](https://res.cloudinary.com/diuctyblm/image/upload/v1738149955/Docs/test-performance-graph_g5t5p5.png)
+
+## Root cause analysis
+
+If the incident is not important, we recommend resolving it and then removing / disabling the test.
+If the incident is important, we need to start the investigation process and understand the root cause. Failures are usually caused by issues at the source, code changes, or an infrastructure issue.
+
+- Is there a data issue at the source?
+  - Check in the lineage whether there is coverage and failures on upstream tables; you can use the lineage filters to limit the scope to relevant failures (if `not_null` failed, filter on `not_null` tests).
+  - Check the test result sample. If you want to see more results, copy the test SQL and run it in your DWH console.
+  - Sometimes an issue is limited to a certain dimension, like a specific product event that stopped arriving or changed. Aggregate the test query by key dimensions in the table to understand if the issue is limited to a specific subset of the data.
+    - _Coming soon - Automated post failure queries._
+  - Check if the test is flaky in the `test performance` screen. Flakiness usually means it's a problem that happens frequently in the source data.
+    - _Coming soon - Check the metric graphs of the source tables._
+- Is it a code issue?
+  - Check recent PRs to the underlying monitored table.
+    - _Coming soon - Incident timeline with recent PRs and changes._
+  - Check recent PRs merged to upstream tables.
+  - Are there any other related failures that happen at the same time, following a recent release?
+  - Check metrics and test results, like the volume of tables, to see if there is a wrong join.
+- If the result is an `error` and not `fail` or `warning`, it means the test / model failed to run. This can be caused either by a timeout or an issue at the DWH, or by a code change that led to a syntax error / broken lineage.
+  - Look at the error message to understand whether it comes from dbt or the DWH, and what the issue is.
+
+## Post mortem - Learning from incidents
+
+Learning from past incidents is how we improve our coverage, response times and reliability.
+
+Here are some common actions to take following an incident:
+
+- Incident wasn't important - If the incident wasn't important or significant, remove the test or change the severity to warning.
+- It was hard to determine the severity of the incident - Make changes to the tags and descriptions of the test / asset, to make it easier next time.
+- The relevant people weren't notified - Make changes to owners and subscribers, and create the relevant alert rules.
+- The result sample was not helpful - Make changes to the test query, to make it easier next time.
+- Recurring incidents at the source - For incidents that keep happening, the most productive approach is to have a conversation with your data providers and figure out how to improve the response. You can use the `test performance` page, and past incidents on the `incidents` page, to communicate stats on the previous incidents.
\ No newline at end of file
diff --git a/docs/cloud/cloud-vs-oss.mdx b/docs/cloud/cloud-vs-oss.mdx
new file mode 100644
index 000000000..74b5e6a70
--- /dev/null
+++ b/docs/cloud/cloud-vs-oss.mdx
@@ -0,0 +1,58 @@
+---
+title: "Elementary OSS vs. Elementary Cloud"
+'og:title': "Elementary OSS vs. Elementary Cloud"
+sidebarTitle: "Cloud vs OSS"
+description: "Detailed comparison of Elementary product offerings."
+icon: "list-check"
+---
+
+
+If you're just beginning your data quality journey, the decision between OSS and Cloud depends on your goals and team setup:
+
+**Elementary OSS**
+
+A self-maintained, open-source CLI that integrates seamlessly with your dbt project and the Elementary dbt package.
It enables alerting and provides the self-hosted Elementary data observability report, offering a comprehensive view of your dbt runs, all dbt test results, data lineage, and test coverage. + + **Elementary Cloud** + +A fully managed, enterprise-ready solution designed for scalability and automation. It offers automated ML-powered anomaly detection, flexible data discovery, an integrated incident management system, and collaboration features. Delivering high value with minimal setup and infrastructure maintenance, it's ideal for teams looking to enhance data reliability without operational overhead. + +This short video covers the difference between OSS and Cloud: + + +
+ +
+ + +### Comparing Elementary OSS to Elementary Cloud + +Below is a detailed comparison between the OSS and Cloud features: + +| Feature | **OSS** | **Cloud** | +|------------------------|------------------------------------------|---------------------------------------------------------------------------------------------------| +| **Detection** |
  • Anomaly detection and dbt tests
|
  • Automated freshness & volume monitors
  • ML-powered anomaly detection
  • dbt and cloud tests
  • Bulk add/edit for tests
| +| **Triage & Response** |
  • Basic alerts
  • Table-level lineage
|
  • Interactive alerts
  • Column-level lineage up to BI
  • BI integrations
  • Test results history
  • Incident management
| +| **Performance Monitoring** |
  • Model and test performance
|
  • Model and test performance
  • Performance alerts
| +| **Enabling Non-Tech Users** | X |
  • No-code test editor
  • Data health scores
  • External catalog integrations
  • Ticketing system integrations
| +| **Governance** | X |
  • Catalog
  • Metadata in code and UI
| + + +### Want to know more? + + + + diff --git a/docs/cloud/features.mdx b/docs/cloud/features.mdx index 4a27276bb..ae375317f 100644 --- a/docs/cloud/features.mdx +++ b/docs/cloud/features.mdx @@ -3,4 +3,8 @@ title: "Platform features" icon: "browsers" --- - \ No newline at end of file +import Features from '/snippets/cloud/features.mdx'; + + + + \ No newline at end of file diff --git a/docs/cloud/features/alerts-and-incidents/alert-configuration.mdx b/docs/cloud/features/alerts-and-incidents/alert-configuration.mdx new file mode 100644 index 000000000..4509ec96a --- /dev/null +++ b/docs/cloud/features/alerts-and-incidents/alert-configuration.mdx @@ -0,0 +1,19 @@ +--- +title: "Alert configuration" +--- + + + + Set up alert rules to automatically route alerts to the relevant channels according to the logic you define. + + + Configure alert content and properties as code in your project YML files. + + + Observe and control the alert destination of each test. + + + Take accountability over alerts with owners and subscribers, added from the UI or the code. + + + diff --git a/docs/cloud/features/alerts-and-incidents/alert-destinations.mdx b/docs/cloud/features/alerts-and-incidents/alert-destinations.mdx new file mode 100644 index 000000000..3d00810a2 --- /dev/null +++ b/docs/cloud/features/alerts-and-incidents/alert-destinations.mdx @@ -0,0 +1,34 @@ +--- +title: Alerts destinations management +sidebarTitle: Alerts destinations management +--- +The Alert Destinations tab helps you control where alerts are sent for each test, directly from the test configuration page or the test side panel. + +This makes it easy to understand how alerts are routed, reduce noise, and manage settings without switching contexts. + + +
+ Alert destinations management +
+
+
+## What You Can See
+
+The **Alert Destinations** tab provides a detailed view of your alerting setup for each test, including:
+
+- A list of your tests, along with their **assigned destinations**, highlighting both active configurations and any gaps in your alerting strategy
+- The **current destinations** where alerts are being sent, and how many tests are assigned to each one
+- The **routing method** used for each alert: whether it's defined by an **alert rule**, a **custom configuration**, or set directly in **code**
+- The **alert status** for each test, which clearly indicates whether alerts are **enabled** or **disabled**
+
+You'll have full visibility into how alerts for a specific test are managed.
+
+## What You Can Do
+
+- **Enable or disable alerts** for individual tests or in bulk
+- **Override alert destinations** in bulk for faster configuration updates. After overriding, the alert type will be changed to `CUSTOM` and any other alerting logic (code/rules) will not be applied.
+
+This feature makes it easier to manage alert delivery and ensure alerts are routed as expected, especially across different teams or test types.
diff --git a/docs/cloud/features/alerts-and-incidents/alert-rules.mdx b/docs/cloud/features/alerts-and-incidents/alert-rules.mdx
new file mode 100644
index 000000000..fb19bb257
--- /dev/null
+++ b/docs/cloud/features/alerts-and-incidents/alert-rules.mdx
@@ -0,0 +1,52 @@
+---
+title: "Alert rules"
+---
+
+**Alert Rules** help you control where alerts are sent and when they are triggered, so you can stay focused on the incidents that matter most. In Elementary Cloud, each rule combines **filters** (what to alert on) and **destinations** (where to send the alert).
+
+### Destinations: Where Alerts Are Sent
+
+A destination defines where an alert notification will be delivered.
You can route alerts to messaging apps (like [Slack](/cloud/integrations/alerts/slack) or [Microsoft Teams](/cloud/integrations/alerts/ms-teams)) or incident management tools (like [PagerDuty](/cloud/integrations/alerts/pagerduty) or [Opsgenie](/cloud/integrations/alerts/opsgenie)). For all [supported tools](/cloud/features/alerts-and-incidents/alerts-and-incidents-overview#supported-alert-integrations), you can also specify the exact channel, team, or escalation path for the alert. Each alert can be routed to one or multiple destinations — within the same tool or across different tools.
+
+### Filters: What Triggers an Alert
+
+Filters define which alerts are routed through a rule. They help you fine-tune alerting, reduce noise, and focus on critical issues.
+You can filter by:
+- **Assets** — Filter alerts based on model name, owner, tag, or status.
+- **Test Types** — Filter by the type of test that triggered the alert, such as volume anomalies, freshness anomalies, model errors, or dbt tests.
+- **Downstream Impacted Assets** — Filter alerts based on the downstream impact of an incident.
+  Trigger alerts only when specific downstream assets are affected. You can filter impacted assets by:
+  - **Criticality** — Only alert if critical assets are impacted.
+  - **Tags** — Filter assets by tag.
+  - **Owners** — Filter assets by owner.
+  - **Asset Type** — Model, source, seed, snapshot, or exposure.
+
+This helps ensure that alerts are only triggered when truly important assets are impacted, reducing alert fatigue and keeping your team focused.
+
+Downstream impact is currently determined based on table-level lineage.
+
+### Rule order matters
+The order of your rules is important. Here's how it works:
+- When an alert is generated, it's evaluated against your rules in order, from top to bottom.
+- As soon as a rule matches, the alert is routed to that rule's destination.
+- No further rules are checked after the first match.
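+
+First-match routing can be pictured as an ordered list. Note that rules are created in the Elementary UI, not in YAML - the fragment below is purely an illustration of the evaluation order, not actual Elementary syntax:
+
+```yaml
+# Illustration only: rules are evaluated top to bottom; first match wins.
+rules:
+  - name: critical-to-pagerduty        # 1st: critical failures page on-call
+    filters: { tags: [critical], statuses: [failure, error] }
+    destination: pagerduty
+  - name: marketing-to-slack           # 2nd: remaining marketing alerts
+    filters: { tags: [marketing] }
+    destination: "#data-alerts-marketing"
+  - name: catch-all                    # last: everything else
+    filters: {}
+    destination: "#data-alerts"
+# An alert tagged both `critical` and `marketing` matches only the first
+# rule, so it goes to PagerDuty and never reaches the Slack channels.
+```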
+ +If you want alerts to be routed based on all applicable rules (not just the first match), we’re testing a Beta feature that enables this. Reach out if you’d like early access! + + + + +### Alert rules default configuration + +- The channel you choose when connecting your messaging app ([Slack](/cloud/integrations/alerts/slack), [Microsoft Teams](/cloud/integrations/alerts/ms-teams), etc.) +is automatically added as a default alert rule, that sends all the failures to that channel without any filtering. +- By default, warnings do not send alerts. + +To modify, deactivate or add more rules, simply navigate to the `Alert Rules` page in the menu. diff --git a/docs/cloud/features/alerts-and-incidents/alerts-and-incidents-overview.mdx b/docs/cloud/features/alerts-and-incidents/alerts-and-incidents-overview.mdx new file mode 100644 index 000000000..5bfbc9b60 --- /dev/null +++ b/docs/cloud/features/alerts-and-incidents/alerts-and-incidents-overview.mdx @@ -0,0 +1,63 @@ +--- +title: Alerts and Incidents Overview +sidebarTitle: Alerts & incidents overview +--- + +import AlertTypes from '/snippets/cloud/features/alerts-and-incidents/alert-types.mdx'; +import AlertsDestinationCards from '/snippets/cloud/integrations/cards-groups/alerts-destination-cards.mdx'; + + + +Alerts and incidents in Elementary are designed to shorten your time to response and time to resolution when data issues occur. + +- **Alert -** Notification about an event that indicates a data issue. +- **[Incident](/cloud/features/alerts-and-incidents/incidents) -** A data issue that starts with a single event but can include multiple events grouped together. An incident includes a start time, status, severity, assignee, and end time. Incident alerts are sent when the incident is opened and when it is resolved. + +Alerts provide information and context for recipients to quickly triage, prioritize and resolve issues. +For collaboration and promoting ownership, alerts include owners and tags. 
+You can create distribution rules to route alerts to the relevant people and channels, for faster response. + + + +
+ Slack alert format +
+
+
+
+An alert will either open a new incident, or be automatically grouped and added to an ongoing incident.
+From the alert itself, you can update the status and assignee of an incident. In the [incidents page](/cloud/features/alerts-and-incidents/incident-management),
+you will be able to track all open and historical incidents, and get metrics on the quality of your response.
+
+## Alerts & incidents core functionality
+
+- **Alerts customization** - Alerts should include relevant context for quick triage, such as **owner**, **tags** and **description**. In Elementary, alerts can be customized to include this information.
+- **Alert distribution rules** - Alerts should be sent to relevant recipients. By creating [Alert Rules](/cloud/features/alerts-and-incidents/alert-rules), alerts can be distributed to different channels and systems.
+- **Incidents management** - When alerts are distributed to different channels, it can become hard to track what is open. Elementary offers a centralized Incidents page to monitor what is open, and manage incident properties: **assignee**, **status** and **severity**.
+- **Grouping alerts to incidents** - New failures related to already open incidents will not trigger new alerts, and will be automatically added to the ongoing incident. This reduces noise and alert fatigue.
+- **Automated resolution** - When a successful run means an open incident is resolved, Elementary will automatically resolve the incident. This helps you manage the state of incidents and communicate it to stakeholders in real time.
+- **Mute test alerts** – Mute your test from the test configuration tab to run tests without triggering alerts, giving you more control over notifications while still monitoring data quality. This is useful when testing new data sets, refining thresholds, or adjusting test logic without unnecessary noise.
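+
+The context fields above (owner, tags, subscribers) can be set as code in your dbt project's `meta`, following the convention used elsewhere in these docs. The model name and handles here are illustrative:
+
+```yaml
+models:
+  - name: orders                     # illustrative model name
+    config:
+      tags:
+        - finance                    # shown on the alert, usable in alert rules
+      meta:
+        owner: "@data.owner"         # tagged in the alert
+        subscribers:
+          - "@finance.analyst"       # subscribers are also tagged
+```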
+ + + + + +## Alert types + + + +## Supported alert integrations + + diff --git a/docs/cloud/features/alerts-and-incidents/incident-digest.mdx b/docs/cloud/features/alerts-and-incidents/incident-digest.mdx new file mode 100644 index 000000000..360deb73c --- /dev/null +++ b/docs/cloud/features/alerts-and-incidents/incident-digest.mdx @@ -0,0 +1,61 @@ +--- +title: "Incidents Digest" +sidebarTitle: "Incidents digest" +--- + +**Incidents Digest** lets you send a scheduled summary of incidents to your team in addition to real-time alerts. Your team receives a consolidated digest on a daily or weekly cadence covering all relevant incidents in that period. + +This is useful for reducing noise, providing management-level visibility, or keeping stakeholders informed without overwhelming them with individual alerts. + + + Incidents Digest tab in Alert Rules + + +## How it works + +Digest rules are configured under **Alert Rules → Incidents Digest** tab. Each rule defines: + +- **When** to send the digest (cadence: daily or weekly, at a specific hour) +- **What** incidents to include (filters by tag, owner, status, model, or category) +- **Where** to send it (one or more destinations such as a Slack channel or email) + +At the scheduled time, Elementary collects all incidents that match the rule's filters since the last digest was sent, and delivers them as a single message. + + + Incidents digest email + + +## Creating a digest rule + +Navigate to **Alert Rules** in the left sidebar and select the **Incidents Digest** tab. Click **Create digest rule** to open the configuration drawer. + +The filters, categories, and destinations are configured the same way as in [Alert Rules](/cloud/features/alerts-and-incidents/alert-rules). The one addition unique to digest rules is **cadence**. + +### Cadence + +Choose how often the digest should be sent: + +- **Daily** — sent once a day at the hour you choose. +- **Weekly** — sent once a week on the day and at the hour you choose. 
+ +The time is in UTC. + + + Digest cadence selector + + +## Managing digest rules + +Once created, digest rules appear as cards in the **Incidents Digest** tab. From each card you can: + +- **Edit** the rule to change any configuration. +- **Activate / Deactivate** the rule without deleting it. +- **Delete** the rule. + + + Digest rule action menu + + +## Relationship with alert rules + +Digest rules are independent of [Alert Rules](/cloud/features/alerts-and-incidents/alert-rules). You can have both real-time alert rules and digest rules active at the same time — they are evaluated separately. A common pattern is to use real-time alert rules for critical on-call channels and digest rules for broader team summaries. diff --git a/docs/cloud/features/alerts-and-incidents/incident-management.mdx b/docs/cloud/features/alerts-and-incidents/incident-management.mdx new file mode 100644 index 000000000..85607407e --- /dev/null +++ b/docs/cloud/features/alerts-and-incidents/incident-management.mdx @@ -0,0 +1,54 @@ +--- +title: Incident Management +sidebarTitle: Incident management +--- + +The `Incidents` page is designed to enable your team to stay on top of open incidents and collaborate on resolving them. +The page gives a comprehensive overview of all current and previous incidents, where users can view the status, prioritize, assign and resolve incidents. + +## Incidents view and filters + +The page provides a view of all incidents, and useful filters: + +- **Quick Filters:** Preset quick filters for all, unresolved and “open and unassigned” incidents. +- **Filter:** Allows users to filter incidents based on various criteria such as status, severity, model name and assignee. +- **Time frame:** Filter incidents which were open in a certain timeframe. + + + + +## Interacting with Incidents + +An incident has a status, assignee and severity. +These can be set in the Incidents page, or from an alert in integrations that support alert actions. 
+ +- **Incident status**: Will be set to `Open` by default, and can be changed to `Acknowledged` and back to `Open`. When an alert is manually or automatically set as `Resolved`, it will close and will no longer be modified. +- **Incident assignee**: An incident can be assigned to any user on the team, and they will be notified. + - If you assign an incident to a user, it is recommended to leave the incident `Open` until the user changes the status to `Acknowledged`. +- **Incident severity**: The severity of an incident can be Low, Normal, High or Critical. By default, model errors are set to Critical, test failures are set to High and warnings are marked as Normal, but the severity can be changed manually. _Coming soon_: Severity will be automated by an analysis of the impacted assets. + + Incident severity is used to prioritize incidents and to set the urgency of resolving them. It is not the same as dbt test severity. + +## Incidents overview and metrics + +The incidents are divided into categories based on their status, and the user can view the number of incidents in each category by severity. +For resolved incidents, the user can view the average resolution time. + +_Coming soon_: The option to create and share a periodic summary of incidents will be supported in the future. + + +
+ Incidents overview +
+ \ No newline at end of file diff --git a/docs/cloud/features/alerts-and-incidents/incidents.mdx b/docs/cloud/features/alerts-and-incidents/incidents.mdx new file mode 100644 index 000000000..bdbee78c9 --- /dev/null +++ b/docs/cloud/features/alerts-and-incidents/incidents.mdx @@ -0,0 +1,73 @@ +--- +title: Incidents in Elementary +sidebarTitle: Incidents +--- + +One of the challenges data teams face is tracking, understanding and collaborating on the status of data issues. +Tests fail daily, pipelines are executed frequently, and alerts are sent to different channels. +There is a need for a centralized place to track: +- What data issues are open? Which issues were already resolved? +- Who is on it, and what's the latest status? +- Are multiple failures part of the same issue? +- What actions and events happened since the incident started? +- Did such an issue happen before? Who resolved it and how? + +In Elementary, these are solved with `Incidents`. + +A comprehensive view of all incidents can be found in the [Incidents page](/cloud/features/alerts-and-incidents/incident-management). + +## How do incidents work? + +Every failure or warning in Elementary will automatically open a new incident or be added as an event to an ongoing incident. +Based on grouping rules, different failures are grouped into the same incident. + +An incident has a [status, assignee and severity](/cloud/features/alerts-and-incidents/incident-management#interacting-with-incidents). +These can be set in the [Incidents page](/cloud/features/alerts-and-incidents/incident-management), or from an alert in integrations that support alert actions. + + +
+ Elementary Incidents +
+ + +## How are incidents resolved? + +Each incident starts at the first failure, and ends when the status is changed manually or automatically to `Resolved`. +An incident is **automatically resolved** when the failing tests, monitors and / or models are successful again. + +## Incident grouping rules + +Different failures and warnings are grouped into the same incident by the following grouping rules: + +1. Additional failures of the same test / monitor on a table that has an active incident. +2. _Coming soon_ - Freshness and volume issues that are downstream of an open incident on a model failure. +3. _Coming soon_ - Failures of the same test / monitor that are on downstream tables of an active incident. + +## Incident deep dive + +Clicking on an incident will open the test overview side panel, showing the following information: +1. Test owner, tags and subscribers (if the incident is a model failure, the model owner, tags and subscribers will be shown). +2. The execution history of the test / model, including the following information on each execution: + - Execution time + - Result (pass / fail / warning, etc.) + - If the test failed - + - A sample of the failed rows + - The Slack channel where the alert was sent + - For anomaly tests - the result chart + - Compiled query +3. Configuration of the test / model - the YAML or SQL code of the test / model. For cloud tests, the configuration is also editable. + + +You can also see the list of upstream and downstream assets - if the test is a column test you can see the upstream and downstream columns, and if it's a table test you can see the upstream and downstream tables. + + +
+ Elementary Test Overview side panel +
+ diff --git a/docs/cloud/features/alerts-and-incidents/owners-and-subscribers.mdx b/docs/cloud/features/alerts-and-incidents/owners-and-subscribers.mdx new file mode 100644 index 000000000..4486f302c --- /dev/null +++ b/docs/cloud/features/alerts-and-incidents/owners-and-subscribers.mdx @@ -0,0 +1,24 @@ +--- +title: "Owners and subscribers" +--- + +import Owner from '/snippets/alerts/owner.mdx'; +import Subscribers from '/snippets/alerts/subscribers.mdx'; + + + +We highly recommend configuring owners and subscribers for your models and/or tests. +An owner is the person responsible for the model, and subscribers are the people who are interested in getting alerts on the model or test. +Owners and subscribers will be mentioned (tagged) in the Slack alerts, and their names will appear in the alerts and in the UI. + +You can configure owners and subscribers in the following ways: + +### Owners +Owners can be easily added or edited in the Catalog screen. [Learn more about managing your assets' metadata.](/cloud/features/data-governance/manage-metadata) + +#### Configuring owners in code: + + +### Subscribers + + diff --git a/docs/cloud/features/anomaly-detection/automated-freshness.mdx b/docs/cloud/features/anomaly-detection/automated-freshness.mdx new file mode 100644 index 000000000..c86e0f77a --- /dev/null +++ b/docs/cloud/features/anomaly-detection/automated-freshness.mdx @@ -0,0 +1,68 @@ +--- +title: Automated Freshness Monitor +sidebarTitle: "Automated freshness" +--- + +import FreshnessConfiguration from '/snippets/cloud/features/anomaly-detection/freshness-configuration.mdx'; +import AllAnomaliesConfiguration from '/snippets/cloud/features/anomaly-detection/all-anomalies-configuration.mdx'; + + + +The purpose of the Freshness monitor is to alert when a data asset hasn't been updated in a period of time that exceeds the update SLA of that table. +By default, freshness monitors are created for all sources in your dbt project. They can be created for additional tables upon request.
+ +Freshness monitoring has two operation modes. You can choose the desired one from the Test Overview side panel: + + +
+ Automated freshness result +
+ + + +### Anomaly detection based +This is the default operation mode of the freshness monitors. +It learns the update frequency of your tables and continuously checks if the table is **currently** fresh based on our model's forecast. + +By default, we use 21 days of training data to understand the intervals between table updates. +The only condition that determines the status of the monitor is the time that has passed since the last update, which is compared to the model's prediction. + +The model takes into account seasonality, and supports cases such as tables that update on weekdays but not on weekends. + +### SLA based +Sometimes you might want to monitor a table based on a fixed SLA, in order to have full control over when the monitor alerts. +In this mode, you define a fixed SLA for each table, and the monitor alerts if the table hasn't been updated within the defined SLA period. + + +## Understand the monitor result + + +
+ Automated freshness result +
+ + + +The test result is a timeline of updates. + +The right end of the timeline, marked with a black triangle ▽, is the timestamp of the test result, which is near real time (it can be considered as "now"). +Each update to the table is presented as a line in the timeline. +Hovering over the gaps between updates shows the update times and the gap duration. + +To understand the test result, focus on the gap between the last update and now (▽): + +- Green - The gap between the latest update and now is still within the expected range. +- Yellow / Red - The gap between the latest update and now is above the expected range; a dotted line shows the expected gap limit. The color indicates whether this is a warning or a failure. + +Use the `Anomaly settings` and `result feedback` buttons to impact the monitor. + +### Anomaly settings + + + diff --git a/docs/cloud/features/anomaly-detection/automated-monitors.mdx b/docs/cloud/features/anomaly-detection/automated-monitors.mdx new file mode 100644 index 000000000..c35d059c9 --- /dev/null +++ b/docs/cloud/features/anomaly-detection/automated-monitors.mdx @@ -0,0 +1,58 @@ +--- +title: Automated Freshness & Volume Monitors +sidebarTitle: "Introduction" +--- + +import AutomatedMonitorsIntro from '/snippets/cloud/features/anomaly-detection/automated-monitors-intro.mdx'; +import AutomatedMonitorsCards from '/snippets/cloud/features/anomaly-detection/automated-monitors-cards.mdx'; + + + +Once your environment is set up, we automatically collect metadata from your warehouse, which our ML models run on. +The models are operational as soon as the initial backfill is completed; there is no "loading / training period", since Elementary collects enough historical metadata after setup to train the models. + +Monitors are automatically created on all of your sources, and additional ones can be [added, edited and removed manually](/cloud/features/anomaly-detection/monitors-configuration). 
Their results are displayed in the application in the same way as package tests. + +#### Benefits of automated monitors + +1. **Zero configuration** - Our machine learning models learn data behavior, eliminating the need for manual configuration. +2. **Out-of-the-box coverage** - Rather than manually configuring a test for each model, Elementary automatically creates monitors for every source in your dbt project once you set up your environment. +3. **Metadata only, minimal cost** - The monitors rely on data warehouse metadata, and don't consume compute resources. + + +### How does it work? + +The monitors collect metadata, and the [anomaly detection model](/cloud/features/anomaly-detection/monitors-overview#how-anomaly-detection-works?) adjusts based on update frequency, seasonality and trends. + +As soon as you connect the Elementary Cloud Platform to your data warehouse, a backfill process will begin to collect historical metadata. +Within an average of a few hours, your automated monitors will be operational. +By default, Elementary collects at least 21 days of historical metadata. + +The automated monitors are created for all the sources in your dbt project. +If you would like a different configuration of which tables to create the monitors on, or would like to add monitors for all models, you can reach out to us. + +You can fine-tune the [configuration](/cloud/features/anomaly-detection/monitors-configuration) and [provide feedback](/cloud/features/anomaly-detection/monitors-feedback) to adjust the detection to your needs. + +As views are stateless, automated volume and freshness monitors only apply to tables. + +## Automated Monitors + + + +## Alerts on Failures + +By default, failures of automated monitors **don't create alerts**. + +To activate alerts on automated monitors, navigate to `Setup > Alert Rules`. +- To alert on all automated monitor failures - Change the default rule (#1) alert categories to include automated monitors. 
+- To alert on specific datasets - Change / Create alert rules for these specific datasets, and include automated monitors in their alert categories. + + +
+ Alert categories in alert rules +
+ diff --git a/docs/cloud/features/anomaly-detection/automated-volume.mdx b/docs/cloud/features/anomaly-detection/automated-volume.mdx new file mode 100644 index 000000000..6e45c285a --- /dev/null +++ b/docs/cloud/features/anomaly-detection/automated-volume.mdx @@ -0,0 +1,50 @@ +--- +title: Automated Volume Monitor +sidebarTitle: "Automated volume" +--- + +import VolumeConfiguration from '/snippets/cloud/features/anomaly-detection/volume-configuration.mdx'; +import AllAnomaliesConfiguration from '/snippets/cloud/features/anomaly-detection/all-anomalies-configuration.mdx'; + + + +The volume monitor tracks the **total row count** of a table over time, rather than individual table updates. +This means that Elementary will not flag a single update as anomalous, but rather a continuous anomalous trend occurring over a period of time. + + +
+ Automated volume result +
+ + + +### Understand the monitor result + +The test data set is divided into two periods - + +1. Training Period - The historical behavior of the table's volume, patterns, and so forth. By default it will include 21 days. +2. Detection Period - This is the period within which we look for anomalies. By default it's set to the last 48 hours. + +Data points and expected range - +- Data points within the training period are dark grey, and data points within the detection period are colored. +- The light grey area around the data points represents the model's expected range. Data points outside this range are considered anomalous. +- Hovering over a data point shows its timestamp, row count and expected range. + +Use the `Anomaly settings` and `result feedback` buttons to impact the monitor. + +### Anomaly settings + + + + + \ No newline at end of file diff --git a/docs/cloud/features/anomaly-detection/metrics.mdx b/docs/cloud/features/anomaly-detection/metrics.mdx new file mode 100644 index 000000000..65511875b --- /dev/null +++ b/docs/cloud/features/anomaly-detection/metrics.mdx @@ -0,0 +1,105 @@ +--- +title: Metrics +sidebarTitle: "Metrics" +--- + +In Elementary, you can monitor any metric you want on the content of your data, view the metric and set up anomaly detection tests on it! + +## How does it work? + +Elementary uses a type of dbt test to collect metrics on your data and sync them up to Elementary Cloud. +The metrics will be collected each time you run `dbt test`, in a way that is similar to how training data for anomaly tests is collected. +A metric can be collected with or without an anomaly detection test configured on it. + +Metrics screen in Elementary +Metrics screen in Elementary + + +## How to set up a data content metric? + +The monitored metrics are set up in code, in a way similar to dbt tests. + + + + +No configuration is mandatory; however, it is highly recommended to configure a `timestamp_column`. + +{/* prettier-ignore */} +
+ 
+```yml
+data_tests:
+  - elementary.collect_metrics:
+      arguments:
+        timestamp_column: < column name >
+        time_bucket:
+          period: < hour | day >
+          count: < int >
+        dimensions: < sql expression >
+        metrics: # list of monitors
+          - name: < string >
+            type: < monitor type >
+            columns: < list >
+        where_expression: < sql expression >
+```
+ 
+
+ + + ```yml Models +models: + - name: < model name > + data_tests: + - elementary.collect_metrics: + arguments: + timestamp_column: < timestamp column > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > + dimensions: < list of dimensions to group by > + metrics: < list of metrics > + - name: < user defined name for metric > + type: < which metric to calculate > + columns: < which columns to calculate the metric on - for column metrics > + where_expression: < sql expression > + cloud_monitored: < boolean, should Elementary automatically create anomaly tests for the collected metrics? > +``` + +```yml Models example + +models: + - name: login_events + data_tests: + - elementary.collect_metrics: + arguments: + timestamp_column: 'loaded_at' + time_bucket: + period: hour + count: 1 + dimensions: + - country_id + - platform_id + metrics: + - name: row_count + type: row_count + - name: filtered_row_count + type: row_count + - name: null_count + type: null_count + columns: ["hello", "world"] + where_expression: "country = 'USA'" + cloud_monitored: true +``` + + + +Upon running `dbt test`, your data is split into buckets: +- The size of each bucket is configured by the `time_bucket` field. +- Each row in the table is assigned to a bucket based on the timestamp of the selected `timestamp_column`. +- If dimensions were selected, each combination of dimensions will create a separate metric. + +Elementary then computes the metric (or metrics) of choice for each bucket. +We save the metrics in the Elementary schema and sync them to Elementary Cloud whenever you sync your environment. + +The metric chart will be visible in the Metrics screen, along with all the metrics Elementary has collected. +If a test was created for the metric, it will be visible in Elementary just like any other test. + +To include alerts on Metrics Tests in your alert rules, simply edit your alert rule and check the box "Metric anomalies" under "Test Categories". 
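The bucketing described above can be sketched in a few lines of Python. This is an illustrative sketch of the logic only, not Elementary's implementation, and the column and dimension values (`loaded_at`, `country_id`) are hypothetical:

```python
from collections import defaultdict
from datetime import datetime

def bucket_start(ts: datetime, period: str) -> datetime:
    # Truncate a timestamp to the start of its hour or day bucket.
    if period == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

def row_count_metric(rows, timestamp_column, period="day", dimensions=()):
    # One metric point per (bucket, dimension-combination) pair.
    counts = defaultdict(int)
    for row in rows:
        key = (
            bucket_start(row[timestamp_column], period),
            tuple(row[d] for d in dimensions),
        )
        counts[key] += 1
    return dict(counts)

rows = [
    {"loaded_at": datetime(2024, 5, 1, 10, 15), "country_id": "US"},
    {"loaded_at": datetime(2024, 5, 1, 10, 40), "country_id": "US"},
    {"loaded_at": datetime(2024, 5, 1, 12, 5), "country_id": "FR"},
]
print(row_count_metric(rows, "loaded_at", period="hour", dimensions=("country_id",)))
```

With hourly buckets and a `country_id` dimension, the two US rows from the 10:00 hour collapse into one metric point, while the FR row at 12:00 becomes a separate one.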
diff --git a/docs/cloud/features/anomaly-detection/monitor-dwh-assets.mdx b/docs/cloud/features/anomaly-detection/monitor-dwh-assets.mdx new file mode 100644 index 000000000..e9979db2f --- /dev/null +++ b/docs/cloud/features/anomaly-detection/monitor-dwh-assets.mdx @@ -0,0 +1,43 @@ +--- +title: Monitor DWH Assets +sidebarTitle: "Monitor DWH assets" +--- + +By default, Elementary monitors assets that are defined as models and sources in your dbt projects. This approach ensures that we focus on assets that are actively used in production, as defined in your dbt codebase. + +However, you can now extend monitoring to additional data warehouse assets that are not part of your dbt project. This enables comprehensive observability across your entire data warehouse, including tables and views that exist outside of your dbt transformations. + + +This feature is currently in **beta** and is only supported for **BigQuery**, **Snowflake**, and **Databricks**. Self-serve schema syncing will be available soon. Until then, to sync additional assets to Elementary, please contact support. + + +## Syncing DWH schemas + +You can sync schemas directly from your data warehouse to Elementary. Once synced, these assets will appear in the side tree under the **DWH view**, making them easily accessible alongside your dbt models and sources. + +This feature allows you to: +- Monitor assets that are created outside of dbt (e.g., tables created by other tools, ETL processes, or direct SQL scripts) +- Gain visibility into your entire data warehouse inventory +- Apply the same monitoring and testing capabilities to non-dbt assets + +## Configuring tests on DWH assets + +After syncing your DWH schemas, you can configure freshness and volume tests on these assets directly from the **test configuration menu**. The process is the same as configuring tests on dbt models: + +1. Navigate to the test configuration page +2. Select the DWH asset you want to monitor +3. 
Choose either a **freshness** or **volume** test +4. Configure the test parameters as needed +5. Submit the test configuration + +These tests will run alongside your dbt tests and provide the same level of monitoring and alerting capabilities. + +## Lineage integration + +DWH assets that have dependencies within your dbt projects will appear in the lineage view. Currently, **views** are fully supported in the lineage visualization, and **tables** support is coming soon. + +This integration helps you: +- Understand how your dbt models depend on or are used by non-dbt assets +- Visualize the complete data flow across your warehouse +- Identify dependencies and impacts across your entire data ecosystem + diff --git a/docs/cloud/features/anomaly-detection/monitors-configuration.mdx b/docs/cloud/features/anomaly-detection/monitors-configuration.mdx new file mode 100644 index 000000000..ae3fa5bac --- /dev/null +++ b/docs/cloud/features/anomaly-detection/monitors-configuration.mdx @@ -0,0 +1,100 @@ +--- +title: Monitors Configuration +sidebarTitle: "Monitors configuration" +--- + +import AllAnomaliesConfiguration from '/snippets/cloud/features/anomaly-detection/all-anomalies-configuration.mdx'; +import VolumeConfiguration from '/snippets/cloud/features/anomaly-detection/volume-configuration.mdx'; +import FreshnessConfiguration from '/snippets/cloud/features/anomaly-detection/freshness-configuration.mdx'; + + + +Automated anomaly detection monitors are configured on all of your sources by default. Elementary also allows you to add additional monitors, and to edit or remove existing ones. + +## Adding anomaly detection monitors + +Automated anomaly detection monitors (Cloud tests) can be [added directly through the UI](/cloud/features/data-tests/data-tests-overview), just like other data tests. +Unlike dbt-based tests, these monitors are not part of your dbt project code, so they’re added immediately — with no need to create or approve a pull request. 
+ +To easily add a new automated monitor, follow these steps: +- Navigate to the Test Configuration page, or select your relevant assets in the Catalog. +- Click 'Add Test', and choose a 'Table Test'. +- If not selected earlier, choose one or more tables you would like to test. +- Filter on Elementary Cloud, and choose your preferred test - [volume](/cloud/features/anomaly-detection/automated-volume) or [freshness](/cloud/features/anomaly-detection/automated-freshness). +- Set up the test configurations, and add metadata if needed. Learn more about all supported settings [here](/cloud/features/anomaly-detection/monitors-configuration#supported-settings). +- Review and submit your test. No PR needed - the test is set up. + +## Editing anomaly detection monitors + +You can change the default settings and fine-tune the monitors to your needs using the `Anomaly settings` on each test. + +In general, users will rely on the automated machine learning model's anomaly settings. +However, in some cases, an anomaly in the data is not relevant to your business. In these cases, the custom settings are useful. + +### Settings simulator + +For some supported settings, Elementary offers a simulation of the change's impact on the latest results. +You can use the `Simulate Configuration` button after making the change and before saving. + + + +## Excluding time ranges from training period + +Training periods for volume tests sometimes include one-time anomalies, for example when one of the engineers executes a manual script, or during an incident. +These anomalies might prevent our learning models from detecting a pattern in the data, resulting in a `NO_DATA` status. + +This can be overcome by excluding specific sections of the training period from the monitor. +With the one-time anomaly excluded, the regular data pattern will be detected, resulting in meaningful anomaly detection. + +How to exclude: 
+ +- Click on the `EXCLUDE` button on the top-right of the chart, or go to the `Test Configurations` tab. + + + + +- Go down the chart, click `EXCLUDE`, and mark the section you want to exclude. + - A confirmation dialog will appear, showing the exact samples which will be excluded. +- On confirmation, the new excluded time range will appear under **`Excluded time ranges`.** You can modify or remove it, and you can also add new ranges using the + sign if you prefer. +- To test the configuration, click “Simulate configuration”. + + + +## Removing anomaly detection monitors + +There are two ways to delete monitors from the UI: +- Test configuration page - Choose one or more tests, and an option to delete them will be available at the bottom of the page. +- Test results page - Press the `...` button on the top right of the test result and then `Delete test`. + + + +## Supported settings + +#### All monitors + + + +#### Volume monitor + + + +#### Freshness monitor + + + diff --git a/docs/cloud/features/anomaly-detection/monitors-feedback.mdx b/docs/cloud/features/anomaly-detection/monitors-feedback.mdx new file mode 100644 index 000000000..6df14f74e --- /dev/null +++ b/docs/cloud/features/anomaly-detection/monitors-feedback.mdx @@ -0,0 +1,41 @@ +--- +title: Monitors Feedback +sidebarTitle: "Monitors feedback" +--- + +Using the `Result feedback` button, you can mark results as true or false positives. +This feedback can significantly improve the accuracy of detection. + +Some results trigger an automated workflow, and all are manually reviewed by the Elementary team. + +Just so you know - Our machine learning models thrive on your feedback! +We're always hustling to make them even better, and your feedback plays a huge role in helping us achieve that. So keep those comments coming! + + +
+ Anomaly result feedback +
+ + +### False positive feedback + +To get context on your false positive result feedback and trigger a response, we ask you to select a reason: + +- **Insignificant change** - The anomaly is not drastic enough for me to care about it. Usually the action item is to relax the anomaly detection sensitivity. +- **Expected outlier** - This isn't an anomaly and should be within the expected range. The action item will be to re-train the model, sometimes with a wider training set. +- **Business anomaly** - This is an anomaly, but one we expected to happen due to an intentional change or business event. The action item will be to exclude the anomaly from the training set. +- **Not an interesting table** - I don't want to monitor this table. The action item is to delete the monitor. +- **Other** - None of the other reasons fit. Please add a comment to describe the use case. + +
+ False positive result feedback +
+ \ No newline at end of file diff --git a/docs/cloud/features/anomaly-detection/monitors-overview.mdx b/docs/cloud/features/anomaly-detection/monitors-overview.mdx new file mode 100644 index 000000000..81d95bd85 --- /dev/null +++ b/docs/cloud/features/anomaly-detection/monitors-overview.mdx @@ -0,0 +1,35 @@ +--- +title: Anomaly Detection Monitors +sidebarTitle: "Monitors overview" +--- + +import AutomatedMonitorsIntro from '/snippets/cloud/features/anomaly-detection/automated-monitors-intro.mdx'; +import AutomatedMonitorsCards from '/snippets/cloud/features/anomaly-detection/automated-monitors-cards.mdx'; + +ML-powered anomaly detection monitors automatically identify outliers and unexpected patterns in your data. +These are useful for detecting issues such as incomplete data, delays, a drop in a specific dimension or a spike in null values. + +Elementary offers two types of monitors: + +- **Automated Monitors** - Out-of-the-box monitors that are activated automatically and query metadata only. +- **Opt-in Monitors** - Monitors that query raw data and require configuration. + +## [Automated monitors](/cloud/features/anomaly-detection/automated-monitors) + + + + + +## Opt-in monitors + +_Coming soon_ + + +## Monitor test results + +Each monitor returns one of the following four test results: + +- **Passed** - The test passed, no anomaly was detected. +- **Warning** - An anomaly was detected, and the test is configured to `warn` severity. +- **Fail** - An anomaly was detected, and the test is configured to `fail` severity. +- **No data** - The monitor does not have enough data or an accurate model to monitor. Reach out to our support team to fix this. 
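As a rough mental model, the "expected range" behind these results can be sketched as a band around the training period's values. The sketch below is deliberately simplified and is not Elementary's actual model, which also accounts for seasonality, trends and update frequency:

```python
from statistics import mean, stdev

def expected_range(training_values, sensitivity=3.0):
    # Build an expected range from the training period: mean +/- sensitivity * std.
    mu, sigma = mean(training_values), stdev(training_values)
    return mu - sensitivity * sigma, mu + sensitivity * sigma

def detect(training_values, detection_values, sensitivity=3.0):
    # Flag detection-period points that fall outside the expected range.
    low, high = expected_range(training_values, sensitivity)
    return [v for v in detection_values if not (low <= v <= high)]

# 21 days of stable daily row counts, then a sudden drop in the detection period:
training = [1000, 1020, 980, 1010, 990, 1005, 995] * 3
print(detect(training, detection_values=[1002, 240]))  # [240]
```

A point inside the band maps to **Passed**, and a point outside maps to **Warning** or **Fail** depending on the configured severity.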
diff --git a/docs/cloud/features/ci.mdx b/docs/cloud/features/ci.mdx new file mode 100644 index 000000000..ed59c1669 --- /dev/null +++ b/docs/cloud/features/ci.mdx @@ -0,0 +1,162 @@ +--- +title: "GitHub & GitLab Data Quality Code Review" +sidebarTitle: "Code Review" +--- + +Every time a developer changes a dbt model, there's a question no one can easily answer before merging: _is this safe to ship?_ + +Your dbt tests tell you if the code compiles. They don't tell you if the model you just refactored has been failing tests for the past week, whether it feeds a dashboard your CEO looks at every morning, or whether there's already an open incident on it that your data team is investigating. + +Elementary's code review automatically answers all of that. The moment a pull request touches your dbt models, a structured comment appears with everything your team needs to make a confident merge decision, without leaving the PR. + +![Elementary code review comment](/pics/cloud/code_review_comment.png) + +## Why it matters + +Data quality issues are exponentially cheaper to catch before merge than after. But today, most teams have no visibility into data health at review time. Reviewers check the SQL, not the data. By the time a broken model hits production, it's already in dashboards, downstream models, and stakeholder reports. + +Elementary closes that gap by bringing live data quality context directly into the code review workflow. 
+ +## What you get on every PR + +- **Test history:** pass/fail counts for each changed model over the last 7 days, so reviewers know if they're touching something that's already fragile +- **Active incidents:** any open data quality issues on those models right now, before the change lands on top of them +- **Downstream blast radius:** exactly which models, pipelines, and dashboards depend on what's changing, two levels deep +- **Health summary:** a plain-language signal on whether it's safe to merge, powered by Claude + +The comment updates automatically on every new push, so the review always reflects the latest state. No noise, no duplicate comments. + +## How it works + +The review is powered by [Claude](https://www.anthropic.com) connected to the [Elementary MCP server](/cloud/mcp/intro). When a PR is opened or updated: + +1. A CI job detects which models changed using `git diff` +2. Claude queries Elementary for live data quality context on those exact models +3. A structured Markdown summary is posted as a comment on the PR or MR + +No custom scripts. No webhook setup. No infrastructure to manage. Two secrets and one file. 
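Step 1 boils down to filtering the output of `git diff --name-only --diff-filter=ACMR <base>...HEAD` down to model files. A minimal sketch of that filtering step (illustrative only; the file paths are hypothetical):

```python
from fnmatch import fnmatch

def changed_models(diff_output: str, models_path: str = "models/") -> list[str]:
    # Keep only dbt model files (.sql / .yml) under the models directory,
    # mirroring the workflow's `paths` filter.
    patterns = (f"{models_path}**/*.sql", f"{models_path}**/*.yml")
    return [
        path
        for path in diff_output.splitlines()
        if any(fnmatch(path, p) for p in patterns)
    ]

# Example output of `git diff --name-only --diff-filter=ACMR <base>...HEAD`:
diff_output = """models/staging/stg_orders.sql
models/marts/finance/revenue.yml
README.md"""
print(changed_models(diff_output))  # README.md is skipped, both model files kept
```

Only the resulting model list is sent on to step 2, which is why PRs that don't touch model files produce no comment at all.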
+ +## Setup + +### Prerequisites + +- An Elementary Cloud account with the [MCP server enabled](/cloud/mcp/setup-guide) +- An [Anthropic API key](https://console.anthropic.com) + +### GitHub Actions + +**Step 1 — Add the workflow file** + +Create `.github/workflows/elementary-review.yml` in your dbt repository: + +```yaml +name: Elementary Data Quality Review + +on: + pull_request: + paths: + - "models/**/*.sql" + - "models/**/*.yml" + - "dbt_project.yml" + +jobs: + elementary-review: + runs-on: ubuntu-latest + permissions: + contents: read + pull-requests: write + steps: + - uses: actions/checkout@v4 + with: + fetch-depth: 0 + + - uses: elementary-data/elementary-ci@v1 + with: + anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }} + elementary-api-key: ${{ secrets.ELEMENTARY_API_KEY }} +``` + +**Step 2 — Add two repository secrets** + +Go to **Settings > Secrets and variables > Actions** and add: + +| Secret | Description | +|---|---| +| `ANTHROPIC_API_KEY` | Your Anthropic API key | +| `ELEMENTARY_API_KEY` | Your Elementary Cloud API key | + +That's it. The review only runs on PRs that touch model files — other PRs are ignored entirely. + + +This works for pull requests opened from branches within the same repository. GitHub does not pass repository secrets to `pull_request` workflows triggered by forks or Dependabot. + + + + +| Input | Default | Description | +|---|---|---| +| `models-path` | `models/` | Path to your dbt models directory | +| `diff-filter` | `ACMR` | File changes to include: A=Added, C=Copied, M=Modified, R=Renamed | +| `claude-model` | `claude-haiku-4-5-latest` | Claude model to use. Switch to `claude-sonnet-4-latest` for deeper analysis on complex changes | +| `base-ref` | PR base branch | Branch to diff against | +| `mcp-config-path` | _(auto-generated)_ | Path to a custom MCP config file. 
Only needed for self-hosted Elementary setups | + +```yaml +- uses: elementary-data/elementary-ci@v1 + with: + anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }} + elementary-api-key: ${{ secrets.ELEMENTARY_API_KEY }} + models-path: "dbt/models/" + claude-model: "claude-sonnet-4-latest" +``` + + +### GitLab CI + +**Step 1 — Add the include to your `.gitlab-ci.yml`** + +```yaml +include: + - remote: 'https://raw.githubusercontent.com/elementary-data/elementary-ci/v1/templates/mr-review.yml' +``` + +**Step 2 — Add two CI/CD variables** + +Go to **Settings > CI/CD > Variables** and add: + +| Variable | Masked | Description | +|---|---|---| +| `ANTHROPIC_API_KEY` | Yes | Your Anthropic API key | +| `ELEMENTARY_API_KEY` | Yes | Your Elementary Cloud API key | +| `GITLAB_API_TOKEN` | Yes | Optional. Project Access Token with `api` scope. Set this if you cannot enable CI/CD job token API access in project settings. | + +To post the MR comment, the template uses one of two authentication methods: + +- **`CI_JOB_TOKEN` (default):** GitLab's built-in job token, available automatically in every pipeline. Requires a project admin to enable **Settings > CI/CD > Token Access > Allow CI/CD job tokens to access this project's API**. +- **`GITLAB_API_TOKEN` (alternative):** If this variable is set, it takes priority over `CI_JOB_TOKEN`. Use a Project Access Token with `api` scope. This works without any admin settings change and is the easier option if you don't have project admin access. + +The review only runs on MRs that touch model files. Other MRs are ignored entirely. + +![Elementary code review comment on GitLab](/pics/cloud/code_review_comment_gitlab.png) + +## Troubleshooting + +**No comment appears after the job runs** + +Make sure both `contents: read` and `pull-requests: write` are set under `permissions` in the workflow. 
An explicit `permissions` block sets any unlisted scope to `none`, so omitting `contents: read` causes the checkout step to fail before Elementary runs. + +**`git diff` returns no changed models** + +Make sure `fetch-depth: 0` is set on the checkout step. Without full git history the runner cannot compare branches and the diff will be empty. + +**The comment says the MCP server is unreachable** + +Verify `ELEMENTARY_API_KEY` is correctly set and the MCP server is enabled for your account. See the [MCP setup guide](/cloud/mcp/setup-guide). + + +If a model has never been synced through Elementary, the comment will note that no history is available yet. Results populate automatically after the next Elementary sync. + + +## Using a different AI provider? + +The review currently uses Claude via the Anthropic API. If your team uses a different provider and you'd like to see it supported, reach out at [support@elementary-data.com](mailto:support@elementary-data.com) or on the [Community Slack](https://elementary-data.com/community). diff --git a/docs/cloud/features/collaboration-and-communication/audit_logs/overview.mdx b/docs/cloud/features/collaboration-and-communication/audit_logs/overview.mdx new file mode 100644 index 000000000..d1cd8b6b8 --- /dev/null +++ b/docs/cloud/features/collaboration-and-communication/audit_logs/overview.mdx @@ -0,0 +1,63 @@ +--- +title: Logs +sidebarTitle: Overview +--- + +The **Logs** feature allows workspace admins to track and export both user actions and system-level events across the platform. This includes user configuration changes, data synchronization events, and alert deliveries. + +This feature provides visibility into workspace activity and helps support audit readiness, team coordination, operational monitoring, and troubleshooting. 
+ +## Log Types + +- **[User Activity Logs](/cloud/features/collaboration-and-communication/audit_logs/user-activity-logs)** - Track changes made by users across the system +- **[System Logs](/cloud/features/collaboration-and-communication/audit_logs/system-logs)** - Track system-level events and operations + +## How to Access Logs + +Once enabled for your account, access the logs: + + +
+ +
+ + +1. Click on your **account name** in the top-right corner of the UI +2. Open the dropdown menu +3. Select **Logs** +4. Choose **User Activity Logs** or **System Logs** from the log type selector +5. Select the number of days to look back +6. Click **Export** to download the log as a **CSV file** + +You can open the CSV in any spreadsheet tool to review and filter the activity as needed. + +## Stream Logs to External Services + +In addition to exporting logs as CSV files, you can stream logs in real-time to external log management services for centralized monitoring, analysis, and long-term storage. +### Available Integrations + +- **[Datadog](/cloud/integrations/log-streaming/datadog)** - Stream logs to Datadog for centralized log management, monitoring, and alerting +- **[Splunk](/cloud/integrations/log-streaming/splunk)** - Stream logs to Splunk via HTTP Event Collector (HEC) for centralized log management and analysis +- **[Google Cloud Storage (GCS)](/cloud/integrations/log-streaming/gcs)** - Stream logs to GCS buckets for long-term storage and integration with BigQuery and other Google Cloud services + +Configure log streaming from the Logs page by clicking the **Connect** icon, then select your preferred destination. + + +
+ +
+ + + +
+ +
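Since the export is a plain CSV, it is also easy to post-process outside a spreadsheet. A minimal Python sketch (the column names follow the field documentation on the log-type pages; the sample file and its reduced set of columns are illustrative only):

```python
import csv
import json

def failed_events(path):
    """Return rows from an exported log CSV where Success is "False"."""
    rows = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["Success"] == "False":  # booleans are exported as strings
                # Event Content is a JSON string with action-specific context
                row["Event Content"] = json.loads(row["Event Content"] or "{}")
                rows.append(row)
    return rows

# Write a tiny sample export so the sketch runs end to end.
with open("sample_logs.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Timestamp", "Event Name", "Success", "Event Content"])
    writer.writerow(["2024-01-15T14:30:45+00:00", "user_login", "True", "{}"])
    writer.writerow(["2024-01-15T15:02:10+00:00", "create_test",
                     "False", json.dumps({"test_id": "test_456"})])

for event in failed_events("sample_logs.csv"):
    print(event["Timestamp"], event["Event Name"], event["Event Content"])
```

The same approach works for streamed logs, since the event fields are identical across destinations.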
+ \ No newline at end of file diff --git a/docs/cloud/features/collaboration-and-communication/audit_logs/system-logs.mdx b/docs/cloud/features/collaboration-and-communication/audit_logs/system-logs.mdx new file mode 100644 index 000000000..3bfa25e4d --- /dev/null +++ b/docs/cloud/features/collaboration-and-communication/audit_logs/system-logs.mdx @@ -0,0 +1,45 @@ +--- +title: System Logs +sidebarTitle: System Logs +--- + +System logs track system-level events and operations, such as data synchronization events and alert deliveries, to help with monitoring system operations and troubleshooting. + +## What's Included + +The system logs capture system-level events and operations across your workspace, including: + +- **Data synchronization events**, including: + - Beginning and end of synchronization with dbt + - Beginning and end of data warehouse (DWH) synchronization + - Beginning and end of business intelligence (BI) synchronization +- **Alert deliveries** - When alerts are sent to configured destinations + +## Fields + +The exported CSV file for system logs includes the following fields: + +- **Timestamp** - The date and time when the event occurred (in UTC, ISO 8601 format) + - *Example:* `"2024-01-15T14:30:45.123456+00:00"` + +- **Event Name** - The specific action that was performed + - *Examples:* `"dbt_data_sync_started"`, `"dbt_data_sync_completed"`, `"dwh_data_sync_started"`, `"dwh_data_sync_completed"`, `"bi_data_sync_started"`, `"bi_data_sync_completed"`, `"alerts_sent"` + +- **Success** - Whether the action completed successfully + - *Values:* `"True"` or `"False"` (as strings in CSV) + +- **Event Content** - Additional context-specific information about the action (stored as a JSON string) + - The contents vary by action type.
For example: + - For sync actions: `{"environment_id": "env_789", "environment_name": "Production"}` + - For alert delivery actions: `{"alert_count": 5, "destination": "slack"}` + +- **Env ID** - The environment identifier where the action occurred + - *Example:* `"env_7890123456abcdef"` + +- **Env Name** - The name of the environment where the action occurred + - *Example:* `"Production"` or `"Staging"` + +**Note:** System logs do not include user information fields since they represent automated system operations rather than user-initiated actions. diff --git a/docs/cloud/features/collaboration-and-communication/audit_logs/user-activity-logs.mdx b/docs/cloud/features/collaboration-and-communication/audit_logs/user-activity-logs.mdx new file mode 100644 index 000000000..1f0a02db6 --- /dev/null +++ b/docs/cloud/features/collaboration-and-communication/audit_logs/user-activity-logs.mdx @@ -0,0 +1,56 @@ +--- +title: User Activity Logs +sidebarTitle: User Activity Logs +--- + +User activity logs track changes made by users across the system, including actions such as creating, editing, or deleting tests, monitors, metadata, and other configurations.
+ +## What's Included + +The user activity logs capture a wide range of user actions across your workspace, including: + +- **User login and logout events** +- **Failed login attempts** (when applicable) +- **Creation, modification, or deletion of configuration**, including: + - Adding, editing, or deleting tests + - Adding, editing, deleting, or archiving alert rules + - Acknowledging, resolving, or assigning incidents + - Adding, editing, or deleting metadata (owners, descriptions, tags, critical assets) + - Adding, editing, or deleting integrations + - Sending team member invites + - Changing user roles + +## Fields + +The exported CSV file for user activity logs includes the following fields: + +- **Timestamp** - The date and time when the event occurred (in UTC, ISO 8601 format) + - *Example:* `"2024-01-15T14:30:45.123456+00:00"` + +- **User Email** - The email address of the user who performed the action + - *Example:* `"john.doe@example.com"` + +- **User Name** - The display name of the user who performed the action + - *Example:* `"John Doe"` + +- **Event Name** - The specific action that was performed + - *Examples:* `"user_login"`, `"create_test"`, `"edit_alert_rule"`, `"assign_incident"`, `"create_alert_rule"`, `"edit_metadata"`, `"delete_test"`, `"invite_team_member"`, `"change_user_role"` + +- **Success** - Whether the action completed successfully + - *Values:* `"True"` or `"False"` (as strings in CSV) + +- **Event Content** - Additional context-specific information about the action (stored as a JSON string) + - The contents vary by action type.
For example: + - For incident actions: `{"incident_id": "inc_123", "incident_summary": "Test failure detected"}` + - For test actions: `{"test_id": "test_456", "test_name": "assert_no_null_values"}` + - For alert rule actions: `{"alert_routing_rule_id": "rule_789", "alert_routing_rule": {...}}` + - For metadata actions: `{"asset_id": "asset_123", "metadata_type": "owner"}` + +- **Env ID** - The environment identifier where the action occurred. Empty for account-level actions like login/logout that aren't tied to a specific environment. + - *Example:* `"env_7890123456abcdef"` + +- **Env Name** - The name of the environment where the action occurred. Empty for account-level actions like login/logout that aren't tied to a specific environment. + - *Example:* `"Production"` or `"Staging"` \ No newline at end of file diff --git a/docs/cloud/features/collaboration-and-communication/catalog.mdx b/docs/cloud/features/collaboration-and-communication/catalog.mdx new file mode 100644 index 000000000..069e876d2 --- /dev/null +++ b/docs/cloud/features/collaboration-and-communication/catalog.mdx @@ -0,0 +1,57 @@ +--- +title: "Data Catalog" +--- + +The Elementary Cloud catalog is a dynamic workspace for managing data assets. It combines your dbt models, sources, columns, tests, and metrics with metadata, test results, and ownership in one place. + +Teams use the catalog to document their assets, assign owners, monitor test and freshness status, and raise issues or questions. It’s designed for collaboration, so everyone, from data engineers to analysts and business users, can find the context they need and keep shared knowledge up to date. + +From each dataset, you can navigate directly to its lineage and test results, ensuring seamless exploration of data dependencies and quality insights.
+ + + +The catalog provides a unified view of the assets in your dbt project: +- Models and sources, with associated metadata +- Test results for each asset +- Source freshness information +- Column-level documentation +- Ownership and tagging +- Upstream and downstream dependencies + +You can browse or search the catalog to find a specific asset or get an overview of the current state of your project. + +## Key Features + +### Easily Discoverable Data Assets +Navigate the catalog quickly using the side tree and its search bar to find tables, views, BI dashboards, and more. +You can view the data assets by their location in the DWH or in the dbt project, and group them by tags, owners, or path. + +Catalog tree + +### Editing Metadata +You can edit metadata directly in the catalog. This includes updating model and column descriptions, assigning owners, and applying tags. +To make updates more efficient, the catalog supports bulk editing. For example, you can assign an owner to multiple models at once or apply a shared tag to a group of assets. +All metadata edits are synced back into your dbt project as a pull request, so documentation stays version-controlled and part of your workflow. + +Need to display additional metadata fields from your dbt `meta` config? Reach out to our team to customize the catalog with fields that best fit your data workflows. + +### Marking Critical Assets +You can mark assets as critical directly in the catalog to highlight their importance. Critical assets are prioritized in monitoring and surfaced in governance views to help teams focus on what matters most. Read more [here](/cloud/features/data-governance/critical_assets). + + +### Lineage & Dependency Export +View upstream and downstream dependencies for each dataset, and export the full dependency list as a CSV for further analysis or documentation.
+Lineage export + +### AI Agents for Discovery and Governance +Elementary includes two agents that use the catalog to support your team: +- The [Catalog agent](/cloud/ai-agents/catalog-agent) helps users find relevant models, columns, or metrics using natural language. This is especially helpful for new team members or business users who are less familiar with your dbt structure. +- The [Governance agent](/cloud/ai-agents/governance-agent) identifies missing documentation, unowned assets, or inconsistent tags. It surfaces these issues automatically and can suggest actions to improve coverage. +These agents run continuously and help keep the catalog useful and complete without requiring constant manual work. diff --git a/docs/cloud/features/collaboration-and-communication/data-health.mdx b/docs/cloud/features/collaboration-and-communication/data-health.mdx new file mode 100644 index 000000000..30b044304 --- /dev/null +++ b/docs/cloud/features/collaboration-and-communication/data-health.mdx @@ -0,0 +1,74 @@ +--- +title: Data Health Dashboard +sidebarTitle: Data Health Dashboard +--- + +import DataHealthIntro from '/snippets/cloud/features/data-health/data-health-intro.mdx'; +import DataQualityDimensions from '/snippets/cloud/features/data-health/data-quality-dimensions.mdx'; + + + +### Data Health Dashboard + +The Data Health Dashboard is intended for your data consumers and stakeholders who want to get a summary of what is happening with the data in your organization. + +It gives a high-level overview that doesn't require deep technical knowledge or going into specific test results. +The dashboard presents the data health in a simple way, by giving a health score and using a color code to indicate whether this score is healthy. +Filters are available at the top of the page, making it easy to see the data health in different contexts.
+ +Data Health Score + +The dashboard is based on the 6 [Data Quality Dimensions](/cloud/features/collaboration-and-communication/data-quality-dimensions#data-quality-dimensions): + + + +### How is the data health score calculated? + +Each test you run in either dbt or Elementary is mapped to one of these pillars, and given a score. +The scoring method is very simple: +- If the test passes, the score is 100 +- If the test is in `warn` status, the score is 50 +- If the test is in `fail` status, the score is 0 + +The results are aggregated to give a health score for each pillar. +The total score is a weighted average of the 6 pillars, where the weight is configurable. +The thresholds for the color coding (green, yellow and red) are also configurable. + +Score weight and threshold configuration + +### Critical assets alerts score + +To focus on what matters most, see the top right of the screen to filter on your [critical assets](/cloud/features/data-governance/critical_assets). + +### Can I customize the quality dimension mapping of my tests? + +Of course! +Each test you run, whether it's a generic or a custom test, can be mapped to one of the 6 quality dimensions. 
+The way to do so is to add `quality_dimension` to the test definition in your dbt project: + + + +```yml test +data_tests: + - not_null: + config: + meta: + quality_dimension: completeness +``` + +```yml test/model config block +{{ config( + meta={ + "quality_dimension": "completeness", + } +) }} +``` + + + + +## Coming soon + +- **Send a daily report** of the data health to your stakeholders +- **Compare the data health** of different domains +- **Set up alerts** for when the data health is below a certain threshold diff --git a/docs/cloud/features/collaboration-and-communication/data-observability-dashboard.mdx b/docs/cloud/features/collaboration-and-communication/data-observability-dashboard.mdx new file mode 100644 index 000000000..7a2589e5a --- /dev/null +++ b/docs/cloud/features/collaboration-and-communication/data-observability-dashboard.mdx @@ -0,0 +1,11 @@ +--- +title: Data Observability Dashboard +--- + +Managing data systems can be a complex task, especially when there are hundreds (or even thousands) of models being orchestrated separately across multiple DAGs. These models serve different data consumers, including internal stakeholders, clients, and reverse-ETL pipelines. + +Our Data Observability Dashboard provides an easy-to-use control panel for data teams to monitor the quality and performance of their data warehouse. 
+ + + Elementary Data Observability Dashboard + \ No newline at end of file diff --git a/docs/cloud/features/collaboration-and-communication/data-quality-dimensions.mdx b/docs/cloud/features/collaboration-and-communication/data-quality-dimensions.mdx new file mode 100644 index 000000000..c31a723cd --- /dev/null +++ b/docs/cloud/features/collaboration-and-communication/data-quality-dimensions.mdx @@ -0,0 +1,70 @@ +--- +title: Data Quality Dimensions +sidebarTitle: Data Quality Dimensions +--- + +import DataHealthIntro from '/snippets/cloud/features/data-health/data-health-intro.mdx'; +import DataQualityDimensions from '/snippets/cloud/features/data-health/data-quality-dimensions.mdx'; + + + +## Measuring data quality + + + + +## Data quality dimensions + +The 6 Data Quality Dimensions are: + + + +## Data quality dimensions example + +To help understand different aspects of data quality, let's explore these concepts using a familiar example - the IMDb movie database. +IMDb is a comprehensive database of movies, TV shows, cast members, ratings, and more. +Through this example, we'll see how different data quality issues could affect user experience and data reliability. + + +
+ IMDB banner +
+ + +#### Freshness +- **Definition**: Ensures that data is up to date and reflects the latest information. +- **Example**: Consider The Godfather's IMDb rating. If the rating hasn't been updated since 2000, despite users continuing to submit reviews every year, the displayed rating would be stale. This outdated information could mislead users about the current audience sentiment toward the movie. + +#### Completeness +- **Definition**: Ensures all required fields are filled in, without missing values. +- **Example**: Imagine the IMDb record for Pulp Fiction missing key cast members, such as Uma Thurman. This incomplete data would provide users with an inadequate picture of the movie's legendary cast, significantly reducing the dataset's usefulness. + +#### Uniqueness +- **Definition**: Ensures that each entity is represented only once in the system. +- **Example**: Consider having two separate records for The Matrix with the same primary key but different details - one showing a release year of 1999, another showing 1998. This duplication creates confusion about the correct information and could cause problems in downstream processes, like reporting or website display. + +#### Consistency +- **Definition**: Ensures data remains uniform across multiple datasets and sources. +- **Example**: If IMDb's Top 250 Movies page displays 254 movies due to a backend error, while the Ratings Summary page correctly shows 250 movies, this inconsistency would confuse users and diminish trust in the platform's data. + +#### Validity +- **Definition**: Ensures that data conforms to rules or expectations, such as acceptable ranges or formats. +- **Example**: If a movie's runtime is listed as 1500 minutes when the longest movie ever made was 873 minutes, this would be an invalid value. The runtime clearly doesn't conform to expected movie length ranges and would be considered invalid data. 
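In a dbt project, a validity rule like the runtime range above can be expressed declaratively, for example with the `accepted_range` generic test from the `dbt_utils` package (a sketch; the model and column names are illustrative):

```yml
models:
  - name: movies            # illustrative model name
    columns:
      - name: runtime_minutes
        data_tests:
          - dbt_utils.accepted_range:
              min_value: 1
              max_value: 900   # generously above the longest known movie
              config:
                meta:
                  quality_dimension: validity
```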
+ +#### Accuracy +- **Definition**: Ensures that data represents the real-world scenario correctly. +- **Example**: If an IMDb record listed Leonardo DiCaprio as the director of Inception instead of Christopher Nolan, this would be inaccurate. While DiCaprio starred in the movie, he didn't direct it - this kind of error misrepresents the real-world facts. + + +## Implementation in Elementary + +In Elementary, all the dbt tests and Elementary monitors are automatically attributed to the relevant data quality dimension. +Based on the results of tests and monitors, a data health score is calculated for each dimension, and a total score for the data set. + +The data quality scores are presented in a [data health dashboard](/cloud/features/collaboration-and-communication/data-health#data-health-dashboard), data catalog integrations, and more. + +To learn more, **watch the webinar** [**Measuring Data Health with Elementary**](https://www.elementary-data.com/webinar/measuring-data-health-with-elementary) \ No newline at end of file diff --git a/docs/cloud/features/collaboration-and-communication/saved-views.mdx b/docs/cloud/features/collaboration-and-communication/saved-views.mdx new file mode 100644 index 000000000..bf23fbc41 --- /dev/null +++ b/docs/cloud/features/collaboration-and-communication/saved-views.mdx @@ -0,0 +1,47 @@ +--- +title: "Saved Views" +--- + +Saved Views in Elementary allow you to create and save filter groups that persist across all screens in your workspace. Instead of manually applying the same filters every time you navigate to different pages, you can save your preferred filter combinations and have them automatically applied wherever you go. + +## Why it matters + +- **Consistency** — Your filter preferences stick with you across all screens, from the Catalog to Data Health to Lineage views. +- **Efficiency** — No need to recreate complex filter combinations repeatedly. Set them once and they're always available.
+- **Personalization** — Each user can create their own saved views tailored to their specific needs and workflows. +- **Focus** — Quickly switch between different perspectives on your data (e.g., "Marketing Assets", "Production Critical", "My Team's Datasets"). + +## How it works + +Saved Views are built from **filter groups** that you configure in the side tree. The side tree appears on various screens throughout Elementary (Catalog, Data Health, Lineage, etc.) and allows you to filter assets by: + +- **Tags** — Filter by asset tags (e.g., `#marketing`, `#production`) +- **Owners** — Filter by asset owners (e.g., `@data-team`, `@analytics`) +- **Paths** — Filter by dbt project paths or data warehouse locations +- **Other metadata** — Filter by any asset property or metadata field + +Once you've configured your desired filter groups in the side tree, you can save this combination as a Saved View with a descriptive name. The saved view will then be available to you across all screens in your environment. + +## Creating a Saved View + +1. **Configure your filters** in the side tree on any screen (Catalog, Data Health, Lineage, etc.) + - Select the tree view type (e.g., Tags, Owners, Paths) + - Apply your desired filter groups +2. **Save the view** — Click the save icon or "Save View" option in the side tree +3. **Name your view** — Give it a clear, descriptive name (e.g., "Marketing Production Assets", "Critical Data Health View") +4. **Confirm** — Your saved view is now created and will persist across all screens + +## Using Saved Views + +Once created, your saved views appear in the side tree dropdown or menu. 
You can: + +- **Select a saved view** — Click on any saved view to apply its filter groups to the current screen +- **Switch between views** — Quickly toggle between different saved views to see different perspectives +- **Edit a saved view** — Modify the filter groups and update the saved view +- **Delete a saved view** — Remove saved views you no longer need + +Saved views are **user-specific** and **environment-specific**, meaning: +- Each user has their own set of saved views +- Saved views are scoped to the environment where they were created +- You can have different saved views for different environments (e.g., Production vs. Staging) + diff --git a/docs/cloud/features/config-as-code.mdx b/docs/cloud/features/config-as-code.mdx new file mode 100644 index 000000000..08dacabbe --- /dev/null +++ b/docs/cloud/features/config-as-code.mdx @@ -0,0 +1,49 @@ +--- +title: "Configuration as Code" +--- + +At Elementary, we believe that **code should be the single source of truth.** This is the only way to scale data reliability while keeping everyone aligned: engineers, analysts, and business stakeholders all see the exact same information, directly in their natural workflows. + +Elementary has access to your [code repository](https://docs.elementary-data.com/cloud/integrations/code-repo/connect-code-repo), so every configuration lives in version control. Changes are always tracked, reviewed, and deployed through your existing CI/CD processes—no new workflow required. + + + +### One Experience, Two Interfaces +Every action in Elementary is both: +- **Simple, structured, and clear through the UI** – allowing users to create, edit, or bulk-change tests and metadata with higher efficiency and comfort. +- **Backed by code** – so every change is versioned, reviewed, and integrated seamlessly into your pipeline. + +This keeps your CI process natural, while adding an extra layer of protection: every change can be peer-reviewed through a PR before it goes live. 
+ + +### How Branches & PRs Work +When you make changes through the Elementary UI (such as adding tests, editing assets, or managing configurations), Elementary will: +1. Open a **new feature branch** with the prefix `elementary-` in your connected repository. +2. Commit the changes (e.g., new tests, metadata updates, asset modifications). +3. Automatically create a **pull request** for review with the title format: `{Your Name} via Elementary: {Change Description}`. +From there, your team can review, approve, and merge using the exact same process you already follow for any code change. The system supports various types of changes including test management, asset management, and dbt package upgrades. + + +### Built for All Users +- **Non-technical users** can define rules for business metrics (e.g., *Revenue is never negative*) without writing SQL or relying on engineering help. +- **Technical users** can manage advanced data tests, metadata, and structure at scale, with full transparency and control. + + +### AI-Powered Configurations +Elementary’s [AI Agents](https://docs.elementary-data.com/cloud/features/ai-agents) work the same way: +- You can ask them to recommend tests, optimize queries, or update metadata. +- Once you’re satisfied with the iteration, the agent will automatically open a PR on your behalf—integrating seamlessly into your workflow. + +### Works with MCP +Our [MCP server](https://docs.elementary-data.com/cloud/features/mcp-server) makes working in code even more powerful, bridging between your local dev setup, IDE, and collaborative workflows. + + +### Repo Integrations +Elementary supports native [repository integrations](https://docs.elementary-data.com/cloud/features/integrations) with GitHub, GitLab, and others, so your workflows stay consistent and secure. 
diff --git a/docs/cloud/features/data-governance/ai-descriptions.mdx b/docs/cloud/features/data-governance/ai-descriptions.mdx new file mode 100644 index 000000000..0bb5083f2 --- /dev/null +++ b/docs/cloud/features/data-governance/ai-descriptions.mdx @@ -0,0 +1,34 @@ +--- +title: "AI-Powered Description Generation (Beta)" +sidebarTitle: "AI descriptions (Beta)" +--- + +Elementary now offers an AI-powered description generation feature that helps you create clear, informative descriptions for your data assets without manual effort. + + + +### Why AI-Generated Descriptions Matter + +- **Improved Discoverability** - Well-described assets are easier to find and understand +- **Consistent Documentation** - Maintain a uniform description style across your data platform +- **Time Savings** - Generate quality descriptions in seconds instead of hours +- **Enhanced Data Context** - Help data consumers quickly understand asset purpose and content +- **Reduced Documentation Gaps** - Easily fill in missing descriptions across multiple assets + +### How to Generate AI Descriptions + +You can generate AI descriptions for individual assets or in bulk: + +1. In the Assets Table, select the assets that need descriptions +2. Click the "Description" button +3. Click "Generate with AI" for all selected assets or click the "Generate with AI" button for a single asset +4. Review the generated descriptions for each asset (please give feedback so the AI can improve the quality of the descriptions) +5. Make any necessary adjustments +6.
Click "Submit Pull Request" to submit your changes diff --git a/docs/cloud/features/data-governance/critical_assets.mdx b/docs/cloud/features/data-governance/critical_assets.mdx new file mode 100644 index 000000000..3d42da3a8 --- /dev/null +++ b/docs/cloud/features/data-governance/critical_assets.mdx @@ -0,0 +1,61 @@ +--- +title: "Critical Assets" +--- + +### **What is a Critical Asset?** + +A critical asset is any data asset (such as a model, exposure, or report) that plays a crucial role in your company's data ecosystem. Issues affecting these assets can have a significant impact on business operations, dashboards, and decision-making. + +Marking an asset as **critical** ensures it receives higher priority in monitoring and alerting, helping you quickly identify and respond to issues that may impact it. + + + + +## What Should Be Set as a Critical Asset? + +You should mark an asset as **critical** if: + +- It directly impacts key **business reports, dashboards, or decision-making tools**. +- It serves as an essential **upstream dependency** for other important data models. +- It is frequently used by **multiple teams or stakeholders**. +- Its failure or inaccuracy could cause **significant business or operational risks**. + +## Why Should I Define My Critical Assets? + +Defining your **critical assets** helps you: + +- **Quickly identify and respond to issues** – Get notified when upstream problems may impact your critical assets, ensuring faster resolution and minimal disruption. +- **Prioritize issue resolution** – Focus on addressing incidents that have the greatest impact on business operations, dashboards, and decision-making. +- **Improve data reliability** – Ensure key stakeholders have access to accurate and up-to-date data by monitoring critical assets more effectively. +- **Enhance observability** – Gain better visibility into the health of your most important assets through prioritized monitoring and alerting. + +## How to Set a Critical Asset?
+ +You can mark an asset as **critical** directly in the UI: + +- **From the Catalog Page** – Navigate to the asset in the catalog and click the **diamond icon** to **"Set as Critical Asset."** +- **From the Lineage View** – Right-click on the node representing the asset and select **"Set as Critical Asset"** from the list. + +Once an asset is marked as critical, **alerts will now highlight any issues that may impact this asset or its upstream dependencies, ensuring prioritization.** + + + +## Where Can You See Critical Assets? + +Once an asset is marked as **critical**, you will be able to: + +- **Identify it in the UI**, where it will be visually highlighted. +- **Receive alerts** when upstream issues may impact the critical asset. +- **Filter incidents** by their impact on critical assets. +- **Track health scores of critical data assets** over time through dashboard monitoring. + + + +By carefully selecting which assets to mark as critical, you can quickly detect and prioritize issues that impact your most important data, reducing disruptions, improving reliability, and keeping key stakeholders informed. diff --git a/docs/cloud/features/data-governance/manage-metadata.mdx b/docs/cloud/features/data-governance/manage-metadata.mdx new file mode 100644 index 000000000..e5160de80 --- /dev/null +++ b/docs/cloud/features/data-governance/manage-metadata.mdx @@ -0,0 +1,78 @@ +--- +title: "Manage metadata" +--- + +Manage your metadata directly from the [Elementary Catalog](/cloud/features/collaboration-and-communication/catalog), with no need to update code manually. This streamlines governance workflows, reduces manual effort, and improves visibility and consistency across your assets. 
+ +From a single interface, you can view and filter assets, edit metadata fields individually or in bulk, edit metadata for related assets, and even use [AI to generate high-quality descriptions](/cloud/features/data-governance/ai-descriptions), saving time and enhancing data discoverability. All changes are version-controlled via pull requests to keep your dbt project in sync. + +To get the most out of this feature, check out our [Data Governance Best Practices](/cloud/best-practices/governance-for-observability) guide. + + + +## Overview + +Managing metadata such as owners, tags, descriptions, and critical asset status is essential for effective data governance. With Elementary you can: + +- View all your assets and their metadata in a centralized table +- Identify governance gaps and inconsistencies +- Edit metadata fields individually or in bulk +- Generate high-quality asset descriptions using AI +- Create pull requests for metadata changes directly from the UI +- Track pending changes awaiting approval + +### Supported Metadata Fields + +Elementary supports editing the following metadata fields: + +- **Tags** - Categorize and organize assets +- **Owners** - Assign responsibility for assets +- **Descriptions** - Provide context about asset purpose and usage +- **Critical Assets** - Mark high-priority assets + +Coming soon: + +- **Subscribers** - Add users who should be notified about changes +- **Custom Metadata Fields** - Edit organization-specific metadata + +## Editing Metadata + +Elementary allows you to edit metadata for individual assets or in bulk, with changes reflected in your dbt project through pull requests. + +### How to Edit Metadata + +To edit metadata: + +1. In the Catalog page, select any folder to open the assets table +2. Select the assets you want to modify +3. Choose which metadata field to edit (Tags, Owners, Description, etc.) and click the corresponding field +4. Make your changes in the editing wizard +5.
Review the changes summary +6. Click "Submit Pull Request" to submit your changes + + +### Editing Metadata for Related Assets +From the Dependencies tab in the Catalog, you can efficiently apply metadata to all upstream and downstream assets of a selected asset in a single action. + + + +## Governance Dashboard (Coming soon) + +The Governance Dashboard provides insights into your data governance structure, helping you identify gaps and inconsistencies at a glance. + + + +### Dashboard Features + +- **Tag Distribution Analysis** - Visualize tag coverage across assets and identify areas needing better categorization +- **Ownership Coverage** - Track asset ownership distribution and highlight resources requiring owner assignment +- **Description Status** - Monitor assets with missing or incomplete descriptions to improve documentation +- **Critical Asset Tracking** - Get an overview of your business-critical data assets and their governance status +- **Dynamic Filtering** - Filter the Assets Table in real-time based on dashboard selections and metrics diff --git a/docs/cloud/features/data-lineage/column-level-lineage.mdx b/docs/cloud/features/data-lineage/column-level-lineage.mdx new file mode 100644 index 000000000..3b6479e6e --- /dev/null +++ b/docs/cloud/features/data-lineage/column-level-lineage.mdx @@ -0,0 +1,46 @@ +--- +title: Column-Level Lineage +sidebarTitle: Column level lineage +--- + +The table nodes in Elementary lineage can be expanded to show the columns. When you +select a column, the lineage of that specific column will be highlighted. + +Column-level lineage is useful for answering questions such as: + +* Which downstream columns are actually impacted by a data quality issue? + +* Can we deprecate or rename a column? + +* Will changing this column impact a dashboard? 
+ +### Filter and highlight columns path + +To help navigate graphs with a large number of columns per table, use the `...` menu to the right of the column: + +* **Filter**: Shows a graph of only the selected column and its dependencies. + +* **Highlight**: Highlights only the selected column and its dependencies. + + + +### Supported BI tools + \ No newline at end of file diff --git a/docs/cloud/features/data-lineage/lineage.mdx b/docs/cloud/features/data-lineage/lineage.mdx new file mode 100644 index 000000000..eb7f40831 --- /dev/null +++ b/docs/cloud/features/data-lineage/lineage.mdx @@ -0,0 +1,40 @@ +--- +title: End-to-End Data Lineage +sidebarTitle: Lineage overview +--- + +Elementary offers automated [Column-Level Lineage](/cloud/features/data-lineage/column-level-lineage) functionality, enriched with the latest test and monitor results. +It is built with usability and performance in mind. +The column-level lineage is built from the metadata of your data warehouse, and integrations with [BI tools](/cloud/features/data-lineage/exposures-lineage#automated-lineage-to-the-bi) such as Looker and Tableau. + +Elementary updates your lineage view frequently, ensuring it is always current. +This up-to-date lineage data is essential for supporting several critical workflows, including: + +- **Effective data issue debugging**: Identify and trace data issues back to their sources. +- **Incident impact analysis**: Explore which downstream assets are impacted by each data issue. +- **Prioritize data issues**: Prioritize the triage and resolution of issues that are impacting your critical downstream assets. +- **Public assets health**: By selecting an exposure and filtering on upstream nodes, you can see the status of all its upstream datasets. +- **Change impact**: Analyze which exposures will be impacted by a planned change. +- **Unused datasets**: Detect datasets that are not consumed downstream, and could be removed to reduce costs.
+ + + +## Node info and test results + +To view additional information in the lineage view, use the `...` menu to the right of the node: + +- **Test results**: Access the table's latest test results in the lineage view. +- **Node info**: See details such as description, owner and tags. If collected, it will include the latest job info. + + +## Job info in lineage + +You can [configure Elementary to collect jobs information](/cloud/guides/collect-job-data) to present in the lineage _Node info_ tab. Job names can also be used to filter the lineage graph. diff --git a/docs/cloud/features/data-tests/custom-sql-tests.mdx b/docs/cloud/features/data-tests/custom-sql-tests.mdx new file mode 100644 index 000000000..73cfcfe34 --- /dev/null +++ b/docs/cloud/features/data-tests/custom-sql-tests.mdx @@ -0,0 +1,69 @@ +--- +title: Custom SQL Tests +sidebarTitle: Custom SQL test +--- + +Custom SQL queries enable you to create tailored tests that align with your specific business needs. +These tests can be executed against any of the tables in your connected data warehouse. + +### When to use custom SQL tests? + +A custom SQL test is easier to write than a new generic dbt test, but it can't be leveraged across different data sets. +On the other hand, writing custom SQL tests enables testing complex custom calculation logic, relationships between many tables and more. + +This is why most Elementary users write custom SQL tests when the behavior to be tested is complex, specific, and doesn't exist in any out-of-the-box test. +A common use case is for data analysts to add validations as custom SQL tests. + +As non-technical users are often not familiar with dbt, +Elementary provides an interface for adding custom SQL tests that converts each one into a pull request adding a singular dbt test. + +### Adding a custom SQL test + +1. In `Test Configuration` choose `New test` → `custom query test`. +2. Add your query. It can be a regular SQL query on any tables in your environment.
The query should only return results if something is wrong, meaning the test will pass on no results and fail on any results. Please be sure to use the full name of the table, including the database and schema. The query will then be validated and formatted, and table names will be replaced with dbt model references. This can take a few seconds.
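For example, a singular test for an order-level consistency rule might look like the query below. The `orders` / `order_items` tables and the business rule are hypothetical, and Elementary will replace the fully qualified table names with dbt `ref()` calls when it opens the pull request:

```sql
-- Hypothetical example: fail if an order's total doesn't match the sum of its line items.
-- The query returns rows only when something is wrong, so no rows = test passes.
select
    o.order_id,
    o.order_total,
    sum(i.item_amount) as items_total
from analytics_db.prod.orders as o
join analytics_db.prod.order_items as i
    on i.order_id = o.order_id
group by o.order_id, o.order_total
having o.order_total <> sum(i.item_amount)
```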
+ +
+ + +3. Add your test configuration: + +- Test name (must be a valid file name; it will be used as the name of your singular test). +- Description (recommended). +- Location - this will be the directory in your dbt project where the test will be stored. +- Severity - Failure or Warning. +- Tested table (optional) - Adding the table the test is checking will link this test to that table, showing it in the table's lineage, catalog page and more. +- Tags and Owners (optional) + - Some users use dbt tags to help with scheduling. Creating tags such as "daily" or "hourly", with scheduled jobs that select them, helps determine the test's schedule at the time of creation. + - It is recommended to add a tag that will later be used to route the alert on failure to the right recipient. + +4. Review & Submit. In this stage you'll be able to see the translated test query and configuration.
+ +
+ + +5. Clicking "Submit" creates a pull request. +6. After the pull request is merged to production, the test will run as part of a `dbt test` job. + +### Custom SQL test results + +The results of all custom SQL tests can be found under a `tests` folder in the test results sidebar. +Additionally, if you configured a `tested table`, `tag` or `owner`, the test result will be visible under the relevant path. + +### Alerts on custom SQL tests + +It's recommended to use tags and owners to create an [alert rule](/cloud/features/alerts-and-incidents/alert-rules) that will route these tests to the relevant recipient. + +### Scheduling custom SQL tests + +It's common to not want these tests to run as part of your main job, or at the same frequency. +We recommend tagging all of these tests, using dbt selectors to exclude that tag from the main job, and running a dedicated test job that includes only these tests at the required frequency. \ No newline at end of file diff --git a/docs/cloud/features/data-tests/data-tests-overview.mdx b/docs/cloud/features/data-tests/data-tests-overview.mdx new file mode 100644 index 000000000..81c33f540 --- /dev/null +++ b/docs/cloud/features/data-tests/data-tests-overview.mdx @@ -0,0 +1,56 @@ +--- +title: Data Tests Overview +sidebarTitle: Overview and configuration +--- + +import DataTestsCards from '/snippets/cloud/features/data-tests/data-tests-cards.mdx'; + + + +Data tests are useful for validating and enforcing explicit expectations on your data. + +Elementary enables data validation and result tracking by leveraging dbt tests and dbt packages such as dbt-utils, dbt-expectations, and Elementary. +This rich ecosystem of tests covers various use cases, and is widely adopted as a standard for data validations. +Any custom dbt generic or singular test you develop will also be included. +Additionally, users can create custom SQL tests in Elementary, and add tests in bulk from the UI.
+ +The combination of dbt tests, Elementary monitors, custom SQL tests and the rich dbt testing ecosystem provides the ability to achieve wide and comprehensive coverage. + +### Supported data tests + + + +## Test configuration + +One of the design principles in Elementary is that users should manage configuration in code. +This keeps the workflow for building the pipeline and configuring coverage the same, makes observability and governance part of the development cycle, and provides control through review processes and version management. + +However, adding many tests in code is tedious, and configuration in code isn't usable for everyone. + +In Elementary, we designed a flow to incorporate the best of both worlds: + +- **Configuration in code or in UI** - The UI test configuration flow opens pull requests to the code base through the [code repository](/cloud/integrations/code-repo/connect-code-repo) integration. +- **The code is the single source of truth** - As configuration from the UI goes to code, the code remains the place where configuration is managed and maintained. + +### Create new tests from the UI + + + +### Edit existing tests from the UI +Tests can be edited from the UI, and the changes will create a pull request in the code repository. +The pull request will be reviewed and merged by the team, and the changes will be applied to the tests after the dbt pipeline runs and Elementary is synced. +Simply open the test side panel from the test results / incidents page, and navigate to the Configuration tab. +Then click "Edit test" and make the necessary changes. + + + +A pull request will be opened in the code repository, and a link to the PR will be provided in the UI.
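For illustration, a pull request opened by this flow edits standard dbt YAML. A sketch of what the resulting properties file might contain (the model and column names here are hypothetical):

```yaml
# Hypothetical schema.yml — tests added or edited from the Elementary UI
# land in your dbt project as ordinary dbt test configuration like this.
version: 2

models:
  - name: orders            # hypothetical model
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ["placed", "shipped", "returned"]
```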
+ diff --git a/docs/cloud/features/data-tests/dbt-tests.mdx b/docs/cloud/features/data-tests/dbt-tests.mdx new file mode 100644 index 000000000..00bcc258c --- /dev/null +++ b/docs/cloud/features/data-tests/dbt-tests.mdx @@ -0,0 +1,36 @@ +--- +title: dbt, Packages and Elementary Tests +sidebarTitle: dbt tests +--- + +import BenefitsDbtTests from '/snippets/cloud/features/data-tests/benefits-dbt-tests.mdx'; +import DbtTestHub from '/snippets/cloud/features/data-tests/dbt-test-hub.mdx'; +import TestsCards from '/snippets/data-tests/tests-cards.mdx'; + + + + + +### dbt Test Hub + + + +### Supported dbt tests and packages + +Elementary collects and monitors the results of all dbt tests. + +The following packages are supported in the tests configuration wizard: + +- [dbt expectations](https://github.com/calogica/dbt-expectations) - A dbt package inspired by the Great Expectations package for Python. The intent is to allow dbt users to deploy GE-like tests in their data warehouse directly from dbt. +- [dbt utils](https://github.com/dbt-labs/dbt-utils) - A package by dbt labs that offers useful generic tests. + +Note that you need to import these packages to your dbt project to use them. + +### Elementary dbt package tests + +The Elementary dbt package also provides tests for detection of data quality issues. +Elementary data tests are configured and executed like native tests in your dbt project. + + + + diff --git a/docs/cloud/features/data-tests/schema-validation-test.mdx b/docs/cloud/features/data-tests/schema-validation-test.mdx new file mode 100644 index 000000000..d3591b1cc --- /dev/null +++ b/docs/cloud/features/data-tests/schema-validation-test.mdx @@ -0,0 +1,29 @@ +--- +title: Schema Validation Tests +sidebarTitle: Schema validation +--- + +The Elementary dbt package includes the following schema validation tests: + + + Fails on changes in schema: deleted or added columns, or change of data type + of a column. 
+ + + +Fails if the table schema differs from a configured baseline in column names or column types (the baseline can be generated with a macro). + + + + Monitors a JSON type column and fails if there are JSON events that don't + match a configured JSON schema (can be generated with a macro). + + + + Monitors changes in your models' columns that break schema for downstream + exposures, such as BI dashboards. + \ No newline at end of file diff --git a/docs/cloud/features/data-tests/test-coverage-screen.mdx b/docs/cloud/features/data-tests/test-coverage-screen.mdx new file mode 100644 index 000000000..63d02a18a --- /dev/null +++ b/docs/cloud/features/data-tests/test-coverage-screen.mdx @@ -0,0 +1,76 @@ +--- +title: Managing Test Coverage +sidebarTitle: Managing test coverage +--- + +The **Test Coverage screen** in the Elementary Cloud [catalog](/cloud/features/collaboration-and-communication/catalog) gives you a full picture of your data quality coverage across assets and dimensions. It's designed to help you quickly understand where you stand and take action to improve.
+ Test coverage screen +
+ + +## What It Shows + +Each asset is evaluated across **seven data quality dimensions**: + +- **Freshness** +- **Completeness** +- **Uniqueness** +- **Validity** +- **Accuracy** +- **Consistency** +- **Other** + +For each asset, you'll see: + +- **Which dimensions are covered** by existing tests +- **Where coverage is missing** +- [A **coverage score** between **0–100%**](/cloud/features/data-tests/test-coverage-screen#how-coverage-calculation-works) +- **Links to test results** + +## What You Can Do from This Screen + +From the Test Coverage screen, you can: + +- Filter by asset properties, test name, critical assets, and coverage ranges +- **Select multiple assets and seamlessly add your missing tests with just a few clicks** +- **Jump directly to any asset in the catalog to review its details** +- Gain insight into gaps by grouping assets across dimensions like domain, pipeline, tag, or owner, making weak spots easy to identify (Coming soon) +- Export the results to CSV + +## How Coverage Calculation Works + +The **test coverage score** is calculated based on **7 dimensions of data quality**. +The first **6 core dimensions** each contribute **15%** to the total score. The **7th dimension**, labeled **"Other,"** accounts for the remaining **10%** and is primarily used for **business logic tests** that don't align directly with any specific core dimension. + +### Upcoming Features + +- **Customizable Weights:** + + You'll soon be able to tailor the weighting of each dimension to align with your organization's unique priorities and testing standards. + +- **Custom Coverage Rules:** + + Define your own coverage criteria to better identify tables that do not meet your internal standards. This will make it easier to spot gaps and maintain consistent data quality practices.
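The weighting described under "How Coverage Calculation Works" can be sketched as a small calculation. This is an illustration of the stated 15%/10% weights only, not Elementary's actual implementation:

```python
# Sketch of the coverage score: six core dimensions at 15% each,
# plus "Other" at 10%. Illustrative only — not Elementary's code.
CORE_DIMENSIONS = ["freshness", "completeness", "uniqueness",
                   "validity", "accuracy", "consistency"]

def coverage_score(covered: set) -> float:
    """Return a 0-100 score given the set of covered dimensions."""
    score = sum(15 for dim in CORE_DIMENSIONS if dim in covered)
    if "other" in covered:
        score += 10
    return float(score)

print(coverage_score({"freshness", "uniqueness"}))  # 30.0
```

For example, an asset with tests covering only freshness and uniqueness scores 30%, while covering all seven dimensions yields 100%.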
+ +
+ + + +## Test Recommendation AI Agent + +Alongside the Test Coverage screen, you can also use our [**Test Recommendation Agent**](https://docs.elementary-data.com/cloud/ai-agents/test-recommendation-agent) to help you improve test coverage. + +Together, the coverage screen and the agent give you visibility and guidance to focus your efforts where they'll have the most impact. diff --git a/docs/cloud/features/elementary-alerts.mdx b/docs/cloud/features/elementary-alerts.mdx new file mode 100644 index 000000000..e69de29bb diff --git a/docs/cloud/features/gpg-signed-commits.mdx b/docs/cloud/features/gpg-signed-commits.mdx new file mode 100644 index 000000000..099b8c5bc --- /dev/null +++ b/docs/cloud/features/gpg-signed-commits.mdx @@ -0,0 +1,104 @@ +--- +title: "GPG Commit Signing" +sidebarTitle: "GPG commit signing" +--- + +## Overview + +GPG commit signing provides cryptographic verification that commits were made by you, ensuring the integrity and authenticity of your code changes. When enabled, Elementary will automatically sign commits with your GPG key, providing an additional layer of security and trust. + + + **Enhanced Security:** GPG-signed commits provide cryptographic proof of + authorship and commit integrity. + + +## How it works + +Elementary's GPG commit signing feature allows you to: + +1. **Generate GPG keys** directly in the Elementary interface +2. **Automatically sign commits** when creating pull requests or making changes +3. **Manage your keys** with options to view, revoke, or generate new keys +4. **Fall back gracefully** to unsigned commits if GPG signing fails + +### Key benefits + +- **Cryptographic verification** of commit authorship +- **Tamper detection** - any modification to signed commits will be detected +- **Enhanced security** for your code repository +- **Seamless integration** with your existing workflow + +## Setting up GPG commit signing + +1. Navigate to **User Settings** > **GPG Keys** in your Elementary account +2.
Click **"Add GPG Key"** to generate a new key +3. Configure your key settings +4. Click **"Generate Key"** to create your GPG key +5. Copy the public key +6. Go to **GitHub** → **Settings** → **SSH and GPG keys** +7. Click **"New GPG key"** and paste the public key + + + You can only have one active GPG key at a time. To use a different key, you'll + need to revoke the current one first. + + +### View your GPG key details + +Once generated, you can **view the public key** - useful for adding to GitHub or other Git hosting services. + + +
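After adding the key to GitHub, you can also check signatures locally with standard git commands. A sketch, to be run inside a repository that contains Elementary's commits:

```shell
# Show signature details for the most recent commit.
git log --show-signature -1

# Verify a specific commit; exits non-zero if the signature is missing or invalid.
git verify-commit HEAD
```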
+ View keys in Elementary UI +
+ + + +### Revoke a GPG key + +If you need to revoke your current GPG key: + +1. Go to **User Settings** > **GPG Keys** +2. Find your active key in the table +3. Click the **trash icon** to revoke the key +4. Confirm the revocation + + + **Permanent action:** Revoking a GPG key is permanent and cannot be undone. + You'll need to generate a new key to continue using GPG signing. + + +## How commits are signed + +When GPG commit signing is enabled, Elementary will: + +1. Automatically detect if you have an active GPG key +2. Use your key to sign commits when creating pull requests or making changes +3. Fall back gracefully to unsigned commits if signing fails + +### Commit signature verification + +Signed commits will show a "Verified" badge in GitHub and other Git hosting services, indicating that: + +- The commit was signed with a valid GPG key +- The signature matches the commit content +- The key belongs to the commit author + +## Best practices + +### Key management + +- Use strong key lengths (4096 bits recommended) +- Set expiration dates (90 days recommended) +- Revoke compromised keys immediately + +### Troubleshooting + +If commits aren't being signed: + +1. Check you have an active GPG key in your Elementary settings +2. Verify the key hasn't expired +3. 
Make sure you added the public key to GitHub diff --git a/docs/cloud/features/integrations.mdx b/docs/cloud/features/integrations.mdx new file mode 100644 index 000000000..4bf43e3cf --- /dev/null +++ b/docs/cloud/features/integrations.mdx @@ -0,0 +1,11 @@ +--- +title: "Elementary integrations" +sidebarTitle: "Integrations" +icon: "plug" +--- + +import CloudIntegrationsCards from '/snippets/cloud/integrations/cards-groups/cloud-integrations-cards.mdx'; + + + + \ No newline at end of file diff --git a/docs/cloud/features/multi-env.mdx b/docs/cloud/features/multi-env.mdx new file mode 100644 index 000000000..5f0f04ae1 --- /dev/null +++ b/docs/cloud/features/multi-env.mdx @@ -0,0 +1,71 @@ +--- +title: "Multiple Environments" +--- + +Elementary supports two approaches for managing multiple environments: + +## Separate Environments + +An environment in Elementary is a combination of a dbt project and a target. +For example: If you have a single dbt project with three targets, prod, staging and dev, you can create 3 environments in Elementary and monitor these environments separately. + +If you have several dbt projects and even different data warehouses, Elementary enables monitoring the data quality of all these environments in a single interface. Simply click on "Create environment" and set up all integration connections. Each environment is managed independently with its own monitoring, alerts, and configuration. + + +
+ +
+ + +## Multi-Environment Support + +Elementary now supports connecting multiple dbt projects to a single Elementary environment. This gives teams the flexibility to organize their dbt code by layers, domains, or separate repositories while still getting one unified view of data health, lineage, and reliability. + +### How it Works + +You can connect several dbt projects, each with its own Elementary schema and code repository, into the same Elementary environment. Once connected, Elementary combines lineage and context across all projects so you have a complete picture of how data flows through your entire stack. + +### Key capabilities + +- **Unified lineage across projects** +Elementary identifies assets by their full relation name, so lineage can span multiple dbt projects with complete accuracy. + +- **No changes to your dbt structure** +Each dbt project keeps its own models, sources, repository, and Elementary schema. Elementary handles the unification at the environment level. + +- **Automatic linking between sources and models** +If the same relation appears as a model in one project and a source in another, Elementary displays two nodes in lineage with a clear link between them. This makes dependencies explicit and helps you trace issues further upstream. + +### When to Use Multi-Environment Support + +Multi-environment support is designed for teams that organize dbt work in multiple projects, such as: + +- Layered architectures (bronze, silver, gold) + +- Domain-based modeling (finance, marketing, operations and others) + +- Multiple dbt projects built on the same raw data + +- Pipelines stored in separate repositories + +This feature lets you maintain the separation between projects while still using Elementary as one consistent reliability control plane. + +### Setup and Configuration + + +Each dbt project added to the environment must use a different Elementary schema, and the Elementary dbt package must be installed in each project.
+ + +Multi-environment support is configured in the environment settings, directly in the UI. +After configuring the DWH connection, click "Continue" to reach the environment content section, where you can add additional dbt projects to the same environment.
+ +
+ diff --git a/docs/cloud/features/performance-monitoring/performance-alerts.mdx b/docs/cloud/features/performance-monitoring/performance-alerts.mdx new file mode 100644 index 000000000..765d364f3 --- /dev/null +++ b/docs/cloud/features/performance-monitoring/performance-alerts.mdx @@ -0,0 +1,8 @@ +--- +title: Performance Alerts +sidebarTitle: Performance alerts +--- + +import PerformanceAlerts from '/snippets/guides/performance-alerts.mdx'; + + diff --git a/docs/cloud/features/performance-monitoring/performance-monitoring.mdx b/docs/cloud/features/performance-monitoring/performance-monitoring.mdx new file mode 100644 index 000000000..77ce069ba --- /dev/null +++ b/docs/cloud/features/performance-monitoring/performance-monitoring.mdx @@ -0,0 +1,39 @@ +--- +title: Performance Monitoring +sidebarTitle: Performance monitoring +--- + +Monitoring the performance of your data pipeline is critical for maintaining data quality, reliability, and operational efficiency. +Proactive performance monitoring lets you detect bottlenecks and opportunities for optimization, prevent data delays, and avoid unnecessary costs. + +Elementary monitors and logs the execution times of: +- dbt models +- dbt tests + +## Models performance + +Navigate to the `Model Duration` tab. + +The table displays the latest execution time, median execution time, and execution time trend for each model. You can sort the table by these metrics and explore the execution times over time for the models with the longest durations. + +It is also useful to use the navigation bar to filter the results, and see run times per tag/owner/folder. + + + +## Tests performance + +Navigate to the `Test Execution History` tab. + +The table shows the median execution time and fail rate per test. +You can sort the table by the time column, and detect tests that are compute-heavy. + +It is also useful to use the navigation bar to filter the results, and see run times per tag/owner/folder.
\ No newline at end of file diff --git a/docs/cloud/features/roles-and-permissions.mdx b/docs/cloud/features/roles-and-permissions.mdx new file mode 100644 index 000000000..90d7dd14c --- /dev/null +++ b/docs/cloud/features/roles-and-permissions.mdx @@ -0,0 +1,165 @@ +--- +title: "Roles & Permissions Management" +--- + +Roles & Permissions management in Elementary lets you control **who can see or do what** inside your account. + +Instead of everyone having the same access, you can assign **roles** to users or groups so each person sees only what's relevant to them, and has only the permissions they need. + +## Why it matters + +- **Security & least privilege** — Users only get access to data and actions required for their role. +- **Team/group separation** — You can separate responsibilities (e.g. analysts, data engineers, reviewers) without interference. +- **Compliance readiness** — With clearly defined roles, you get better traceability for audits and compliance reviews. +- **Flexibility** — permissions are configurable per account, environment, and even per asset/test grouping, so you can support complex org structures. + +## How it works in Elementary + +| Level | What it controls | Typical use cases | +| --- | --- | --- | +| **Account-level permissions** | **Global account configurations** (e.g. manage users, setup integrations, export audit logs) | Admins, Security / Compliance officers | +| **Environment-level permissions** | Access to **specific environments** (e.g. Prod, Staging) + Access to **environment pages and features** (catalog, lineage, PRs, etc.) | Data teams, Analytics engineers, Data governance | +| **Resources - Asset/Test-level permissions** | Access to **specific group of assets or tests** (e.g. only Marketing datasets, or only domain specific DQ rules) | Scoped analysts / data stewards or domain-specific business users | + +A user's effective permissions are the sum (union) of all roles and permissions granted to them. 
+ +## Types of Roles + +Roles in Elementary can be divided into two categories: built-in (Admin, Editor, Viewer) and custom. + +![image.png](https://res.cloudinary.com/do5hrgokq/image/upload/v1768402091/image_38_grzlqn.png) + +### **Built-in Roles** + +Elementary includes default roles most teams can use immediately - Admin, Editor, Viewer: +| Roles / Permissions Level | **Account-level permissions** | **Environment-level permissions** | **Resources - Asset/Test-level permissions** | +| --- | --- | --- | --- | +| **Admin** | **Full (can edit) access for account configurations** (e.g. manage users, setup integrations, export audit logs) | **Full (can edit) access for all environments, pages and features:** Can view and manage any configuration within the environments such as alert rules, manage incidents and PRs, etc. | **Full (can edit) access to all assets and tests:** can create/edit assets, run DQ (data-quality) rules, manage pipelines, etc. | +| **Editor** | No permissions for account configurations | **Full (can edit) access for all environments, pages and features:** Can view and manage any configuration within the environments such as alert rules, manage incidents and PRs, etc. | **Full (can edit) access to all assets and tests:** can create/edit assets, run DQ (data-quality) rules, manage pipelines, etc. | +| **Viewer** | No permissions for account configurations | **Read-only access for all environments:** Can view all environments and pages. Can't make any changes through the UI. | **Read-only access across data assets and tests:** can browse data assets, lineage, assets' data health, etc. | + +These roles cover most organizations, but you can extend them with custom roles. + +**Custom Roles** + +For each custom role you can set limited permissions to a subset of environments, pages, or assets/tests (e.g. only Marketing data, or only Prod environments).
It's great for domain-specific teams and business users, ensuring each team only has access to the environments, pages, assets and tests relevant to them. +For each role you need to define the role scope (account-level, environment-level, or asset-level) and permission type (Read-only, Can edit, No permissions). + +Each role may include permissions at one, two, or all three levels: + +1. **Account-Level** + + Decide whether the role can **view** and whether it can **edit** account-wide configurations such as user management, integrations, and global configurations. + + Without at least **view** permissions, account configurations won't appear for this role. + +2. **Environment-Level** + + By default, any role has access to **all environments**, but you can restrict it to specific ones. + Choose which environments the role applies to, and what the user can do there. + + For each environment: + + - **Feature access** — Select which pages (Catalog, Lineage, Alerts, etc.) and features (View/create PRs, AI Agents, etc.) are visible. + - **Permission type** — Set it to **Read-only**, **Can edit**, or **No permissions**. +3. **Resources filters - Assets & Tests (optional)** + + Use metadata filters to restrict the role to specific datasets or tests. + + - **Filters** — e.g., `Asset tag = #marketing`, `Test Owner = @data-team`. + - **Permission type** — Choose read-only or edit access (e.g., create tests, update metadata, modify configurations) for each filter group. + - **"Resources are by default"** — This controls access to resources that **don't match any filter**. + - Set to **Hidden** if you want users to only see the filtered resources (most common for scoped roles). + - Set to **Read-only** if you want unfiltered resources to still be visible (just not editable). + + + If you add filters but leave "Resources are by default" set to **Read-only**, users will still see all resources outside the filter. To truly restrict visibility, set this to **Hidden**.
+
+
+**Custom role example: Finance Prod Editor**
+
+**Description** - Can edit finance assets, cannot see PII assets
+
+- Account level - No permissions to view/edit account configuration
+- Environment level -
+    - For the 'Prod' environment - Can edit
+    - Other environments - Hidden (No access)
+- Assets/tests level -
+    - `Asset tag = #finance OR Asset tag = #finance-data-product` - Can edit
+    - `Asset tag = #PII` - Hidden (No access)
+    - All other assets → **Hidden** (set via "Resources are by default: Hidden")
+
+![image.png](https://res.cloudinary.com/do5hrgokq/image/upload/v1768402089/image_39_puhpqm.png)
+
+## Getting started: How to set up roles and permissions
+
+1. **Plan your organizational structure**
+    - List out the personas in your org (e.g. "Data Engineer", "Analyst", "Marketing Analyst", "Business users", "DevOps")
+    - For each persona, decide:
+        - Whether they need access to **account-level configuration**
+        - **Which environments** they should be able to access (e.g., Dev, Staging, Prod)
+        - **Which pages, features, and capabilities** they should be able to use (Catalog, Lineage, Alerts, manage tests, update metadata, etc.)
+        - **Whether their access should be limited to specific assets or datasets**, using filters such as `Asset tag = #PII` or `Test Owner = @data-team`
+2. **Create Custom Roles in Elementary (optional)**
+    1. Go to *Settings → Roles* and click **"Create New Role"**
+    2. Set the role details - give the role a clear, descriptive name, such as *Marketing Analyst, Finance Viewer, Data Steward – Production*
+    3. Choose the role scope (account-level, environment-level, or asset-level) and permission type (Read only, Can edit)
+
+    **Each role may include permissions at one, two, or all three levels.**
+
+3. **Assign Users or Groups to built-in or custom Roles**
+    - Add individual users or entire groups/teams (via SSO/SCIM integration or manually).
+    - Users can have multiple roles; permissions are combined (union).
+4.
**Adjust and Evolve Roles over time**
+    - As your org grows or changes, update role definitions or add new roles (e.g. "IT team", "Business viewer").
+    - Reassign or remove roles when people change responsibilities or leave.
+
+## Best Practices & Recommendations
+
+- **Start with broad roles, then narrow down**
+
+    Begin with general roles (Admin, Data Engineer, Analyst), then create more scoped roles once patterns emerge (e.g. separate "Marketing Data Viewer," "Finance Data Viewer").
+
+- **Use the principle of least privilege**
+
+    Assign only the minimal permissions required. For example, an analyst who only needs to view data should not have edit or delete permissions.
+
+- **Leverage asset/tag-based filters**
+
+    Use tags like `tag:#marketing` or `env:prod` to automatically filter access. This lowers maintenance overhead as you add more assets.
+
+- **Regularly review role assignments**
+
+    Periodically (e.g. quarterly) run the access snapshot and verify that users still need their assigned roles.
+
+- **Avoid giving "Admin" widely**
+
+    Reserve account-level Admin or environment-wide write permissions for a small set of trusted users (e.g. the data platform or security team).
+
+## FAQ
+
+**Q: What happens if a user has two roles with conflicting permissions (e.g. one role gives "read-only", another gives "edit")?**
+
+A: Elementary uses a **union model** — the user gets the higher privilege. So in that case, they'd have "edit" access.
+
+**Q: Can I let someone see only a subset of datasets (e.g. only marketing data)?**
+
+A: Yes. When you create/edit a role, you can specify asset filters (e.g. by tag, owner, or dataset name). That way, scoped viewers or editors only see their allowed subset.
+
+**Q: What if I need a temporary elevated permission for a user (e.g. for a one-time task)?**
+
+A: You can assign a role temporarily (for example, a "Contractor – Write Access" role), and remove it once the task is done.
We recommend logging the change and reviewing it later. + +**Q: Does using Roles & Permissions affect performance or data loading?** + +A: No — permissions are enforced efficiently at request time. There is no perceptible performance overhead under normal usage. + +**Q: Can I export or audit roles and permissions?** + +A: Yes — Elementary provides **Roles Audit Logs** that record all role changes and assignments. You can export to CSV or share with compliance teams. + +**Q: I created a role with a resource filter, but the user can still see all assets — why?** + +A: Check the **"Resources are by default"** setting in your role's Resource permissions. If it's set to **Read-only**, any resource that doesn't match your filter is still visible with read access. To hide everything outside your filter, change this to **Hidden**. + diff --git a/docs/cloud/general/ai-privacy-policy.mdx b/docs/cloud/general/ai-privacy-policy.mdx new file mode 100644 index 000000000..15d62835b --- /dev/null +++ b/docs/cloud/general/ai-privacy-policy.mdx @@ -0,0 +1,52 @@ +--- +title: "AI Privacy Policy" +icon: "user-lock" +--- + +**We are committed to maintaining the highest standards of data privacy and protection. This AI Privacy Policy outlines how we handle data in the context of our AI-powered features, including our [suite of AI agents](/cloud/ai-agents).** + +Elementary’s AI features are designed to enhance the user experience across key workflows, enabling natural language responses, automated data exploration, intelligent issue triage and resolution, proactive test and governance recommendations, and query optimization. These features are strictly opt-in and must be explicitly enabled for each customer instance. + +### Third-Party AI Providers + +Elementary’s AI features are powered by leading large language models hosted through **Amazon Bedrock**, a secure AWS-managed environment. 
This architecture ensures that: + +- **All AI processing stays entirely within the AWS infrastructure.** +- **No data is sent to external model providers directly by Elementary.** +- **No data used for AI inference is stored or used for training by any provider.** + +Amazon Bedrock may access models from third-party providers such as **Anthropic** and **OpenAI**. However, these models are accessed **via Bedrock** - your data remains within AWS at all times. + +### Data Handling & Anonymization + +Elementary is committed to minimizing data exposure and ensuring user privacy in all AI-powered workflows. When using AI features, only the minimal context necessary to fulfill the requested functionality is processed, such as metadata, anonymized prompts, or structural test information. + +No personally identifiable information (PII) or customer-specific identifiers are shared with AI models unless explicitly configured by the customer for a particular feature. Most features are designed to operate entirely on anonymized metadata and structured inputs. + +All AI processing takes place within **Amazon Bedrock**, a secure environment within AWS. This ensures that your data remains fully contained within your cloud region and never leaves the AWS infrastructure. + +### Customer Instance Isolation + +AI interactions are scoped strictly to each customer’s instance. No data is shared between tenants. Responses are routed back only to the initiating instance. Our infrastructure uses per-instance storage and 2FA-protected admin access. + +### Data Retention and Deletion + +- No AI request data is stored beyond the scope of processing. +- General client data is retained for a maximum of one week, or deleted immediately upon account closure. + +### Opt-Out Controls + +AI functionality can be completely disabled at the instance level. Customers may also selectively enable only the features they wish to use. 
+ +### Compliance + +Our AI processing aligns with: + +- **GDPR**, **UK GDPR**, **CCPA**, and other global data privacy regulations. +- Security best practices, including those outlined in our [Data Protection Agreement](https://www.elementary-data.com/privacy). + +## Have any more questions? + +We would be happy to answer! +Reach out to us on [email](mailto:legal@elementary-data.com) or [Slack](https://elementary-data.com/community). + diff --git a/docs/cloud/general/security-and-privacy.mdx b/docs/cloud/general/security-and-privacy.mdx index c665454bd..4eac21fed 100644 --- a/docs/cloud/general/security-and-privacy.mdx +++ b/docs/cloud/general/security-and-privacy.mdx @@ -4,6 +4,10 @@ sidebarTitle: "Security and privacy" icon: "lock" --- +import HowItWorks from '/snippets/cloud/how-it-works.mdx'; + + + ## Security highlights Our product is designed with security and privacy in mind. @@ -15,7 +19,7 @@ Our product is designed with security and privacy in mind. - Elementary offers deployment options with no direct access from Elementary Cloud to your data warehouse and third-party tools. - **SOC 2 certification:** Elementary Cloud is SOC2 type II certified! + Elementary Cloud is SOC2 type II certified and HIPAA compliant! @@ -34,18 +38,7 @@ Our product and architecture are always evolving, but our commitment to secure d ## How it works? -1. You install the Elementary dbt package in your dbt project and configure it to write to it's own schema, the Elementary schema. -2. The package writes test results, run results, logs and metadata to the Elementary schema. -3. The cloud service only requires `read access` to the Elementary schema, not to schemas where your sensitive data is stored. -4. The cloud service connects to sync the Elementary schema using an **encrypted connection** and a **static IP address** that you will need to add to your allowlist. - - - Elementary cloud security - + ## What information does Elementary collect? 
@@ -54,9 +47,9 @@ The collected information is detailed in the table below. You can see all the data that Elementary collects and stores in your local Elementary schema. -In general, Elementary does not collect any raw data. The only exception is the failed rows sample (stored in table `test_results_samples`) which can be disabled. +In general, Elementary does not collect any raw data. The only exception is the failed rows sample (stored in table `test_result_rows`) which can be disabled. This is an opt-out feature that shows a sample of a few raw failed rows for failed tests, to help users triage and understand the problem. -To avoid this sampling, set the var `test_sample_row_count: 0` in your `dbt_project.yml` (default is 5 sample rows). +To avoid this sampling, set the var `test_sample_row_count: 0` in your `dbt_project.yml` (default is 5 sample rows). You can also disable samples for specific tests, protect PII-tagged tables, and request environment-level controls. See [Test Result Samples](/data-tests/test-result-samples) for all available options. | Information | Details | Usage | | ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | @@ -89,6 +82,21 @@ For more information, see our [Data Protection Agreement](https://www.elementary Contact us at `legal@elementary-data.com` if you need us to review and sign your company's DPA and MNDA. +## Samples storage in the US + +As mentioned above, Elementary generally avoids storing raw data and relies on high level metrics and metadata. +That being said, one feature (that is opt-out) that may contain such data is storage of sample rows for failed tests in your pipeline. 
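The opt-out mentioned above is controlled by a single dbt variable; a minimal sketch of the relevant `dbt_project.yml` fragment, using the var name and default documented earlier on this page:

```yaml dbt_project.yml
vars:
  # Disable failed-row sampling entirely (the default is 5 sample rows)
  test_sample_row_count: 0
```
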
+
+For customers that require it, Elementary has the option to store such information in the US, instead of the EU, which is the default.
+In this case, the data will specifically be stored in S3 buckets in the us-east-1 (Virginia) AWS region, instead of eu-central-1 (Frankfurt).
+
+The relevant datasets that will be stored in the US are:
+1. The S3 bucket that contains the samples served by the Elementary UI.
+2. The S3 bucket used for AI chat history - as these may potentially interact with sample data.
+3. Intermediate S3 buckets used for Elementary's internal data pipeline, which may temporarily store samples.
+
+When enabled, this mechanism ensures that sample data and related datasets are never persisted outside of the US, permanently or temporarily.
+
 ## Have more questions?
 
 We would be happy to answer!
diff --git a/docs/cloud/guides/alerts-configuration.mdx b/docs/cloud/guides/alerts-configuration.mdx
index b62328a00..77e0d4b69 100644
--- a/docs/cloud/guides/alerts-configuration.mdx
+++ b/docs/cloud/guides/alerts-configuration.mdx
@@ -2,4 +2,8 @@
 title: "Alerts code configuration"
 ---
 
-
+import AlertsCodeConfiguration from '/snippets/guides/alerts-code-configuration.mdx';
+
+
+
+
diff --git a/docs/cloud/guides/collect-job-data.mdx b/docs/cloud/guides/collect-job-data.mdx
index cfd3749bc..c1c156f25 100644
--- a/docs/cloud/guides/collect-job-data.mdx
+++ b/docs/cloud/guides/collect-job-data.mdx
@@ -3,4 +3,8 @@
 title: "Collect Jobs Info From Orchestrator"
 sidebarTitle: "Collect jobs data"
 ---
 
-
+import CollectJobData from '/snippets/guides/collect-job-data.mdx';
+
+
+
+
diff --git a/docs/cloud/guides/collect-source-freshness.mdx b/docs/cloud/guides/collect-source-freshness.mdx
index 352a3bce1..977ca211e 100644
--- a/docs/cloud/guides/collect-source-freshness.mdx
+++ b/docs/cloud/guides/collect-source-freshness.mdx
@@ -3,7 +3,11 @@
 title: "Collect dbt source freshness results"
 sidebarTitle: "dbt source freshness"
 ---
 
-
+import DbtSourceFreshness from
'/snippets/guides/dbt-source-freshness.mdx'; + + + + ## Need help? diff --git a/docs/cloud/guides/dev-prod-configuration.mdx b/docs/cloud/guides/dev-prod-configuration.mdx new file mode 100644 index 000000000..cfb9931e8 --- /dev/null +++ b/docs/cloud/guides/dev-prod-configuration.mdx @@ -0,0 +1,89 @@ +--- +title: "Elementary for Development and Production" +sidebarTitle: "Dev/Prod configuration" +--- + +When working with Elementary Cloud, you'll want to configure Elementary differently based on your environment. This guide explains best practices for setting up Elementary across local development, dev/staging environments, and production. + +## Overview + +Elementary's `on-run-end` hooks collect test results and dbt artifacts, which are essential for monitoring but can slow down local development. Here's the recommended approach: + +- **Local development**: Disable hooks to speed up your workflow +- **Dev/Staging environments**: Enable monitoring to catch issues early +- **Production**: Always enabled for full observability + +## Disable Hooks in Local Development + +When developing locally, the `on-run-end` hooks that upload run results and dbt artifacts can slow down your dbt runs. It's recommended to disable these hooks in your local environment. 
+ +You can disable specific hooks using variables in your `dbt_project.yml`: + +```yaml dbt_project.yml +vars: + disable_run_results: "{{ target.name not in ['prod','staging','dev'] }}" + disable_tests_results: "{{ target.name not in ['prod','staging','dev'] }}" + disable_dbt_artifacts_autoupload: "{{ target.name not in ['prod','staging','dev'] }}" + disable_dbt_invocation_autoupload: "{{ target.name not in ['prod','staging','dev'] }}" +``` + +This configuration ensures that: +- Hooks are **disabled** for local development (when `target.name` is not `prod`, `staging`, or `dev`) +- Hooks are **enabled** for your dev, staging, and production environments + +## Monitor Dev and Staging Environments + +While you should disable hooks in local development, you should **enable monitoring** for any dev or staging environments where you run tests. This allows you to: + +- Catch data quality issues before they reach production +- Validate test coverage and test results in non-production environments +- Get early visibility into data problems + +To ensure monitoring is enabled for your dev/staging environments: + +1. **Configure your dbt targets** - Make sure your dev and staging targets are named consistently (e.g., `dev`, `staging`) +2. **Update the disablement vars** - Include your dev/staging target names in the condition, as shown in the example above +3. **Set up Elementary Cloud environments** - Create separate environments in Elementary Cloud for each of your dbt targets (dev, staging, prod) + +## Production Configuration + +In production, Elementary should always be enabled to provide full observability. The configuration above ensures hooks are enabled for production by including `prod` in the list of monitored environments. 
+ +## Complete Example Configuration + +Here's a complete example that covers all scenarios: + +```yaml dbt_project.yml +vars: + # Disable hooks for local development, enable for dev/staging/prod + disable_run_results: "{{ target.name not in ['prod','staging','dev'] }}" + disable_tests_results: "{{ target.name not in ['prod','staging','dev'] }}" + disable_dbt_artifacts_autoupload: "{{ target.name not in ['prod','staging','dev'] }}" + disable_dbt_invocation_autoupload: "{{ target.name not in ['prod','staging','dev'] }}" + +models: + elementary: + +schema: "elementary" +``` + +## Alternative: Disable Entire Package + +If you prefer to disable the entire Elementary package in local development (note: this will also disable Elementary tests), you can use: + +```yaml dbt_project.yml +models: + elementary: + +schema: "elementary" + +enabled: "{{ target.name in ['prod','staging','dev'] }}" +``` + +Disabling the entire package will prevent Elementary tests from running in local development. We recommend using the hook disablement vars instead, which allows Elementary tests to run while skipping the artifact collection. + +## Next Steps + +After configuring your environments: + +1. **Set up Elementary Cloud environments** - Create environments in Elementary Cloud for each of your monitored targets (dev, staging, prod) +2. **Configure sync scheduling** - Set up webhook-triggered syncs for real-time updates. See the [Sync Scheduling guide](/cloud/guides/sync-scheduling) for details +3. 
**Configure alerts** - Set up alerts for your dev and production environments to get notified of data quality issues + diff --git a/docs/cloud/guides/enable-slack-alerts.mdx b/docs/cloud/guides/enable-slack-alerts.mdx index d2e97bcf1..770ebd465 100644 --- a/docs/cloud/guides/enable-slack-alerts.mdx +++ b/docs/cloud/guides/enable-slack-alerts.mdx @@ -2,10 +2,15 @@ title: "Enable alerts" --- +import AlertsIntroduction from '/snippets/alerts/alerts-introduction.mdx'; +import AlertsDestinationCards from '/snippets/cloud/integrations/cards-groups/alerts-destination-cards.mdx'; + + + ## About alerts - + ## Alerts destinations - \ No newline at end of file + \ No newline at end of file diff --git a/docs/cloud/guides/reduce-on-run-end-time.mdx b/docs/cloud/guides/reduce-on-run-end-time.mdx new file mode 100644 index 000000000..84a5be711 --- /dev/null +++ b/docs/cloud/guides/reduce-on-run-end-time.mdx @@ -0,0 +1,21 @@ +--- +title: "Control On-Run-End Hooks" +sidebarTitle: "Control on-run-end time" +--- + +import ReduceOnRunEndTime from '/snippets/guides/reduce-on-run-end-time.mdx'; + + + +For more details on configuring Elementary for different environments, see the [Dev/Prod Configuration guide](/cloud/guides/dev-prod-configuration). + +## Next Steps + +After configuring your on-run-end hooks timing: + +1. **Set up metadata sync schedule**: If you disabled autoupload, create a process (manual or recurring job, e.g., daily) to sync metadata when your project changes +2. **Configure sync scheduling**: Set up webhook-triggered syncs for real-time updates. See the [Sync Scheduling guide](/cloud/guides/sync-scheduling) for details +3. **Monitor data completeness**: Verify that Elementary Cloud still receives the data needed for your monitoring use cases + +For more information about configuring Elementary for different environments, see the [Dev/Prod Configuration guide](/cloud/guides/dev-prod-configuration). 
+
diff --git a/docs/cloud/guides/set-up-elementary-checklist.md b/docs/cloud/guides/set-up-elementary-checklist.md
new file mode 100644
index 000000000..8b93cc7d4
--- /dev/null
+++ b/docs/cloud/guides/set-up-elementary-checklist.md
@@ -0,0 +1,73 @@
+# Set up Elementary
+
+Building reliable, trustworthy data pipelines shouldn't be painful. Elementary gives you a clear, fast path to understanding your data, monitoring its health, and responding before problems impact your business.
+
+This guide walks you through everything you need to onboard successfully, from connecting your stack to configuration and setting up your first workflows.
+
+With Elementary, you'll quickly be able to:
+
+* Catch issues proactively before they affect consumers
+* Maintain reliable and compliant data across your ecosystem
+* Establish clear ownership and accountability
+* Help everyone discover and understand the data they use
+
+Let's get you set up.
+
+## Elementary's Core Principles
+
+* **Code as the source of truth** - Elementary keeps your code as the system of record. Any change made in the app opens a pull request in your repository, fitting naturally into your CI/CD process and keeping engineers in control. You can work from code, from the UI, or through MCP and everything stays aligned.
+* **Scaling reliability with AI** - Elementary uses AI to help you scale your reliability practices. Ella, our network of AI agents, assists with test creation, metadata enrichment, troubleshooting, and optimization so teams can stay proactive and focus on higher-value work.
+* **Enabling all users** - Elementary is built for the entire organization. Business users get a clear catalog, ownership information, health scores, and AI assistance so everyone can find, understand, and trust the data without depending on engineering.
+
+## Technical Setup
+
+### Setup and integrations
+
+[ ] [Create Elementary cloud account](/cloud/quickstart) (quick-start guide)
+[ ] [Install Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)
+  Collect dbt artifacts and enable Elementary's built-in tests (anomaly detection, schema change detection). If you already have the package deployed in the relevant environment you want to monitor, make sure it's up to date with the latest version, and upgrade it if not.
+[ ] [Connect your data warehouse](/cloud/onboarding/connect-data-warehouse)
+  Elementary reads metadata (dbt artifacts, information schema, query history) to power test results, run history, automated freshness/volume monitors, and column-level lineage.
+[ ] [Invite team members](/cloud/manage-team)
+[ ] [Code repository connection (optional)](/cloud/integrations/code-repo/connect-code-repo)
+  Allow users and AI agents to open PRs and manage changes through your CI/CD process.
+[ ] [Integrate with your BI tool (optional)](/cloud/integrations/bi/connect-bi-tool)
+  Enable column-level lineage and full context for BI assets, including the health of upstream sources and models.
+[ ] Connect to your external Catalog (Optional)
+  Present Elementary data health context in another catalog; we support [Atlan](/cloud/integrations/governance/atlan).
+[ ] Non-dbt tables (optional)
+  Elementary has a Python SDK for python-based transformations. If you need Elementary to monitor non-dbt tables, reach out to the Elementary team with a list of relevant schemas / datasets / databases. This is a beta feature; the team will guide you through the next steps.
+
+### Alerts & Incidents
+
+[ ] Connect messaging app ([Slack](/cloud/integrations/alerts/slack) / [MS Teams](/cloud/integrations/alerts/ms-teams))
+  Receive alerts where your team already works.
+[ ] Connect [incident management tool](/cloud/integrations/alerts) (optional) + We support [Opsgenie](/cloud/integrations/alerts/opsgenie) and [PagerDuty](/cloud/integrations/alerts/pagerduty). +[ ] Connect [ticketing system](/cloud/integrations/alerts) (optional) + We support [Jira](/cloud/integrations/alerts/jira), [ServiceNow](/cloud/integrations/alerts/servicenow) and [Linear](/cloud/integrations/alerts/linear). + +### AI Agents + +[ ] Sign [AI features consent form](https://2cwlc1.share-eu1.hsforms.com/2toUDbhpMRTK0WPmv2az_2g) + Elementary AI uses only metadata, not raw or personal data. LLMs run via Amazon Bedrock with no data sharing to third parties. Learn more [here](/cloud/general/ai-privacy-policy). +[ ] Elementary to open AI Features + +## Advanced Setup + +[ ] [Schedule Elementary syncs](/cloud/guides/sync-scheduling) with your data warehouse. +[ ] Contact the Elementary team to enable [Okta SSO](/cloud/integrations/security-and-connectivity/okta), [AWS PrivateLink](/cloud/integrations/security-and-connectivity/aws-privatelink-integration), or [Microsoft Entra ID](/cloud/integrations/security-and-connectivity/ms-entra). +[ ] [Configuring Elementary for Development and Production](/cloud/guides/dev-prod-configuration) +[ ] [Manage multiple environments](/cloud/features/multi-env) + Use multiple dbt projects and targets (dev, staging, prod) through the multi-env feature. + If you wish to monitor multiple environments in a single environment view (including unified lineage), reach out to the Elementary team. +[ ] [Collect job information from orchestrator](/cloud/guides/collect-job-data) + Elementary can collect metadata about your jobs from the orchestrator you are using, and enrich the Elementary UI with this information. + +## Security and permissions + +[ ] Configure [roles and permissions](/cloud/manage-team) in Elementary (optional). 
+[ ] Add SSO authentication with [Okta](/cloud/integrations/security-and-connectivity/okta) or [Azure AD](/cloud/integrations/security-and-connectivity/ms-entra) (optional). +[ ] Connect using [AWS private link](/cloud/integrations/security-and-connectivity/aws-privatelink-integration) (optional) +[ ] Export [user activity logs](/cloud/features/collaboration-and-communication/audit_logs/user-activity-logs) (optional). + diff --git a/docs/cloud/guides/set-up-elementary.mdx b/docs/cloud/guides/set-up-elementary.mdx new file mode 100644 index 000000000..cb0733f67 --- /dev/null +++ b/docs/cloud/guides/set-up-elementary.mdx @@ -0,0 +1,80 @@ +--- +title: "Set up Elementary" +sidebarTitle: "Technical Setup" +--- + +Building reliable, trustworthy data pipelines shouldn't be painful. Elementary gives you a clear, fast path to understanding your data, monitoring its health, and responding before problems impact your business. + +This guide walks you through everything you need to onboard successfully, from connecting your stack to configuration and setting up your first workflows. Once you've completed the technical setup, continue with [Start Using Elementary](/cloud/guides/start-using-elementary) to begin using Elementary in practice. + +With Elementary, you'll quickly be able to: + +* Catch issues proactively before they affect consumers +* Maintain reliable and compliant data across your ecosystem +* Establish clear ownership and accountability +* Help everyone discover and understand the data they use + +Let's get you set up. + +## Elementary's Core Principles + +* **Code as the source of truth** - Elementary keeps your code as the system of record. Any change made in the app opens a pull request in your repository, fitting naturally into your CI/CD process and keeping engineers in control. You can work from code, from the UI, or through MCP and everything stays aligned. 
+* **Scaling reliability with AI** - Elementary uses AI to help you scale your reliability practices. Ella, our network of AI agents, assists with test creation, metadata enrichment, troubleshooting, and optimization so teams can stay proactive and focus on higher-value work.
+* **Enabling all users** - Elementary is built for the entire organization. Business users get a clear catalog, ownership information, health scores, and AI assistance so everyone can find, understand, and trust the data without depending on engineering.
+
+## Technical Setup
+
+
+Here's a link to a checkbox version of this guide that you can download and use to track your progress: [Download as markdown](/cloud/guides/set-up-elementary-checklist.md).
+
+
+### Setup and integrations
+
+- [Create Elementary cloud account](/cloud/quickstart) (quick-start guide)
+- [Install Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)
+  Collect dbt artifacts and enable Elementary's built-in tests (anomaly detection, schema change detection). If you already have the package deployed in the relevant environment you want to monitor, make sure it's up to date with the latest version, and upgrade it if not.
+- [Connect your data warehouse](/cloud/onboarding/connect-data-warehouse)
+  Elementary reads metadata (dbt artifacts, information schema, query history) to power test results, run history, automated freshness/volume monitors, and column-level lineage.
+- [Invite team members](/cloud/manage-team)
+- [Code repository connection (optional)](/cloud/integrations/code-repo/connect-code-repo)
+  Allow users and AI agents to open PRs and manage changes through your CI/CD process.
+- [Integrate with your BI tool (optional)](/cloud/integrations/bi/connect-bi-tool)
+  Enable column-level lineage and full context for BI assets, including the health of upstream sources and models.
+- Connect to your external Catalog (Optional)
+  Present Elementary data health context in another catalog; we support [Atlan](/cloud/integrations/governance/atlan).
+- Non-dbt tables (optional)
+  Elementary has a Python SDK for python-based transformations. If you need Elementary to monitor non-dbt tables, reach out to the Elementary team with a list of relevant schemas / datasets / databases. This is a beta feature; the team will guide you through the next steps.
+
+### Alerts & Incidents
+
+- Connect messaging app ([Slack](/cloud/integrations/alerts/slack) / [MS Teams](/cloud/integrations/alerts/ms-teams))
+  Receive alerts where your team already works.
+- Connect [incident management tool](/cloud/integrations/alerts) (optional)
+  We support [Opsgenie](/cloud/integrations/alerts/opsgenie) and [PagerDuty](/cloud/integrations/alerts/pagerduty).
+- Connect [ticketing system](/cloud/integrations/alerts) (optional)
+  We support [Jira](/cloud/integrations/alerts/jira), [ServiceNow](/cloud/integrations/alerts/servicenow) and [Linear](/cloud/integrations/alerts/linear).
+
+### AI Agents
+
+- Sign [AI features consent form](https://2cwlc1.share-eu1.hsforms.com/2toUDbhpMRTK0WPmv2az_2g)
+  Elementary AI uses only metadata, not raw or personal data. LLMs run via Amazon Bedrock with no data sharing to third parties. Learn more [here](/cloud/general/ai-privacy-policy).
+- Elementary to open AI Features
+
+## Advanced Setup
+
+- [Schedule Elementary syncs](/cloud/guides/sync-scheduling) with your data warehouse.
+- Contact the Elementary team to enable [Okta SSO](/cloud/integrations/security-and-connectivity/okta), [AWS PrivateLink](/cloud/integrations/security-and-connectivity/aws-privatelink-integration), or [Microsoft Entra ID](/cloud/integrations/security-and-connectivity/ms-entra).
+- [Configuring Elementary for Development and Production](/cloud/guides/dev-prod-configuration) +- [Manage multiple environments](/cloud/features/multi-env) + Use multiple dbt projects and targets (dev, staging, prod) through the multi-env feature. + If you wish to monitor multiple environments in a single environment view (including unified lineage), reach out to the Elementary team. +- [Collect job information from orchestrator](/cloud/guides/collect-job-data) + Elementary can collect metadata about your jobs from the orchestrator you are using, and enrich the Elementary UI with this information. + +## Security and permissions + +- Configure [roles and permissions](/cloud/manage-team) in Elementary (optional). +- Add SSO authentication with [Okta](/cloud/integrations/security-and-connectivity/okta) or [Azure AD](/cloud/integrations/security-and-connectivity/ms-entra) (optional). +- Connect using [AWS private link](/cloud/integrations/security-and-connectivity/aws-privatelink-integration) (optional) +- Export [user activity logs](/cloud/features/collaboration-and-communication/audit_logs/user-activity-logs) (optional). + diff --git a/docs/cloud/guides/start-using-elementary-checklist.md b/docs/cloud/guides/start-using-elementary-checklist.md new file mode 100644 index 000000000..30ca4370c --- /dev/null +++ b/docs/cloud/guides/start-using-elementary-checklist.md @@ -0,0 +1,80 @@ +# Start Using Elementary + +With your environment connected (see [Set up Elementary](/cloud/guides/set-up-elementary) for technical onboarding), the next step is to see Elementary in action. + +We'll take a few of your key assets and run them through the core reliability steps: setting clear ownership, adding the right tests, configuring alerts, and understanding how issues are detected and resolved. This gives you a practical, hands-on view of how Elementary keeps your most important data reliable. 
+
+## Identify Critical Assets
+
+*Goal: Identify business-critical assets in your pipeline and make sure their documentation and ownership are clear.*
+
+[ ] In the [Catalog](/cloud/features/collaboration-and-communication/catalog), identify and mark three [critical assets](/cloud/features/data-governance/critical_assets)
+[ ] Add or update [descriptions, tags, and owners](/cloud/features/data-governance/manage-metadata) for these assets
+[ ] Leverage the [Governance AI agent](/cloud/ai-agents/governance-agent) to complete missing metadata based on project context and your instructions
+[ ] View [end-to-end, column-level lineage](/cloud/features/data-lineage/lineage) for each critical asset
+
+## Set up monitoring on critical assets
+
+*Goal: Ensure critical assets are covered with the right tests so issues are detected before they affect consumers.*
+
+[ ] Open the [Test Coverage](/cloud/features/data-tests/test-coverage-screen) view to understand existing coverage by dimension
+[ ] [Add missing tests](/cloud/features/data-tests/data-tests-overview#test-configuration) such as freshness, volume, uniqueness, [anomaly detection](/data-tests/how-anomaly-detection-works), [schema changes](/data-tests/schema-tests/schema-changes), or custom logic
+[ ] Leverage the [Test Recommendation AI agent](/cloud/ai-agents/test-recommendation-agent) for suggested tests based on patterns and lineage
+[ ] For source monitoring: Elementary automatically adds ML-based [freshness](/cloud/features/anomaly-detection/automated-freshness) and [volume](/cloud/features/anomaly-detection/automated-volume) tests to all sources to catch pipeline and ingestion issues early
+
+## Set Up Alerts
+
+*Goal: Create a routing process that gets the right alerts to the right people while avoiding alert fatigue.*
+
+[ ] Use [tags, owners, and subscribers](/cloud/features/alerts-and-incidents/owners-and-subscribers) to define who should be notified and who is responsible for action
+[ ] Build an
internal alert response plan: who fixes the issue, who needs awareness, and what the SLA should be (see the full playbook [here](https://www.elementary-data.com/post/breaking-alert-fatigue-the-enterprise-playbook-for-data-alerts)) +[ ] Create [alert rules](/cloud/features/alerts-and-incidents/alert-rules) that translate this plan into routing, using tags, owners, subscribers, and severity + +## Triage & Resolution + +*Goal: Investigate and resolve issues quickly and confidently.* + +[ ] Triage: Review lineage and upstream failures alongside recent commits and dbt run history to understand what changed and what the issue is impacting +[ ] Use the [Triage & Resolution AI agent](/cloud/ai-agents/triage-resolution-agent) to run the investigation for you by analyzing lineage, failures, recent changes, and dependencies, and surfacing root causes +[ ] Use the [Incidents](/cloud/features/alerts-and-incidents/incident-management) screen to manage issues collaboratively and keep track of everything that's currently open + +## Optimize Performance + +*Goal: Meet your SLAs, improve run times, and ensure your pipelines operate efficiently.* + +[ ] Identify long-running tests and models using the [Performance pages](/cloud/features/performance-monitoring/performance-monitoring) +[ ] Use the [Optimization AI agent](/cloud/ai-agents/performance-cost-agent) to optimize queries by surfacing inefficiencies, suggesting improved SQL patterns, and identifying opportunities to reduce data scans or simplify logic + +## Measure Progress and Data Health + +*Goal: Track improvements in reliability and identify where further attention is needed.* + +[ ] Use the [Data Health](/cloud/features/collaboration-and-communication/data-health) screen to monitor overall health scores for your assets +[ ] Filter the screen by domain, owners, or critical assets to measure progress at a higher resolution and keep accountability clear + +## Enable analysts and business users + +*Goal: Make it easy for 
anyone to discover assets, understand what they represent, see how they're built, and know whether they can be trusted.* + +[ ] Use the [Catalog](/cloud/features/collaboration-and-communication/catalog) to explore assets, their definitions, ownership, and current health status +[ ] Review [Lineage](/cloud/features/data-lineage/lineage) to see how the asset is built, what depends on it, and whether any upstream issues or test failures affect it +[ ] Use the [Discovery AI agent](/cloud/ai-agents/catalog-agent) to get clear explanations of the asset, how it's calculated, and any current reliability concerns + +## Advanced: Use MCP to Extend Elementary Everywhere + +### For Engineers + +*Goal: Make development safer and faster by bringing full pipeline context into your coding environment.* + +[ ] [Enable Elementary MCP](/cloud/mcp/setup-guide) inside your IDE or coding assistant (Cursor, Claude Code, etc.) +[ ] Add metadata or validations at scale without touching the UI or code manually +[ ] Check lineage, coverage, and asset health while you code so you can spot issues early, understand downstream impact, and prevent problems from reaching production + +### For Analysts and Business Users + +*Goal: Bring Elementary's context into the tools people already use.* + +[ ] [Connect Elementary MCP](/cloud/mcp/setup-guide) to any MCP-enabled client (Claude, ChatGPT, or internal AI agents) +[ ] Allow users to ask about assets, see definitions and ownership, review lineage, and check health or incidents directly through their AI assistant +[ ] (Optional) Connect additional MCPs such as dbt or Atlan so users can navigate across multiple systems from a single conversational interface + diff --git a/docs/cloud/guides/start-using-elementary.mdx b/docs/cloud/guides/start-using-elementary.mdx new file mode 100644 index 000000000..282e235a1 --- /dev/null +++ b/docs/cloud/guides/start-using-elementary.mdx @@ -0,0 +1,87 @@ +--- +title: "Start Using Elementary" +sidebarTitle: "Start 
using Elementary" +--- + + +Here's a link to a checkbox version of this guide that you can download and use to track your progress: [Download as markdown](/cloud/guides/start-using-elementary-checklist.md). + + +With your environment connected (see [Set up Elementary](/cloud/guides/set-up-elementary) for technical onboarding), the next step is to see Elementary in action. + +We'll take a few of your key assets and run them through the core reliability steps: setting clear ownership, adding the right tests, configuring alerts, and understanding how issues are detected and resolved. This gives you a practical, hands-on view of how Elementary keeps your most important data reliable. + +## Identify Critical Assets + +*Goal: Identify business-critical assets in your pipeline and make sure their documentation and ownership are clear.* + +- In the [Catalog](/cloud/features/collaboration-and-communication/catalog), identify and mark three [critical assets](/cloud/features/data-governance/critical_assets) +- Add or update [descriptions, tags, and owners](/cloud/features/data-governance/manage-metadata) for these assets +- Leverage the [Governance AI agent](/cloud/ai-agents/governance-agent) - Complete missing metadata based on project context and according to your instructions. 
+- View [end-to-end, column-level lineage](/cloud/features/data-lineage/lineage) for each critical asset
+
+## Set up monitoring on critical assets
+
+*Goal: Ensure critical assets are covered with the right tests so issues are detected before they affect consumers.*
+
+- Open the [Test Coverage](/cloud/features/data-tests/test-coverage-screen) view to understand existing coverage by dimension
+- [Add missing tests](/cloud/features/data-tests/data-tests-overview#test-configuration) such as freshness, volume, uniqueness, [anomaly detection](/data-tests/how-anomaly-detection-works), [schema changes](/data-tests/schema-tests/schema-changes), or custom logic
+- Leverage the [Test Recommendation AI agent](/cloud/ai-agents/test-recommendation-agent) for suggested tests based on patterns and lineage
+- For source monitoring: Elementary automatically adds ML-based [freshness](/cloud/features/anomaly-detection/automated-freshness) and [volume](/cloud/features/anomaly-detection/automated-volume) tests to all sources to catch pipeline and ingestion issues early
+
+## Set Up Alerts
+
+*Goal: Create a routing process that gets the right alerts to the right people while avoiding alert fatigue.*
+
+- Use [tags, owners, and subscribers](/cloud/features/alerts-and-incidents/owners-and-subscribers) to define who should be notified and who is responsible for action
+- Build an internal alert response plan: who fixes the issue, who needs awareness, and what the SLA should be (see the full playbook [here](https://www.elementary-data.com/post/breaking-alert-fatigue-the-enterprise-playbook-for-data-alerts))
+- Create [alert rules](/cloud/features/alerts-and-incidents/alert-rules) that translate this plan into routing, using tags, owners, subscribers, and severity
+
+## Triage & Resolution
+
+*Goal: Investigate and resolve issues quickly and confidently.*
+
+- Triage: Review lineage and upstream failures alongside recent commits and dbt run history to understand what changed and what the
issue is impacting +- Use the [Triage & Resolution AI agent](/cloud/ai-agents/triage-resolution-agent) to run the investigation for you by analyzing lineage, failures, recent changes, and dependencies, and surfacing root causes +- Use the [Incidents](/cloud/features/alerts-and-incidents/incident-management) screen to manage issues collaboratively and keep track of everything that's currently open + +## Optimize Performance + +*Goal: Meet your SLAs, improve run times, and ensure your pipelines operate efficiently.* + +- Identify long-running tests and models using the [Performance pages](/cloud/features/performance-monitoring/performance-monitoring) +- Use the [Optimization AI agent](/cloud/ai-agents/performance-cost-agent) to optimize queries by surfacing inefficiencies, suggesting improved SQL patterns, and identifying opportunities to reduce data scans or simplify logic + +## Measure Progress and Data Health + +*Goal: Track improvements in reliability and identify where further attention is needed.* + +- Use the [Data Health](/cloud/features/collaboration-and-communication/data-health) screen to monitor overall health scores for your assets +- Filter the screen by domain, owners, or critical assets to measure progress at a higher resolution and keep accountability clear + +## Enable analysts and business users + +*Goal: Make it easy for anyone to discover assets, understand what they represent, see how they're built, and know whether they can be trusted.* + +- Use the [Catalog](/cloud/features/collaboration-and-communication/catalog) to explore assets, their definitions, ownership, and current health status +- Review [Lineage](/cloud/features/data-lineage/lineage) to see how the asset is built, what depends on it, and whether any upstream issues or test failures affect it +- Use the [Discovery AI agent](/cloud/ai-agents/catalog-agent) to get clear explanations of the asset, how it's calculated, and any current reliability concerns + +## Advanced: Use MCP to 
Extend Elementary Everywhere + +### For Engineers + +*Goal: Make development safer and faster by bringing full pipeline context into your coding environment.* + +- [Enable Elementary MCP](/cloud/mcp/setup-guide) inside your IDE or coding assistant (Cursor, Claude Code, etc.) +- Add metadata or validations at scale without touching the UI or code manually +- Check lineage, coverage, and asset health while you code so you can spot issues early, understand downstream impact, and prevent problems from reaching production + +### For Analysts and Business Users + +*Goal: Bring Elementary's context into the tools people already use.* + +- [Connect Elementary MCP](/cloud/mcp/setup-guide) to any MCP-enabled client (Claude, ChatGPT, or internal AI agents) +- Allow users to ask about assets, see definitions and ownership, review lineage, and check health or incidents directly through their AI assistant +- (Optional) Connect additional MCPs such as dbt or Atlan so users can navigate across multiple systems from a single conversational interface + diff --git a/docs/cloud/guides/sync-scheduling.mdx b/docs/cloud/guides/sync-scheduling.mdx index 6e55c36da..cdeeed7b3 100644 --- a/docs/cloud/guides/sync-scheduling.mdx +++ b/docs/cloud/guides/sync-scheduling.mdx @@ -9,7 +9,7 @@ The data on your Elementary Cloud environments is updated by syncing the local E There are 2 available scheduling options: 1. **Hourly syncs** - By default, Elementary Cloud syncs data from your Elementary schema on an hourly basis. This means that within the one-hour time window, your data may not be up-to-date. -2. **Triggered syncs** - Configure a webhook that will trigger a sync after your data pipeline has finished running. This configuration is recommended, as it's as close as possible to real time. +2. **Triggered syncs** - Configure a webhook that will trigger a sync after your data pipeline has finished running. This configuration is recommended, as it's as close as possible to real time. 
As a backup, Elementary still keeps a sync that runs every 4 hours. ## Triggered syncs using webhook diff --git a/docs/cloud/guides/troubleshoot.mdx b/docs/cloud/guides/troubleshoot.mdx index 3a808c2be..6e9c230ba 100644 --- a/docs/cloud/guides/troubleshoot.mdx +++ b/docs/cloud/guides/troubleshoot.mdx @@ -2,26 +2,26 @@ title: "Troubleshooting" --- -### I connected my data warehouse but I don't see any test results + If you already connected your data warehouse to Elementary but are not seeing anything in the Elementary UI, there could be several reasons. Try following these steps to troubleshoot: - **1. Validate elementary dbt package is deployed, working, and using the latest version** -- Refer to the [dbt package installation guide](/quickstart#install-the-dbt-package), and validate that your version in packages.yml is the one mentioned there. If not, upgrade and run `dbt deps`. Make sure to execute `dbt run --select elementary` for the package tables to be created. +- Refer to the [dbt package installation guide](/cloud/quickstart#install-the-dbt-package), and validate that your version in packages.yml is the one mentioned there. If not, upgrade and run `dbt deps`. Make sure to execute `dbt run --select elementary` for the package tables to be created. **2. Check if the table `elementary_test_results` exists and has data** -- If the table does not exist - refer to the [dbt package installation guide](/quickstart#install-the-dbt-package). Make sure to execute `dbt run --select elementary` for the package tables to be created. +- If the table does not exist - refer to the [dbt package installation guide](/cloud/quickstart#install-the-dbt-package). Make sure to execute `dbt run --select elementary` for the package tables to be created. - If the table exists but has no data - Did you execute `dbt test` since deploying the package and creating the models? - If you have, make sure the table was created as an incremental table (not a regular table or view). 
- If not, there is a materialization configuration in your `dbt_project.yml` file that overrides the package config. Remove it, and run `dbt run --select elementary --full-refresh` to recreate the tables. After that run `dbt test` again and check if there is data. **4. Still no data in the table? Reach out to the Elementary team by starting an intercom chat from the Elementary UI.** + -### Column information cannot be retrieved + This error can happen because of a few reasons: @@ -40,13 +40,24 @@ For more information on the permissions required by each data warehouse: [Postgres](/cloud/integrations/dwh/postgres#permissions-and-security) + -### How do I set up the table name of my Singular test? + Singular tests are sql queries that can reference more than one table, but are often intended to test a logic that is related to one table in particular. In order to have that table name appear in the UI in the test results, test execution and more screens, you can set it up by adding the following to the config block of your singular test file: ``` {{ config( - override_primary_test_model_id="model_name" + override_primary_test_model_id="model_unique_id" ) }} -``` \ No newline at end of file +``` + +Note: Use the `model_unique_id`, not the model name. +The `model_unique_id` is the unique identifier of the model in dbt, and can be found by running the query: +``` +SELECT unique_id +FROM .dbt_models +WHERE name= +``` + + diff --git a/docs/cloud/integrations/alerts/email.mdx b/docs/cloud/integrations/alerts/email.mdx new file mode 100644 index 000000000..bd6d56bf6 --- /dev/null +++ b/docs/cloud/integrations/alerts/email.mdx @@ -0,0 +1,51 @@ +--- +title: "Email" +--- + +Elementary supports sending alerts directly to email recipients, without requiring environment-level configuration. Email destinations can be added and managed directly within alert rules. + +## Overview + +- Email **does not** need to be configured as an integration on the Environment page. 
+- Email can be added **directly as a provider** when creating or editing alert rules. +- Users can enter individual emails or paste entire lists. + + +
+ Email alert destination +
+ + +## Creating an Email Alert Destination + +1. Go to **Alert Destinations**. +2. Select **Add new alert provider**. +3. Choose **Email**. + +### Adding recipients + +- Users may type or paste one or more email addresses. +- The system validates email format for each entry. +- When pasting combined fields (e.g., "Name email@company.com"), only the email address is stored. +- Previously used emails may be surfaced for autocomplete (optional enhancement). + +## Visibility Across the UI + +### Alert Rules + +- Email recipients appear under the destination. +- If the list is long, the UI shows the first few emails and then a "+X more" indicator. +- Hovering displays the full recipient list. + +### Test Overview + +- Connected email destinations appear in the Destinations section for each test. + +### Alert Destinations Tab + +- All defined email alert destinations can be viewed and managed in this tab. + diff --git a/docs/cloud/integrations/alerts/jira.mdx b/docs/cloud/integrations/alerts/jira.mdx index 9c824ddb3..e330f5645 100644 --- a/docs/cloud/integrations/alerts/jira.mdx +++ b/docs/cloud/integrations/alerts/jira.mdx @@ -2,14 +2,34 @@ title: "Jira" --- - - - -} -> - Click for details - \ No newline at end of file +Elementary's Jira integration enables creating Jira issues from incidents. + + + +## How to connect Jira +1. Go to the `Environments` page on the sidebar. +2. Select an environment and click connect on the `Connect ticketing system` card, and select `Jira`. +3. Authorize the Elementary app for your workspace. **This step may require a workspace admin approval.** +4. Select a default project for tickets +5. Click `Save` to finish the setup + +Elementary Jira tickets include the basic ticket fields, test information, asset information and metadata. When connecting Jira, please make sure there's no required field that Elementary doesn't provide. 
+ + + + + + +## Creating Jira issues from incidents +When an incident is created, you can generate a Jira issue directly from the incident page by clicking **Create Jira Ticket**. This opens a pre-filled form where you can review, update, or add additional fields before submitting. Once created, the issue is automatically added to the Jira team you selected when connecting Jira. + +After the ticket is created, a link to the Jira issue appears on the incident page. The Jira ticket itself also includes a link back to the incident in Elementary for easy cross-referencing. + + If you connected the app before December 2025, you’ll need to re-authenticate the Jira app to change the **Reporter** or **Assignee** fields. You can do this by clicking the re-authentication link in the ticket form, or by going to your environment page and re-authenticating the app for the relevant environment. + + Elementary will not update the ticket in Jira when the incident is resolved or changed + + + + + diff --git a/docs/cloud/integrations/alerts/linear.mdx b/docs/cloud/integrations/alerts/linear.mdx index 5837a8c8a..35902c444 100644 --- a/docs/cloud/integrations/alerts/linear.mdx +++ b/docs/cloud/integrations/alerts/linear.mdx @@ -2,14 +2,30 @@ title: "Linear" --- - - - -} -> - Click for details - \ No newline at end of file +Elementary's Linear integration enables creating Linear tickets from incidents. + + + +## How to connect Linear +1. Go to the `Environments` page on the sidebar. +2. Select an environment and click connect on the `Connect ticketing system` card, and select `Linear`. +3. Authorize the Elementary app for your workspace. **This step may require a workspace admin approval.** +4. Select a default team for new tickets +5. Click `Save` to finish the setup + + + + + + +## Creating Linear ticket from incidents +When an incident is created, you can create a Linear ticket from the incident page by simply clicking on "Create Linear Ticket". 
+The ticket will automatically be created in Linear, in the team you chose upon connecting Linear. + +After the ticket is created you can see the Linear ticket link in the incident page. +The ticket will also contain a link to the incident in Elementary. + + +Note: Elementary will not update the ticket in Linear when the incident is resolved or changed in any way + + diff --git a/docs/cloud/integrations/alerts/ms-teams.mdx b/docs/cloud/integrations/alerts/ms-teams.mdx index d7ea82311..387567b89 100644 --- a/docs/cloud/integrations/alerts/ms-teams.mdx +++ b/docs/cloud/integrations/alerts/ms-teams.mdx @@ -5,12 +5,12 @@ title: "Microsoft Teams" Elementary's Microsoft Teams integration enables sending alerts when data issues happen. The alerts are sent using Adaptive Cards format, which provides rich formatting and interactive capabilities. -The alerts include rich context, and you can create [alert rules](/features/alerts-and-incidents/alert-rules) to distribute alerts to different channels and destinations. +The alerts include rich context, and you can create [alert rules](cloud/features/alerts-and-incidents/alert-rules) to distribute alerts to different channels and destinations.
MS teams alert screenshot
@@ -31,10 +31,7 @@ The alerts include rich context, and you can create [alert rules](/features/aler -3. For each MS Teams channel you connect to Elementary, you will need to create a Webhook. There are two ways to create a webhook: - - -1. Go to a channel in your Team and choose `Manage channel` +3. For each MS Teams channel you connect to Elementary, you will need to create a Webhook Using Microsoft Teams Connectors. Go to a channel in your Team and choose `Manage channel`.
@@ -46,7 +43,7 @@ The alerts include rich context, and you can create [alert rules](/features/aler
-2. Click on `Edit` connectors. +3. Click on `Edit` connectors.
@@ -58,7 +55,7 @@ The alerts include rich context, and you can create [alert rules](/features/aler
-3. Search for `Incoming webhook` and choose `Add`. +4. Search for `Incoming webhook` and choose `Add`.
@@ -70,7 +67,7 @@ The alerts include rich context, and you can create [alert rules](/features/aler
-4. Choose `Add` again and add a name to your webhook, then click on `Create`. +5. Choose `Add` again and add a name to your webhook, then click on `Create`.
@@ -82,7 +79,7 @@ The alerts include rich context, and you can create [alert rules](/features/aler
-5. Copy the URL of the webhook. +6. Copy the URL of the webhook.
@@ -94,36 +91,8 @@ The alerts include rich context, and you can create [alert rules](/features/aler
-**Note:** Microsoft 365 Connectors (previously called Office 365 Connectors) are nearing deprecation, and the creation of new Microsoft 365 Connectors will soon be blocked. Consider using Power Automate Workflows instead. - -
- - - -You can create a webhook using Power Automate in two ways: - -### Method 1: Directly from Teams (Recommended) - -1. Go to your Teams channel -2. Click the three dots (...) next to the channel name -3. Select `Workflows` -4. Choose the template "Post to channel when a webhook request is received" -5. Copy the webhook URL - -### Method 2: From Power Automate Website - -1. Go to [Power Automate](https://flow.microsoft.com) -2. Create a new instant cloud flow -3. Search for "When a HTTP request is received" as your trigger -4. In the flow, add a "Post adaptive card in a chat or channel" action -5. Configure the team and channel where you want to post -6. Save the flow and copy the HTTP POST URL - -**Important Note:** When using Power Automate Workflows, Elementary CLI cannot directly verify if messages were successfully delivered. You'll need to monitor your workflow runs in Power Automate to check for any errors. - - -4. Configure your Microsoft Teams webhooks, and give each one a name indicating it's connected channel: +7. Configure your Microsoft Teams webhooks, and give each one a name indicating it's connected channel:
@@ -135,11 +104,11 @@ You can create a webhook using Power Automate in two ways:
-5. Select a default channel for alerts, and set the suppression interval. +8. Select a default channel for alerts, and set the suppression interval. The default channel you select will automatically add a default [alert - rule](/features/alerts-and-incidents/alert-rules) to sends all failures to + rule](cloud/features/alerts-and-incidents/alert-rules) to sends all failures to this channel. Alerts on warnings are not sent by default. To modify and add tules, navigate to `Alert Rules` page. diff --git a/docs/cloud/integrations/alerts/opsgenie.mdx b/docs/cloud/integrations/alerts/opsgenie.mdx index 74df5e663..0240503bd 100644 --- a/docs/cloud/integrations/alerts/opsgenie.mdx +++ b/docs/cloud/integrations/alerts/opsgenie.mdx @@ -4,7 +4,7 @@ title: "Opsgenie" Elementary's Opsgenie integration enables sending alerts when data issues happen. -It is recommended to create [alert rules](/features/alerts-and-incidents/alert-rules) to filter and select the alerts that will create incidents in Opsgenei. +It is recommended to create [alert rules](cloud/features/alerts-and-incidents/alert-rules) to filter and select the alerts that will create incidents in Opsgenei. @@ -66,4 +66,4 @@ To create an `Opsgenie API key`, go to `Opsgenie` and follow the following steps -4. `Opsgenie` will now be available as a destination on the [`alert rules`](/features/alerts-and-incidents/alert-rules) page. You can add rules to create Opsgenie incidents out of alerts who match your rule. \ No newline at end of file +4. `Opsgenie` will now be available as a destination on the [`alert rules`](cloud/features/alerts-and-incidents/alert-rules) page. You can add rules to create Opsgenie incidents out of alerts who match your rule. 
\ No newline at end of file diff --git a/docs/cloud/integrations/alerts/pagerduty.mdx b/docs/cloud/integrations/alerts/pagerduty.mdx index 5b629aa6e..0ca58de02 100644 --- a/docs/cloud/integrations/alerts/pagerduty.mdx +++ b/docs/cloud/integrations/alerts/pagerduty.mdx @@ -4,7 +4,7 @@ title: "PagerDuty" Elementary's PagerDuty integration enables sending alerts when data issues happen. -It is recommended to create [alert rules](/features/alerts-and-incidents/alert-rules) to filter and select the alerts that will create incidents in PagerDuty. +It is recommended to create [alert rules](cloud/features/alerts-and-incidents/alert-rules) to filter and select the alerts that will create incidents in PagerDuty.
@@ -17,6 +17,14 @@ It is recommended to create [alert rules](/features/alerts-and-incidents/alert-r ## Enabling PagerDuty alerts + + + + +Important: The user connecting PagerDuty must have at least a "User" role (not "Limited User") in PagerDuty, as this role is required add "Events API v2" integrations to services. + + + 1. Go to the `Environments` page on the sidebar. 2. Select an environment and click connect on the `Connect incident management tool` card (second card), and select `PagerDuty`. @@ -42,4 +50,5 @@ It is recommended to create [alert rules](/features/alerts-and-incidents/alert-r
-4. `PagerDuty` will now be available as a destination on the [`alert rules`](/features/alerts-and-incidents/alert-rules) page. You can add rules to create PagerDuty incidents out of alerts who match your rule. \ No newline at end of file +4. `PagerDuty` will now be available as a destination on the [`alert rules`](cloud/features/alerts-and-incidents/alert-rules) page. You can add rules to create PagerDuty incidents out of alerts who match your rule. + diff --git a/docs/cloud/integrations/alerts/servicenow.mdx b/docs/cloud/integrations/alerts/servicenow.mdx new file mode 100644 index 000000000..779e012bb --- /dev/null +++ b/docs/cloud/integrations/alerts/servicenow.mdx @@ -0,0 +1,168 @@ +--- +title: "ServiceNow" +--- + +Elementary's ServiceNow integration enables creating ServiceNow incidents from incidents. + + +
+ +
+ + +## Create a ServiceNow app (Admin required) +1. Connect to your ServiceNow instance +2. Navigate to the `System OAuth` -> `Application Registry` page + +
+ +
+ + +3. Create a new application by clicking `New` -> `Create an OAuth API endpoint for external clients` + +
+ +
+ + +4. Set the following fields: + - Name: `Elementary` (Recommended) + - Redirect URL: `https://prod.nango.elementary-data.com/oauth/callback` + - Logo URL: `https://cdn.prod.website-files.com/65ec9eebc532cfda4df5fcd7/6781525ba8657fba8fa35446_Vector.svg` (Optional) + +
+ +
+ + +5. Save the application +6. Go to the created application and copy the `Client ID` and `Client Secret` + + + +## Add Elementary Link to the incidents table (Recommended) +1. Navigate to `System Definitions` -> `Table` + +
+ +
+ + +2. Select the `Incident` table + +
+ +
+ + +3. Under `Columns`, click `New` +4. Set the following fields: + - Type: `URL` + - Column Label: `Elementary Link` + - Column Name: `u_elementary_incident_url` + +
+ +
+ + +5. Click `Submit` + + +## How to connect your ServiceNow app +1. Go to the `Environments` page on the sidebar. +2. Select an environment and click connect on the `Connect ticketing system` card, and select `ServiceNow`. + +
+ +
+ + + +
+ +
+ + +3. Fill in the app details: + - Client ID: Client ID from the app you created in the previous step + - Client Secret: Client Secret from the app you created in the previous step + - Subdomain: The subdomain of your ServiceNow instance (e.g. for `https://my-company.service-now.com`, the subdomain is `my-company`) + +
+ +
+ + +4. Click `Connect` +5. Allow the app to access your ServiceNow instance. **This step may require admin approval.** + +
+ +
+ + +6. Choose a group to assign the incidents to + +
+ +
+ + +7. Click `Save` + + +## Creating ServiceNow incident from incidents +When an incident is created, you can create a ServiceNow incident from the incident page by simply clicking on "Create ticket". +The incident will automatically be created in ServiceNow, in the group you chose upon connecting ServiceNow. + +After the ticket is created you can see the ServiceNow incident link in the incident page. +The incident will also contain a link to the incident in Elementary. + +Note: Elementary will not update the incident in ServiceNow when the incident is resolved or changed in any way + + +
+ +
+ diff --git a/docs/cloud/integrations/alerts/slack.mdx b/docs/cloud/integrations/alerts/slack.mdx index 3738e79cf..36f660e85 100644 --- a/docs/cloud/integrations/alerts/slack.mdx +++ b/docs/cloud/integrations/alerts/slack.mdx @@ -5,7 +5,7 @@ title: "Slack" Elementary's Slack integration enables sending Slack alerts when data issues happen. The alerts include rich context, and you can change the incident status and asssigne from the alert itself. -You can also create [alert rules](/features/alerts-and-incidents/alert-rules) to distribute alerts to different channels and destinations. +You can also create [alert rules](cloud/features/alerts-and-incidents/alert-rules) to distribute alerts to different channels and destinations.
@@ -47,7 +47,7 @@ You can also create [alert rules](/features/alerts-and-incidents/alert-rules) to 5. Select a default channel for alerts, and set the suppression interval. -The default channel you select will automatically add a default [alert rule](/features/alerts-and-incidents/alert-rules) +The default channel you select will automatically add a default [alert rule](/cloud/features/alerts-and-incidents/alert-rules) to send all failures to this channel. Alerts on warnings are not sent by default. To modify and add rules, navigate to the `Alert Rules` page. diff --git a/docs/cloud/integrations/bi/connect-bi-tool.mdx b/docs/cloud/integrations/bi/connect-bi-tool.mdx index 26e47e92d..890b72dcc 100644 --- a/docs/cloud/integrations/bi/connect-bi-tool.mdx +++ b/docs/cloud/integrations/bi/connect-bi-tool.mdx @@ -3,6 +3,10 @@ title: "Automated lineage to Data Visualization layer" sidebarTitle: "Automated BI lineage" --- +import BiCards from '/snippets/cloud/integrations/cards-groups/bi-cards.mdx'; + + + Elementary will automatically and continuously extend the column-level-lineage to the dashboard level of your data visualization tool. This will provide you end-to-end data lineage to understand your downstream dependencies, called exposures. @@ -14,19 +18,19 @@ This will provide you end-to-end data lineage to understand your downstream depe - **Change impact**: Analyze which exposures will be impacted by a planned change. - **Unused datasets**: Detect datasets that no exposure consumes, which could be removed to save costs. 
- - +frameborder="0" +allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" +allowfullscreen +alt="Elementary Lineage" +> ### Supported BI tools - + ### Automated dbt exposures.yml diff --git a/docs/cloud/integrations/bi/explo.mdx b/docs/cloud/integrations/bi/explo.mdx index 8c3a3b46f..7096444f6 100644 --- a/docs/cloud/integrations/bi/explo.mdx +++ b/docs/cloud/integrations/bi/explo.mdx @@ -3,7 +3,7 @@ title: "Explo" --- diff --git a/docs/cloud/integrations/bi/hex.mdx b/docs/cloud/integrations/bi/hex.mdx index 490caed4d..254f733dc 100644 --- a/docs/cloud/integrations/bi/hex.mdx +++ b/docs/cloud/integrations/bi/hex.mdx @@ -2,14 +2,24 @@ title: "Hex" --- - - - -} -> - Click for details - \ No newline at end of file +After you connect Hex, Elementary will automatically and continuously extend the lineage to the project & cell level. +This will provide you end-to-end data lineage to understand your downstream dependencies, called exposures. + +### Create API Token + +Elementary needs a workspace token in your account in order to access Hex's API on your behalf.
+To create one, please follow the [official Hex documentation](https://learn.hex.tech/docs/api/api-overview#workspace-tokens). +Make sure you create a Hex Workspace token with read access for all categories and "read project queried tables". + +### Connecting Hex to Elementary + +Navigate to the **Account settings > Environments** and choose the environment to which you would like to connect Hex. +Choose the Hex connection and provide the following details to validate and complete the integration. + +- **Base URL**: Your Hex workspace URL. For example: `https://app.hex.tech/my-workspace/` +- **Workspace Token**: The Hex workspace token you created in the previous step. + + +### Limitations + +- **Hex assets will only be visible if their project has run within the last 14 days.** Elementary tracks Hex query history to determine asset visibility, and only projects with recent execution activity will appear in the lineage graph. diff --git a/docs/cloud/integrations/bi/lightdash.mdx b/docs/cloud/integrations/bi/lightdash.mdx new file mode 100644 index 000000000..e902055d4 --- /dev/null +++ b/docs/cloud/integrations/bi/lightdash.mdx @@ -0,0 +1,31 @@ +--- +title: "Lightdash" +--- + +After you connect Lightdash, Elementary will automatically and continuously extend the column-level-lineage to the dashboard and chart level. +This will provide you end-to-end data lineage to understand your downstream dependencies, called exposures. + +### Create a Personal Access Token + +Elementary needs a Personal Access Token (PAT) to access the Lightdash API on your behalf. + +1. In Lightdash, go to **Settings > Personal Access Tokens**. +2. Click **Generate Token**. +3. Give the token a descriptive name (e.g. "Elementary integration"). +4. Copy and save the generated token securely — you will need it when connecting Lightdash to Elementary. + +For more details, refer to the [official Lightdash documentation](https://docs.lightdash.com/references/workspace/personal-tokens). 
+ + + The token inherits the permissions of the user who created it. Make sure the user has access to all the projects you want Elementary to sync. + + +### Connecting Lightdash to Elementary + +Navigate to the **Account settings > Environments** and choose the environment to which you would like to connect Lightdash. +Choose the Lightdash BI connection and provide the following details to validate and complete the integration. + +- **Connection Name:** A descriptive name for this Lightdash connection (e.g. "My Lightdash"). +- **API Token:** The Personal Access Token you generated in the previous step. +- **Host URL** *(optional)*: Leave empty to use the default Lightdash Cloud URL (`https://app.lightdash.cloud`). Set this only if you are using a self-hosted Lightdash instance (e.g. `https://lightdash.mycompany.com`). +- **Project UUIDs** *(optional)*: Leave empty to sync all projects in your Lightdash organization. Add specific project UUIDs to sync only those projects. You can find a project's UUID in the URL when viewing it in Lightdash (e.g. `https://app.lightdash.cloud/projects//...`). diff --git a/docs/cloud/integrations/bi/looker.mdx b/docs/cloud/integrations/bi/looker.mdx index 26d64f603..1cb253bda 100644 --- a/docs/cloud/integrations/bi/looker.mdx +++ b/docs/cloud/integrations/bi/looker.mdx @@ -55,7 +55,7 @@ Choose the Looker BI connection and provide the following details to validate an - **LookML code repository**: - Token - [Github](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-personal-access-token-classic) - - [Gitlab](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html) + - [Gitlab](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html) - make sure the role is `developer` and the scopes include `read_api, read_repository` - Repository - The repository name where your LookML code is. 
diff --git a/docs/cloud/integrations/bi/mode.mdx b/docs/cloud/integrations/bi/mode.mdx index 8f559fe0a..15a49129e 100644 --- a/docs/cloud/integrations/bi/mode.mdx +++ b/docs/cloud/integrations/bi/mode.mdx @@ -3,7 +3,7 @@ title: "Mode" --- diff --git a/docs/cloud/integrations/bi/power-bi.mdx b/docs/cloud/integrations/bi/power-bi.mdx index e16244bfa..ee6669d23 100644 --- a/docs/cloud/integrations/bi/power-bi.mdx +++ b/docs/cloud/integrations/bi/power-bi.mdx @@ -25,7 +25,7 @@ Those features will allow Elementary to get all required info for computing the ### Connecting Power BI to Elementary -Navigate to the **Account settings > Environments** and choose the environment to which you would like to connect Elementary. +Navigate to the **Account settings > Environments** and choose the environment to which you would like to connect Power BI. Choose the Power BI connection and provide the following details to validate and complete the integration. - **Tenant:** Your Microsoft tenant, which is usually your company's domain, e.g. `my-company.com` diff --git a/docs/cloud/integrations/bi/sigma.mdx b/docs/cloud/integrations/bi/sigma.mdx index 11207d4f0..ff9ddc037 100644 --- a/docs/cloud/integrations/bi/sigma.mdx +++ b/docs/cloud/integrations/bi/sigma.mdx @@ -2,14 +2,31 @@ title: "Sigma" --- - - - -} -> - Click for details - \ No newline at end of file +After you connect Sigma, Elementary will automatically and continuously extend the lineage to the workbook page & element level. +This will provide you end-to-end data lineage to understand your downstream dependencies, called exposures. + +### Create API Client Credentials + +Elementary needs authorized client credentials in your account in order to access Sigma's API on your behalf.
+To create those, please follow the [official Sigma documentation](https://help.sigmacomputing.com/reference/generate-client-credentials#generate-api-client-credentials). +Make sure you enable 'REST API' privileges for that client. + +### Connecting Sigma to Elementary + +Navigate to the **Account settings > Environments** and choose the environment to which you would like to connect Sigma. +Choose the Sigma connection and provide the following details to validate and complete the integration. + +- **Cloud Provider:** To determine your Sigma cloud provider, navigate to **Account -> General Settings** under Sigma's **Administration** menu and look for **'Cloud: ...'**.
It should be one of the following: + - `AWS US` + - `AWS Canada` + - `AWS Europe` + - `AWS UK` + - `Azure US` + - `GCP` +- **Client ID**: The Sigma client ID you created in the previous step. +- **Client Secret:** The Sigma client secret you created in the previous step. + + +### Limitations + +`Datasets` and `Data Models` are currently excluded from the computed lineage graph, which will point from the DWH directly to your Workbook Elements.
\ No newline at end of file diff --git a/docs/cloud/integrations/bi/thoughtspot.mdx b/docs/cloud/integrations/bi/thoughtspot.mdx index 52b31d8e9..160c8bf81 100644 --- a/docs/cloud/integrations/bi/thoughtspot.mdx +++ b/docs/cloud/integrations/bi/thoughtspot.mdx @@ -1,15 +1,32 @@ --- -title: "ThoughtSpot" +title: "Thoughtspot" --- - - - -} -> - Click for details - \ No newline at end of file +After you connect Thoughtspot, Elementary will automatically and continuously extend the lineage to the liveboard and answer level. +This will provide you end-to-end data lineage to understand your downstream dependencies, called exposures. + + +### Enable Trusted Authentication on a privileged user + +For Elementary to access your Thoughtspot instance's API on behalf of your user, your user should have Trusted Authentication enabled.
+To enable Trusted Authentication on a user, please follow the [official Thoughtspot documentation](https://developers.thoughtspot.com/docs/trusted-auth-secret-key).
+Make sure you copy the generated token (`Secret Key`) as you will need it to connect Thoughtspot to Elementary. + +### User Privileges + +For an easy integration, it's recommended for the connected user to be an administrator (`ADMINISTRATION` privilege); this ensures Elementary can access all of your Liveboards and Answers.
+It is also possible, though, to integrate with a regular user; just make sure it can download data (has the `DATADOWNLOADING` privilege) for all the relevant ThoughtSpot entities you want Elementary to discover and show lineage for. + +### Connecting Thoughtspot to Elementary + +Navigate to the **Account settings > Environments** and choose the environment to which you would like to connect ThoughtSpot. +Choose the Thoughtspot connection and provide the following details to validate and complete the integration. + +- **User Name:** The username of the user you want to use to connect to Thoughtspot. +- **Secret Key:** The token generated for the user you want to use to connect to Thoughtspot (from the previous step). +- **Base URL:** The URL of your Thoughtspot instance. This would be `'https://.thoughtspot.cloud'` by default, or your custom domain if [you've configured one](https://docs.thoughtspot.com/cloud/10.1.0.cl/custom-domains#_domain_url_customization). If you're unsure, just check the URL you use to access your Thoughtspot instance in the browser. + + +### Limitations + 
\ No newline at end of file diff --git a/docs/cloud/integrations/code-repo/azure-devops.mdx b/docs/cloud/integrations/code-repo/azure-devops.mdx new file mode 100644 index 000000000..0784986ac --- /dev/null +++ b/docs/cloud/integrations/code-repo/azure-devops.mdx @@ -0,0 +1,43 @@ +--- +title: "Azure DevOps Integration" +sidebarTitle: "Azure DevOps" +--- + +Elementary can integrate with Azure DevOps to connect to the code repository where your **dbt project code** is managed, and it opens pull requests with configuration changes. + +## Connecting Through the Azure DevOps App + +1. Navigate to **Environments** in Elementary Cloud, open your environment, then go to **Code repository**. +2. Click on **Connect** and select **Azure DevOps**. +3. Enter your Azure DevOps organization URL \ +(e.g., `https://dev.azure.com/your-organization`). +4. Click **Save**. +5. Connect through OAuth to authenticate between Azure DevOps and Elementary Cloud. During this process, a temporary token is issued, which can be used to make API calls. Along with the temporary token, a refresh token is also provided. The refresh token is used when Azure DevOps indicates that the temporary token has expired. For Microsoft services, OAuth is managed by Microsoft Entra ID (formerly known as Active Directory). + +## Updating tokens when they expire + +When your Azure DevOps token expires, update the connection with new tokens: + +1. Go to **Environments**, open your environment, and go to **Code repository**. +2. Click **Edit** on the code repository connection. +3. Paste the new generated tokens (from Azure DevOps) and save. + +--- + +## Required Permissions + +Elementary requires the following permissions in your Azure DevOps **dbt repository**: + +- **Read and write** access to the repository +- Access to **file contents** +- Permission to **open and read pull requests** + +--- + +## Troubleshooting + +If you encounter issues with the Azure DevOps integration, ensure the following: + +1. 
Your **organization URL** is correct. +2. You have **sufficient permissions** in Azure DevOps. +3. Elementary is properly **authorized** in your Azure DevOps organization. diff --git a/docs/cloud/integrations/code-repo/bitbucket.mdx b/docs/cloud/integrations/code-repo/bitbucket.mdx new file mode 100644 index 000000000..d44d7d0fa --- /dev/null +++ b/docs/cloud/integrations/code-repo/bitbucket.mdx @@ -0,0 +1,26 @@ +--- +title: "Bitbucket" +--- + +import RepoConnectionSettings from '/snippets/cloud/integrations/repo-connection-settings.mdx'; + + + +Elementary connects to the code repository where your dbt project code is managed, and opens PRs with configuration changes. + +## Recommended: Connect using Elementary Bitbucket App + +Navigate to the **Account settings > Environments** and choose the environment to which you would like to connect the dbt project code repository. + +Simply click the blue button that says "Connect with Elementary Bitbucket App" and follow the instructions. +In the menu that opens, select the repository where your dbt project is stored and, if needed, the branch and path to the dbt project. + +Requires a user with permissions to install new applications in the repository. + +### Alternative: Create a Bitbucket project token + +If connecting the Elementary Bitbucket App isn't an option, you can connect using a token managed by your team instead. + +## Repository connection settings + + diff --git a/docs/cloud/integrations/code-repo/connect-code-repo.mdx b/docs/cloud/integrations/code-repo/connect-code-repo.mdx index 92e1e089a..116b83ea9 100644 --- a/docs/cloud/integrations/code-repo/connect-code-repo.mdx +++ b/docs/cloud/integrations/code-repo/connect-code-repo.mdx @@ -3,7 +3,11 @@ title: "Connect code repository" sidebarTitle: "Code integration" --- -We believe configuration should be managed in code. 
+import CodeRepoCards from '/snippets/cloud/integrations/cards-groups/code-repo-cards.mdx'; + + + +We believe [configuration should be managed in code](/cloud/features/config-as-code). +With config-as-code you get version control, CI, and a review process. Adding and updating configuration becomes part of your development process. @@ -12,4 +16,4 @@ Through integration and access to the dbt project code repository, Elementary op ### Supported code repositories - + diff --git a/docs/cloud/integrations/code-repo/github.mdx b/docs/cloud/integrations/code-repo/github.mdx index 1cca0f38a..b8a8ed61e 100644 --- a/docs/cloud/integrations/code-repo/github.mdx +++ b/docs/cloud/integrations/code-repo/github.mdx @@ -2,18 +2,27 @@ title: "Github" --- +import RepoConnectionSettings from '/snippets/cloud/integrations/repo-connection-settings.mdx'; + + + Elementary connects to the code repository where your dbt project code is managed, and opens PRs with configuration changes. -### Recommended: Connect using Elementary Elementary Github App +## Recommended: Connect using Elementary Github App + +Navigate to the **Account settings > Environments** and choose the environment to which you would like to connect the dbt project code repository. Simply click the blue button that says "Connect with Elementary Github App" and follow the instructions. In the menu that opens, select the repository where your dbt project is stored and, if needed, the branch and path to the dbt project. +Requires a user with permissions to install new applications in the repository. + -### Create a Github [fine-grained token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-fine-grained-personal-access-token) -If for some reason you prefer to, you can connect to Github using a fine-grained token managed by your team instead. 
Here is how you can create one: +## Alternative: Create a Github [fine-grained token](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens#creating-a-fine-grained-personal-access-token) +If connecting the Elementary Github App isn't an option, you can connect to Github using a fine-grained token managed by your team instead. + + 1. In the upper-right corner of any page, click your profile photo, then click **Settings**. 2. On the bottom of the left sidebar, click **Developer settings**. 3. On the left sidebar, select **Personal access tokens > Fine-grained tokens**. @@ -42,15 +51,8 @@ Here is how you can create one: 9. Click **Generate token**. + -### Connect Github to Elementary - -Navigate to the **Account settings > Environments** and choose the environment to which you would like to connect the dbt project code repository. -Select **Connect code repository**, and under Github enter the generated token and repo full name: - - Github connection to Elementary - + diff --git a/docs/cloud/integrations/code-repo/gitlab.mdx b/docs/cloud/integrations/code-repo/gitlab.mdx index 9372273af..4f65c192f 100644 --- a/docs/cloud/integrations/code-repo/gitlab.mdx +++ b/docs/cloud/integrations/code-repo/gitlab.mdx @@ -2,9 +2,28 @@ title: "Gitlab" --- +import RepoConnectionSettings from '/snippets/cloud/integrations/repo-connection-settings.mdx'; + + + Elementary connects to the code repository where your dbt project code is managed, and opens PRs with configuration changes. - -### Create a Gitlab project token + + +## Recommended: Connect using Elementary Gitlab App + +Navigate to the **Account settings > Environments** and choose the environment to which you would like to connect the dbt project code repository. + +Simply click the blue button that says "Connect with Elementary Gitlab App" and follow the instructions. 
+In the menu that opens, select the repository where your dbt project is stored and, if needed, the branch and path to the dbt project. + +Requires a user with permissions to install new applications in the repository. + + +### Alternative: Create a Gitlab project token + +If connecting the Elementary Gitlab App isn't an option, you can connect using a token managed by your team instead. + + You need to create a [project access token](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html) (token for a specific repository) by following these steps: @@ -15,14 +34,8 @@ You need to create a [project access token](https://docs.gitlab.com/ee/user/proj 5. Select the following scopes: `api`, `read_api`, `read_repository`, `write_repository`. 6. Select **Create project access token**. -### Connect Gitlab to Elementary + 
-Select **Connect code repository**, and under Gitlab enter the generated token and repo full name: - - - Gitlab connection to Elementary - +## Repository connection settings + + \ No newline at end of file diff --git a/docs/cloud/integrations/dwh/athena.mdx b/docs/cloud/integrations/dwh/athena.mdx index b74351075..f51a076c5 100644 --- a/docs/cloud/integrations/dwh/athena.mdx +++ b/docs/cloud/integrations/dwh/athena.mdx @@ -3,14 +3,10 @@ title: "Connect to Athena" sidebarTitle: "Athena" --- - - - - } - > - Click for details - \ No newline at end of file +import Athena from '/snippets/cloud/integrations/athena.mdx'; +import OnboardingHelp from '/snippets/cloud/integrations/onboarding-help.mdx'; + + + + + diff --git a/docs/cloud/integrations/dwh/bigquery.mdx b/docs/cloud/integrations/dwh/bigquery.mdx index 3bb44ddcb..12e0457a4 100644 --- a/docs/cloud/integrations/dwh/bigquery.mdx +++ b/docs/cloud/integrations/dwh/bigquery.mdx @@ -3,5 +3,10 @@ title: "Connect to Bigquery" sidebarTitle: "Bigquery" --- - - +import Bigquery from '/snippets/cloud/integrations/bigquery.mdx'; +import OnboardingHelp from '/snippets/cloud/integrations/onboarding-help.mdx'; + + + + + diff --git a/docs/cloud/integrations/dwh/clickhouse.mdx b/docs/cloud/integrations/dwh/clickhouse.mdx index 005388b41..c37e3f145 100644 --- a/docs/cloud/integrations/dwh/clickhouse.mdx +++ b/docs/cloud/integrations/dwh/clickhouse.mdx @@ -1,16 +1,12 @@ --- -title: "Connect to ClickHouse" -sidebarTitle: "ClickHouse" +title: "Connect to Clickhouse" +sidebarTitle: "Clickhouse" --- - - - - } - > - Click for details - \ No newline at end of file +import Clickhouse from '/snippets/cloud/integrations/clickhouse.mdx'; +import OnboardingHelp from '/snippets/cloud/integrations/onboarding-help.mdx'; + + + + + diff --git a/docs/cloud/integrations/dwh/databricks.mdx b/docs/cloud/integrations/dwh/databricks.mdx index 11b0123f3..392d8a587 100644 --- a/docs/cloud/integrations/dwh/databricks.mdx +++ 
b/docs/cloud/integrations/dwh/databricks.mdx @@ -3,5 +3,10 @@ title: "Connect to Databricks" sidebarTitle: "Databricks" --- - - +import Databricks from '/snippets/cloud/integrations/databricks.mdx'; +import OnboardingHelp from '/snippets/cloud/integrations/onboarding-help.mdx'; + + + + + diff --git a/docs/cloud/integrations/dwh/dremio.mdx b/docs/cloud/integrations/dwh/dremio.mdx new file mode 100644 index 000000000..138b9d54f --- /dev/null +++ b/docs/cloud/integrations/dwh/dremio.mdx @@ -0,0 +1,78 @@ +--- +title: "Connect to Dremio" +sidebarTitle: "Dremio" +--- + +import OnboardingHelp from '/snippets/cloud/integrations/onboarding-help.mdx'; + + + + +**Note:** We currently support **Dremio Cloud only**. If you are using Dremio Software, please contact us for assistance. + +### Create a user for Elementary Cloud + +Create an email account for the Elementary user. +Example: **elementary@your-organization.com** + +In your dbt project, run: + +```bash +## Print the query you should run to generate a user. +dbt run-operation create_elementary_user --args '{user: the_mail_of_the_elementary_user}' +``` + +This command will generate a query to create a user with the necessary permissions. Run this query on your data warehouse with admin permissions to create the user. It will send an email invitation, which you need to accept. + +After the invitation is accepted, sign in to Dremio as the Elementary user and open "Account Settings" by clicking on the bottom-left corner of the Dremio UI. + +Click **Generate Token**: + + +
+ Generate Token in Dremio +
+ + +Set the **maximum lifetime** allowed by Dremio (currently **180 days**). + +⚠️ Set a reminder to renew the token before it expires. + +You can update the token in Elementary's UI at any time. + +### Permissions and security + +Elementary Cloud doesn't require read permissions on your tables and schemas; it only requires the following: + +- Read-only access to the elementary schema. +- Access to read metadata in information schema and query history, related to the tables in your dbt project. + +It is recommended to create a user using the instructions above to avoid granting excess privileges. For more details, refer to + +[**security and privacy**](https://docs.elementary-data.com/cloud/general/security-and-privacy) + +### Fill the connection form + +Provide the following fields: + +- **Host:** + - US: `api.dremio.cloud` + - EU: `api.eu.dremio.cloud` +- **Object Storage:** Name of the object storage where the Elementary schema is stored. +- **Object Storage Path:** Path inside the object storage where the Elementary schema is stored. +- **Project ID:** Your Dremio Cloud project ID. +- **User:** The email address of the Elementary user. +- **Token:** The token you generated for the Elementary user. + +### Connect your metadata store (optional) + +If your Dremio sources contain Iceberg tables, you can connect your metadata store (Iceberg catalog) to your environment. This allows automatic monitoring of the volume +and freshness of your Iceberg tables (even if they are ingested outside of Dremio). + +Currently only [AWS Glue](/cloud/integrations/metadata-layer/glue) is supported, though more metadata integrations are planned. 
+ + diff --git a/docs/cloud/integrations/dwh/duckdb.mdx b/docs/cloud/integrations/dwh/duckdb.mdx new file mode 100644 index 000000000..5c223e867 --- /dev/null +++ b/docs/cloud/integrations/dwh/duckdb.mdx @@ -0,0 +1,33 @@ +--- +title: "Connect to DuckDB" +sidebarTitle: "DuckDB" +--- + + + + + + +} +> + Coming soon + diff --git a/docs/cloud/integrations/dwh/fabric.mdx b/docs/cloud/integrations/dwh/fabric.mdx new file mode 100644 index 000000000..26f3c4abd --- /dev/null +++ b/docs/cloud/integrations/dwh/fabric.mdx @@ -0,0 +1,41 @@ +--- +title: "Connect to Fabric" +sidebarTitle: "Fabric" +--- + + + + + + + + + + + + + + + + + + + + + + + +} +> + Coming soon + diff --git a/docs/cloud/integrations/dwh/postgres.mdx b/docs/cloud/integrations/dwh/postgres.mdx index 59ad89181..f34c2040a 100644 --- a/docs/cloud/integrations/dwh/postgres.mdx +++ b/docs/cloud/integrations/dwh/postgres.mdx @@ -3,5 +3,14 @@ title: "Connect to Postgres" sidebarTitle: "Postgres" --- - - +import CreateUserOperation from '/snippets/cloud/integrations/create-user-operation.mdx'; +import Postgres from '/snippets/cloud/integrations/postgres.mdx'; +import OnboardingHelp from '/snippets/cloud/integrations/onboarding-help.mdx'; + + + +You will connect Elementary Cloud to Postgres for syncing the Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). 
+ + + + diff --git a/docs/cloud/integrations/dwh/redshift.mdx b/docs/cloud/integrations/dwh/redshift.mdx index f95902651..fb286c47b 100644 --- a/docs/cloud/integrations/dwh/redshift.mdx +++ b/docs/cloud/integrations/dwh/redshift.mdx @@ -3,5 +3,10 @@ title: "Connect to Redshift" sidebarTitle: "Redshift" --- - - +import Redshift from '/snippets/cloud/integrations/redshift.mdx'; +import OnboardingHelp from '/snippets/cloud/integrations/onboarding-help.mdx'; + + + + + diff --git a/docs/cloud/integrations/dwh/snowflake.mdx b/docs/cloud/integrations/dwh/snowflake.mdx index 56e902cd3..491e1ae05 100644 --- a/docs/cloud/integrations/dwh/snowflake.mdx +++ b/docs/cloud/integrations/dwh/snowflake.mdx @@ -3,5 +3,10 @@ title: "Connect to Snowflake" sidebarTitle: "Snowflake" --- - - +import Snowflake from '/snippets/cloud/integrations/snowflake.mdx'; +import OnboardingHelp from '/snippets/cloud/integrations/onboarding-help.mdx'; + + + + + diff --git a/docs/cloud/integrations/dwh/spark.mdx b/docs/cloud/integrations/dwh/spark.mdx new file mode 100644 index 000000000..bf0e8164f --- /dev/null +++ b/docs/cloud/integrations/dwh/spark.mdx @@ -0,0 +1,27 @@ +--- +title: "Connect to Spark" +sidebarTitle: "Spark" +--- + + + + + + +} +> + Coming soon + diff --git a/docs/cloud/integrations/dwh/sqlserver.mdx b/docs/cloud/integrations/dwh/sqlserver.mdx new file mode 100644 index 000000000..8a5b7ef3b --- /dev/null +++ b/docs/cloud/integrations/dwh/sqlserver.mdx @@ -0,0 +1,22 @@ +--- +title: "Connect to SQL Server" +sidebarTitle: "SQL Server" +--- + + + + +} +> + Coming soon + diff --git a/docs/cloud/integrations/dwh/trino.mdx b/docs/cloud/integrations/dwh/trino.mdx index 18ca5b548..2a4181ca3 100644 --- a/docs/cloud/integrations/dwh/trino.mdx +++ b/docs/cloud/integrations/dwh/trino.mdx @@ -7,10 +7,29 @@ sidebarTitle: "Trino" title="Trino" href="https://tally.so/r/3N6DlW?integration=Trino" icon={ - - + + + + + + + + + + + + + + + + + + + + + } > Click for details -
\ No newline at end of file +
diff --git a/docs/cloud/integrations/dwh/vertica.mdx b/docs/cloud/integrations/dwh/vertica.mdx new file mode 100644 index 000000000..0bbcb75ea --- /dev/null +++ b/docs/cloud/integrations/dwh/vertica.mdx @@ -0,0 +1,29 @@ +--- +title: "Connect to Vertica" +sidebarTitle: "Vertica" +--- + + + + + + + + + + + +} +> + Coming soon + diff --git a/docs/cloud/integrations/elementary-integrations.mdx b/docs/cloud/integrations/elementary-integrations.mdx index 877da032b..019e700e7 100644 --- a/docs/cloud/integrations/elementary-integrations.mdx +++ b/docs/cloud/integrations/elementary-integrations.mdx @@ -3,4 +3,10 @@ title: "Elementary integrations" sidebarTitle: "All integrations" --- - \ No newline at end of file +import CloudIntegrationsCards from '/snippets/cloud/integrations/cards-groups/cloud-integrations-cards.mdx'; + + + +Elementary connects seamlessly to your data warehouse, BI tools, code repositories, alerting, and ticketing systems—giving you full context and true end-to-end, column-level lineage. This unified view makes it easy to trace issues back to their source, understand downstream impact, and resolve problems faster. + + diff --git a/docs/cloud/integrations/governance/atlan.mdx b/docs/cloud/integrations/governance/atlan.mdx new file mode 100644 index 000000000..9d0ca3511 --- /dev/null +++ b/docs/cloud/integrations/governance/atlan.mdx @@ -0,0 +1,36 @@ +--- +title: "Atlan" +--- + +Elementary aims to meet business users where they live, so we believe it's important to allow anyone who uses data to + know whether the data is healthy quickly, easily and without any technical knowledge required. This is why we integrated with Atlan. + +The integration works by pushing key insights to Atlan assets as custom metadata. +This metadata includes data health scores and open incidents, providing visibility into the quality and status of your +data. + + +
+ Elementary metadata in Atlan asset view +
+ + +### Atlan API Key +To generate an API key, follow these steps: +1. Create a new Persona in Atlan, with the following permissions: + - Assets: Read, Update + - Governance: Update custom metadata values + +### Atlan Base Url +Your Atlan base URL, for example `https://my-company.atlan.com` + +### Elementary Account Token (required for connecting through the Atlan app) +You can generate tokens directly from the Elementary UI: go to [Account → Account Tokens](https://app.elementary-data.com/settings/account-tokens). + +Quick steps: +1. Open the **Account → Account Tokens** page and click **Generate token**. +2. (Optional) Add a name/description. +3. Copy the token and store it securely — it is shown once. Manage (revoke/rotate) anytime from the same page. diff --git a/docs/cloud/integrations/log-streaming/datadog.mdx b/docs/cloud/integrations/log-streaming/datadog.mdx new file mode 100644 index 000000000..1d82a0d40 --- /dev/null +++ b/docs/cloud/integrations/log-streaming/datadog.mdx @@ -0,0 +1,145 @@ +--- +title: "Datadog" +--- + +Elementary's Datadog integration enables streaming audit logs and system logs directly to your Datadog account for centralized log management and monitoring. + +## Overview + +When enabled, Elementary automatically streams your workspace's [audit logs](/cloud/features/collaboration-and-communication/audit_logs/overview) ([user activity logs](/cloud/features/collaboration-and-communication/audit_logs/user-activity-logs) and [system logs](/cloud/features/collaboration-and-communication/audit_logs/system-logs)) to Datadog using the [Datadog Logs API](https://docs.datadoghq.com/api/latest/logs/#send-logs). 
This allows you to: + +- Centralize all logs in your Datadog dashboard +- Set up custom alerts and monitors on log events +- Correlate Elementary logs with other application logs +- Perform advanced log analysis and search +- Maintain long-term log retention in Datadog + +## Prerequisites + +Before configuring log streaming to Datadog, you'll need: + +1. **Datadog API Key** - Your Datadog API key for authentication + - You can find or create an API key in your Datadog account under [Organization Settings > API Keys](https://app.datadoghq.com/organization-settings/api-keys) + +2. **Datadog Site** (optional) - Your Datadog site region + - Default: `datadoghq.com` (US) + - EU: `datadoghq.eu` + - US3: `us3.datadoghq.com` + - US5: `us5.datadoghq.com` + - AP1: `ap1.datadoghq.com` + +## Configuring Log Streaming to Datadog + +1. Navigate to the [**Logs**](/cloud/features/collaboration-and-communication/audit_logs/overview) page: + - Click on your **account name** in the top-right corner of the UI + - Open the dropdown menu + - Select **Logs** + +2. In the **External Integrations** section, click the **Connect** button + +3. In the modal that opens, select **Datadog** as your log streaming destination + +4. Enter your Datadog configuration: + - **API Key**: Your Datadog API key + - **Site** (optional): Your Datadog site region (defaults to `datadoghq.com` if not specified) + - **Service Name** (optional): Custom service name for logs in Datadog (defaults to `elementary`) + - **Source** (optional): Custom source tag for logs (defaults to `elementary-cloud`) + + +
+
+5. Click **Connect** to enable log streaming
+
+6. After connecting, you will see the Datadog integration listed in the **External Integrations** section. You can edit or disable the integration at any time.
+
+ + + +The log streaming configuration applies to your entire workspace. All logs matching your selected log types will be streamed to Datadog in real-time. + + +## Log Format in Datadog + +Logs are sent to Datadog with the following structure: + +- `timestamp`: ISO 8601 timestamp of the event +- `log_type`: The type of log (`audit` for user activity logs, `system` for system logs) +- `status`: Log level (`info` for successful actions, `error` for failed actions) +- `service`: Service name (configurable, defaults to `elementary`) +- `source`: Source tag (configurable, defaults to `elementary-cloud`) +- `event_name`: The specific action that was performed (e.g., `user_login`, `create_test`) +- `success`: Boolean indicating whether the action completed successfully +- `user_email`: User email address (only present in audit logs) +- `user_name`: User display name (only present in audit logs) +- `env_id`: Environment identifier (empty for account-level actions) +- `env_name`: Environment name (empty for account-level actions) +- `event_content`: Additional context-specific information as a JSON object +- `dd.tags`: Additional tags including: + - `log_type:` (e.g., `audit`, `system`) + - `event_name:` (e.g., `user_login`, `create_test`) + - `env_id:` (if applicable) + +## Viewing Logs in Datadog + +Once configured, logs will appear in your Datadog [Log Explorer](https://app.datadoghq.com/logs) within a few seconds of being generated. + +You can filter logs using: +- `source:elementary-cloud` - All Elementary logs +- `log_type:audit` - User activity logs only +- `log_type:system` - System logs only +- `event_name:` - Specific action types +- `env_id:` - Logs from a specific environment +- `success:false` - Failed operations only + + +
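As a quick illustration of how these facets combine, a small helper (hypothetical, not part of any Datadog SDK) can assemble a Log Explorer query string from the fields documented above:

```python
def elementary_log_query(**facets):
    """Build a Datadog Log Explorer query from the documented facets.

    Facet names (log_type, event_name, env_id, success) mirror the fields
    Elementary attaches to each log; the helper itself is illustrative.
    """
    parts = ["source:elementary-cloud"]  # narrows results to Elementary logs
    parts += [f"{name}:{value}" for name, value in facets.items()]
    return " ".join(parts)
```

For example, `elementary_log_query(log_type="audit", success="false")` yields `source:elementary-cloud log_type:audit success:false`, i.e. failed user actions only.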
+ + +## Troubleshooting + +### Logs not appearing in Datadog + +1. **Verify API Key**: Ensure your Datadog API key is valid and has the necessary permissions +2. **Check Site Configuration**: Verify you've selected the correct Datadog site region +3. **Review Log Types**: Confirm the log types you want to stream are enabled +4. **Check Datadog Status**: Verify your Datadog account is active and not rate-limited + +### Rate Limiting + +Datadog has rate limits for log ingestion. If you're experiencing issues: +- Check your Datadog account's rate limits in the [Usage & Billing](https://app.datadoghq.com/billing/usage) page +- Consider filtering which log types you stream if you have high log volume +- Contact Datadog support if you need to increase your rate limits + +## Disabling Log Streaming + +To disable log streaming to Datadog: + +1. Navigate to the [**Logs**](/cloud/features/collaboration-and-communication/audit_logs/overview) page +2. In the **External Integrations** section, find your Datadog integration +3. Click **Disable** or remove the Datadog configuration +4. Confirm the action + + +Disabling log streaming will stop sending new logs to Datadog immediately. Historical logs already sent to Datadog will remain in your Datadog account according to your retention settings. + diff --git a/docs/cloud/integrations/log-streaming/gcs.mdx b/docs/cloud/integrations/log-streaming/gcs.mdx new file mode 100644 index 000000000..39c6a82fb --- /dev/null +++ b/docs/cloud/integrations/log-streaming/gcs.mdx @@ -0,0 +1,186 @@ +--- +title: "Google Cloud Storage (GCS)" +--- + +Elementary's Google Cloud Storage (GCS) integration enables streaming audit logs and system logs directly to your GCS bucket for long-term storage, analysis, and integration with other Google Cloud services. 
+ +## Overview + +When enabled, Elementary automatically streams your workspace's [audit logs](/cloud/features/collaboration-and-communication/audit_logs/overview) ([user activity logs](/cloud/features/collaboration-and-communication/audit_logs/user-activity-logs) and [system logs](/cloud/features/collaboration-and-communication/audit_logs/system-logs)) to your GCS bucket using the [Google Cloud Storage API](https://cloud.google.com/storage/docs/json_api). This allows you to: + +- Store logs in your own GCS bucket for long-term retention +- Integrate logs with BigQuery, Dataflow, or other Google Cloud analytics services +- Maintain full control over log storage and access policies +- Process logs using Google Cloud data processing tools +- Archive logs for compliance and audit requirements + +## Prerequisites + +Before configuring log streaming to GCS, you'll need: + +1. **GCS Bucket** - A Google Cloud Storage bucket where logs will be stored + - The bucket must exist and be accessible + - You'll need the bucket path (e.g., `gs://my-logs-bucket`) + +2. **Google Cloud Service Account** - A service account with permissions to write to the bucket + - Required role: `Storage Object User` (roles/storage.objectUser) + - You'll need to generate a service account JSON key file + - The service account key file must be uploaded in Elementary + - **Workload Identity Federation**: Support for Workload Identity Federation with BigQuery service accounts is coming soon + + +## Configuring Log Streaming to GCS + +1. Navigate to the [**Logs**](/cloud/features/collaboration-and-communication/audit_logs/overview) page: + - Click on your **account name** in the top-right corner of the UI + - Open the dropdown menu + - Select **Logs** + +2. In the **External Integrations** section, click the **Connect** button + +3. In the modal that opens, select **Google Cloud Storage (GCS)** as your log streaming destination + +4. 
Enter your GCS configuration: + - **Bucket Path**: The full GCS bucket path (e.g., `gs://my-logs-bucket`) + - **Service Account Key File**: Upload your Google Cloud service account JSON key file + - To generate a service account key file: + 1. Go to [Google Cloud Console > IAM & Admin > Service Accounts](https://console.cloud.google.com/iam-admin/serviceaccounts) + 2. Select your service account (or create a new one) + 3. Click the three dots menu and select "Manage keys" + 4. Click "ADD KEY" and select "Create new key" + 5. Choose "JSON" format and click "CREATE" + 6. The JSON file will be downloaded automatically + +5. Click **Save** to enable log streaming + + +
+ + + +The log streaming configuration applies to your entire workspace. Both user activity logs and system logs will be streamed to your GCS bucket in batches. + + +## Log Batching + +Logs are automatically batched and written to GCS files based on the following criteria: + +- **Time-based batching**: A new file is created every **15 minutes** +- **Size-based batching**: A new file is created when the batch reaches **100MB** + +**Whichever condition is met first** triggers a new file to be created. This ensures efficient storage while maintaining reasonable file sizes for processing. + +## File Path Format + +Logs are stored at the root of your bucket using a Hive-based partitioning structure for efficient querying and organization: + +``` +log_type={log_type}/date={YYYY-MM-DD}/hour={HH}/file_{timestamp}_{batch_id}.ndjson +``` + +Where: +- `{log_type}`: Either `audit` (for user activity logs) or `system` (for system logs) +- `{YYYY-MM-DD}`: Date in ISO format (e.g., `2024-01-15`) +- `{HH}`: Hour in 24-hour format (e.g., `14`) +- `{timestamp}`: Unix timestamp when the file was created +- `{batch_id}`: Unique identifier for the batch + +### Example File Paths + +``` +log_type=audit/date=2024-01-15/hour=14/file_1705320000_batch_abc123.ndjson +log_type=system/date=2024-01-15/hour=14/file_1705320900_batch_def456.ndjson +``` + +This Hive-based structure allows you to: +- Efficiently query logs by date and hour using BigQuery or other tools +- Filter logs by type (`audit` or `system`) +- Process logs in parallel by partition + + +
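To make the layout concrete, here is a sketch (our own helper, not Elementary code) that assembles an object key in this scheme:

```python
from datetime import datetime, timezone

def log_object_key(log_type: str, created_at: datetime, batch_id: str) -> str:
    """Assemble an object key in the documented Hive-style layout:
    log_type={log_type}/date={YYYY-MM-DD}/hour={HH}/file_{timestamp}_{batch_id}.ndjson
    """
    epoch = int(created_at.timestamp())  # Unix timestamp of file creation
    return (
        f"log_type={log_type}"
        f"/date={created_at:%Y-%m-%d}"
        f"/hour={created_at:%H}"
        f"/file_{epoch}_{batch_id}.ndjson"
    )
```

Tools that understand Hive partitioning (BigQuery external tables, for instance) can then prune by the `key=value` path segments alone.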
+ + +## Log Format + +Logs are stored as **line-delimited JSON** (NDJSON), where each line represents a single log entry as a JSON object. + +### User Activity Logs + +Each user activity log entry includes: + +```json +{ + "timestamp": "2024-01-15T14:30:45.123456Z", + "log_type": "audit", + "event_name": "user_login", + "success": true, + "user_email": "john.doe@example.com", + "user_name": "John Doe", + "env_id": "env_7890123456abcdef", + "env_name": "Production", + "event_content": { + "additional": "context" + } +} +``` + +### System Logs + +Each system log entry includes: + +```json +{ + "timestamp": "2024-01-15T14:30:45.123456Z", + "log_type": "system", + "event_name": "dbt_data_sync_completed", + "success": true, + "env_id": "env_7890123456abcdef", + "env_name": "Production", + "event_content": { + "environment_id": "env_789", + "environment_name": "Production" + } +} +``` + +### Field Descriptions + +- `timestamp`: ISO 8601 timestamp of the event (UTC) +- `log_type`: Either `"audit"` for user activity logs or `"system"` for system logs +- `event_name`: The specific action that was performed (e.g., `user_login`, `create_test`, `dbt_data_sync_completed`) +- `success`: Boolean indicating whether the action completed successfully +- `user_email`: User email address (only present in audit logs) +- `user_name`: User display name (only present in audit logs) +- `env_id`: Environment identifier (empty string for account-level actions) +- `env_name`: Environment name (empty string for account-level actions) +- `event_content`: Additional context-specific information as a JSON object + + +
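Because each object is newline-delimited JSON, downstream processing stays simple. A minimal sketch (the helper names are our own); in practice `payload` would be the text of an object downloaded from your bucket:

```python
import json

def read_ndjson(payload: str) -> list:
    """Parse one downloaded .ndjson object into a list of log records."""
    return [json.loads(line) for line in payload.splitlines() if line.strip()]

def failed_events(records: list) -> list:
    """Event names of entries whose action did not complete successfully."""
    return [r["event_name"] for r in records if not r["success"]]
```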
+ + +## Disabling Log Streaming + +To disable log streaming to GCS: + +1. Navigate to the [**Logs**](/cloud/features/collaboration-and-communication/audit_logs/overview) page +2. In the **External Integrations** section, find your GCS integration +3. Click **Disable** or remove the GCS configuration +4. Confirm the action + + +Disabling log streaming will stop sending new logs to GCS immediately. Historical logs already written to GCS will remain in your bucket. + diff --git a/docs/cloud/integrations/log-streaming/splunk.mdx b/docs/cloud/integrations/log-streaming/splunk.mdx new file mode 100644 index 000000000..1c353c6ea --- /dev/null +++ b/docs/cloud/integrations/log-streaming/splunk.mdx @@ -0,0 +1,176 @@ +--- +title: "Splunk" +--- + +Elementary's Splunk integration enables streaming audit logs and system logs directly to your Splunk instance via HTTP Event Collector (HEC) for centralized log management, monitoring, and analysis. + +## Overview + +When enabled, Elementary automatically streams your workspace's [audit logs](/cloud/features/collaboration-and-communication/audit_logs/overview) ([user activity logs](/cloud/features/collaboration-and-communication/audit_logs/user-activity-logs) and [system logs](/cloud/features/collaboration-and-communication/audit_logs/system-logs)) to Splunk using the [Splunk HTTP Event Collector (HEC)](https://docs.splunk.com/Documentation/Splunk/latest/Data/HECExamples). This allows you to: + +- Centralize all logs in your Splunk instance +- Set up custom alerts and dashboards on log events +- Correlate Elementary logs with other application logs +- Perform advanced log analysis and search using Splunk's powerful query language +- Maintain long-term log retention in Splunk + +## Prerequisites + +Before configuring log streaming to Splunk, you'll need: + +1. 
**Splunk Instance** - A Splunk Enterprise or Splunk Cloud instance with HTTP Event Collector (HEC) enabled + - HEC must be configured and accessible from Elementary's servers + - You'll need the HEC URL (e.g., `https://splunk.example.com:8088`) + +2. **HEC Token** - An HTTP Event Collector token for authentication + - You can create a token in Splunk under **Settings > Data Inputs > HTTP Event Collector** + - The token must have write permissions + +3. **Splunk Index** (optional) - A specific index where logs should be stored + - If not specified, logs will be sent to the default index configured for the HEC token + +## Configuring Log Streaming to Splunk + +1. Navigate to the [**Logs**](/cloud/features/collaboration-and-communication/audit_logs/overview) page: + - Click on your **account name** in the top-right corner of the UI + - Open the dropdown menu + - Select **Logs** + +2. In the **External Integrations** section, click the **Connect** button + +3. In the modal that opens, select **Splunk** as your log streaming destination + +4. Enter your Splunk configuration: + - **HEC URL**: Your Splunk HTTP Event Collector URL (e.g., `https://splunk.example.com:8088` or `https://example.splunkcloud.com:8088`) + - **HEC Token**: Your Splunk HEC authentication token + - **Index** (optional): The Splunk index where logs should be stored (defaults to the token's configured index if not specified) + +5. Click **Save** to enable log streaming + + +
+ + + +The log streaming configuration applies to your entire workspace. Both user activity logs and system logs will be streamed to Splunk in batches. + + +## Log Format in Splunk + +Logs are sent to Splunk with the following structure: + +### Event Structure + +Each log entry is sent as a JSON event with the following fields: + +- `event`: The log data as a JSON object +- `sourcetype`: `_json` (indicating JSON format) +- `source`: `elementary-cloud` (identifying the source) +- `time`: Unix timestamp of the event +- `index`: The Splunk index (if specified in configuration) + +### User Activity Logs + +Each user activity log entry includes: + +```json +{ + "timestamp": "2024-01-15T14:30:45.123456Z", + "log_type": "audit", + "event_name": "user_login", + "success": true, + "user_email": "john.doe@example.com", + "user_name": "John Doe", + "env_id": "env_7890123456abcdef", + "env_name": "Production", + "event_content": { + "additional": "context" + } +} +``` + +### System Logs + +Each system log entry includes: + +```json +{ + "timestamp": "2024-01-15T14:30:45.123456Z", + "log_type": "system", + "event_name": "dbt_data_sync_completed", + "success": true, + "env_id": "env_7890123456abcdef", + "env_name": "Production", + "event_content": { + "environment_id": "env_789", + "environment_name": "Production" + } +} +``` + +### Field Descriptions + +- `timestamp`: ISO 8601 timestamp of the event (UTC) +- `log_type`: Either `"audit"` for user activity logs or `"system"` for system logs +- `event_name`: The specific action that was performed (e.g., `user_login`, `create_test`, `dbt_data_sync_completed`) +- `success`: Boolean indicating whether the action completed successfully +- `user_email`: User email address +- `user_name`: User display name +- `env_id`: Environment identifier (empty for account-level actions) +- `env_name`: Environment name (empty for account-level actions) +- `event_content`: Additional context-specific information as a JSON object + + +
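The envelope described above can be reproduced in a few lines; this is a sketch of the shape only (an assumed helper, not Elementary's actual client code):

```python
from datetime import datetime

def hec_envelope(log, index=None):
    """Wrap one log record in the HEC event structure documented above."""
    # Convert the ISO 8601 timestamp (trailing "Z") to epoch seconds.
    ts = datetime.fromisoformat(log["timestamp"].replace("Z", "+00:00"))
    envelope = {
        "event": log,            # the log data itself
        "sourcetype": "_json",   # JSON-formatted event
        "source": "elementary-cloud",
        "time": ts.timestamp(),
    }
    if index:                    # only set when an index was configured
        envelope["index"] = index
    return envelope
```

Batches of such envelopes are what end up POSTed to the collector endpoint (`/services/collector/event` under your HEC URL) with an `Authorization: Splunk <token>` header.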
+
+## Viewing Logs in Splunk
+
+Once configured, logs will appear in your Splunk instance within a few seconds of being generated.
+
+You can search logs using Splunk Search Processing Language (SPL):
+
+```
+# Search for all Elementary logs
+source="elementary-cloud"
+
+# Filter by log type
+source="elementary-cloud" log_type="audit"
+source="elementary-cloud" log_type="system"
+
+# Search for specific actions
+source="elementary-cloud" event_name="user_login"
+source="elementary-cloud" event_name="dbt_data_sync_completed"
+
+# Filter by environment
+source="elementary-cloud" env_name="Production"
+
+# Search for failed operations
+source="elementary-cloud" success=false
+
+# Search by user email
+source="elementary-cloud" user_email="john.doe@example.com"
+```
+
+## Disabling Log Streaming
+
+To disable log streaming to Splunk:
+
+1. Navigate to the [**Logs**](/cloud/features/collaboration-and-communication/audit_logs/overview) page
+2. In the **External Integrations** section, find your Splunk integration
+3. Click **Disable** or remove the Splunk configuration
+4. Confirm the action
+
+Disabling log streaming will stop sending new logs to Splunk immediately. Historical logs already sent to Splunk will remain in your Splunk instance according to your retention settings.
diff --git a/docs/cloud/integrations/metadata-layer/glue.mdx b/docs/cloud/integrations/metadata-layer/glue.mdx
new file mode 100644
index 000000000..ff037da81
--- /dev/null
+++ b/docs/cloud/integrations/metadata-layer/glue.mdx
@@ -0,0 +1,106 @@
+---
+title: "AWS Glue"
+---
+
+The AWS Glue integration in Elementary allows you to automatically monitor volume & freshness anomalies in your Iceberg tables in Glue by continuously syncing metadata about Iceberg snapshots.
+
+  This integration is currently only supported with the Dremio engine. We plan to add other engines in the near future.
+
+## AWS Setup
+
+### 1.
Create Required IAM Policy
+
+First, you'll need to create an IAM policy with the following permissions:
+- **GluePermissions**: Enables reading metadata about tables in your Glue catalog.
+- **S3IcebergMetadataReadAccess**: Grants metadata-only read access to the files of your Iceberg tables. These metadata files contain statistics about Iceberg snapshots,
+such as update cadence and row count changes.
+
+Here is an example of a JSON policy:
+```json
+{
+  "Version": "2012-10-17",
+  "Statement": [
+    {
+      "Sid": "GluePermissions",
+      "Effect": "Allow",
+      "Action": [
+        "glue:GetTable",
+        "glue:GetTables"
+      ],
+      "Resource": "*"
+    },
+    {
+      "Sid": "S3IcebergMetadataReadAccess",
+      "Effect": "Allow",
+      "Action": [
+        "s3:GetObject"
+      ],
+      "Resource": [
+        "arn:aws:s3:::your-iceberg-tables-bucket/*metadata.json"
+      ]
+    }
+  ]
+}
+```
+
+### 2. Choose Authentication Method
+
+Elementary supports two authentication methods for connecting to Glue:
+
+#### Option 1: AWS Role Authentication (Recommended)
+
+This is the recommended approach as it provides better security and follows AWS best practices. [Learn more about AWS IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html).
+
+1. **Create an IAM Role**:
+   - Go to AWS IAM Console
+   - Create a new role
+   - Select "Another AWS account" as the trusted entity
+   - Enter Elementary's AWS account ID: `743289191656`
+   - (Optional but recommended) Enable "Require external ID" and set a value
+   - Attach the policy created in step 1
+
+2. **Note down the following information**:
+   - Role ARN
+   - External ID (if you enabled it) [Learn more about external IDs](https://aws.amazon.com/blogs/security/how-to-use-external-id-when-granting-access-to-your-aws-resources/).
+
+#### Option 2: Access Key Authentication
+
+This method is less secure as it requires permanent credentials. We recommend using AWS Role authentication instead.
+
+1.
**Create an IAM User**:
+   - Go to AWS IAM Console
+   - Create a new user that will be used by Elementary to connect to Glue
+   - Enable programmatic access
+   - Attach the policy created in step 1
+
+2. **Note down the following information**:
+   - AWS Access Key ID of the new Elementary Glue user
+   - AWS Secret Access Key of the new Elementary Glue user
+
+## Elementary Configuration
+
+Navigate to **Account settings > Environments** and choose the environment to which you would like to connect AWS Glue.
+Under the "Metadata Layer" section, please choose Glue.
+
+### Connection Settings
+
+Regardless of the authentication method you choose, you'll need to provide:
+
+- **Connection Name**: A descriptive name for your connection (e.g. "Datalake"). Needs to be unique if you're adding more than one metadata integration.
+- **Region**: The AWS region where your Glue catalog is located
+
+### Authentication Details
+
+Based on your chosen authentication method:
+
+#### If using AWS Role Authentication:
+- Select "AWS Role" as the authentication method
+- Enter your role ARN
+- Enter your external ID (if you enabled it)
+
+#### If using Access Key Authentication:
+- Select "Access Key" as the authentication method
+- Enter your AWS Access Key ID
+- Enter your AWS Secret Access Key
diff --git a/docs/cloud/integrations/security-and-connectivity/aws-privatelink-integration.mdx b/docs/cloud/integrations/security-and-connectivity/aws-privatelink-integration.mdx
new file mode 100644
index 000000000..424baf2e9
--- /dev/null
+++ b/docs/cloud/integrations/security-and-connectivity/aws-privatelink-integration.mdx
@@ -0,0 +1,300 @@
+---
+title: "AWS PrivateLink"
+sidebarTitle: "AWS PrivateLink"
+---
+
+## What is AWS PrivateLink?
+
+**AWS PrivateLink** is a secure and scalable networking technology that enables private connectivity between Virtual Private Clouds (VPCs), AWS services, and on-premises applications—without exposing traffic to the public internet.
By leveraging PrivateLink, organizations can simplify their network architecture, reduce data exposure risks, and ensure secure communication between services.
+
+With PrivateLink, services are exposed as **private endpoints** within a VPC, allowing consumers to connect to them using private IP addresses. This minimizes the need for complex networking configurations like VPC peering or VPNs, and reduces the risk of data leakage by keeping traffic within the AWS network.
+
+In the context of our integration, AWS PrivateLink enables Elementary Cloud to securely and privately communicate with supported services, ensuring data privacy, compliance, and a streamlined user experience. We support cross-region PrivateLink and can connect to any region where your cloud is hosted, using VPC peering to link different regions to our production environment securely. Elementary Data maintains a global network of regional VPCs designed for PrivateLink, with strict security controls.
+
+## Architecture overview
+
+Elementary’s PrivateLink setup generally consists of two parts:
+
+1. **AWS PrivateLink connection** -
+   1. Provider side (Customer / 3rd party) - **A VPC endpoint service** is set up at the customer’s AWS account (or a 3rd party AWS account in the case of Snowflake). This provides access to a particular service in that account.
+   2. Consumer side (Elementary) - Elementary sets up a dedicated VPC interface that will connect to the integrated service, in the same AWS region as the service.
+   This is done through a dedicated regional VPC created for this purpose.
+2. **AWS VPC Peering:**
+   1. Elementary’s production servers are located in the **eu-central-1** (Frankfurt) region. For us to be able to access the service exposed through PrivateLink, we connect our main production VPC with the regional VPC mentioned above.
+
+## Supported integrations
+
+### Snowflake
+
+Snowflake supports connecting to AWS-hosted Snowflake accounts via PrivateLink.
This setup is entirely managed by Snowflake, so Elementary connects with an endpoint service hosted on Snowflake’s AWS account for this purpose.
+
+In order to set up a PrivateLink connection with Snowflake, please follow the steps below:
+
+1. **Open a support case with Snowflake Support**
+   1. Ask to authorize Elementary’s AWS account for PrivateLink access.
+   2. Provide Elementary’s account ID in the request - `743289191656`
+2. **Obtain the PrivateLink configuration**
+   1. Once Snowflake’s support team approves the request, obtain the PrivateLink configuration by invoking the following commands (admin access is required):
+
+      ```sql
+      USE ROLE ACCOUNTADMIN;
+      SELECT SYSTEM$GET_PRIVATELINK_CONFIG();
+      ```
+
+3. **Provide Elementary with the configuration obtained in the previous step.**
+   1. Elementary will then set up the required infrastructure to connect to Snowflake via PrivateLink.
+4. **Add a Snowflake environment in Elementary**
+   1. Follow the instructions [here](/cloud/integrations/dwh/snowflake) to set up a Snowflake environment in Elementary.
+      1. When supplying the account, use `.privatelink`, where the account identifier is the result of the following query:
+
+         ```sql
+         SELECT CURRENT_ORGANIZATION_NAME() || '-' || CURRENT_ACCOUNT_NAME();
+         ```
+
+      2. In the Snowflake instructions, under the *Add the Elementary IP to allowlist* section, please add the following private subnets *instead* of the IP mentioned there:
+         * 10.0.1.x
+         * 10.0.2.x
+         * 10.0.3.x
+
+### Databricks
+
+Databricks supports connecting to AWS-hosted Databricks workspaces via PrivateLink. This setup is entirely managed by Databricks, so Elementary connects with an endpoint service hosted on Databricks’ AWS account for this purpose.
+
+**Note**:
+1. You must be a Databricks account admin to perform this setup.
+2. Your Databricks workspace must be deployed on a customer-managed VPC. PrivateLink is not supported with Databricks-managed VPCs.
+
+In order to set up a PrivateLink connection with Databricks, please follow the steps below:
+
+1. **Please provide Elementary with the following details:**
+   * Your Databricks workspace URL.
+   * Your AWS account ID.
+   * Your AWS region.
+
+   Elementary will then provide you with a VPC Endpoint ID that will be used in the next step.
+
+2. **Register your VPC endpoint**
+
+   In the account management portal (not your workspace), go to Security -> Networking -> VPC Endpoints,
+then click on the "Register VPC Endpoint" button.
+
+   You should fill in:
+   1. A name for the VPC endpoint - e.g. "Elementary".
+   2. Your AWS region.
+   3. The VPC Endpoint ID provided to you by Elementary.
+
+   Register VPC Endpoint
+
+3. **Configure a private access setting**
+
+   Go to Security -> Private Access Settings.
+
+   * If you've set up PrivateLink with your Databricks instance before, you should already have a private access setting configured.
+In that case, please ensure that the setting allows access to the VPC endpoint created in step (2).
+
+   * If this is the first time you are setting up PrivateLink for your Databricks workspace:
+     * Click on "Add private access config".
+     * Please fill in the following details:
+       * A name for your setting: e.g. "Privatelink settings"
+       * Your AWS region
+       * Whether or not to allow public access - only set this to False if all your systems and users access your Databricks workspace through PrivateLink.
+       * Private access level - either leave as "Account", or allow-list specific VPCs including the Elementary VPC created in the previous step.
+
+   Configure Private Access
+
+4. **Add the private access setting to your Databricks workspace**
+
+   __Note__: If you have already set up PrivateLink with Databricks in the past, you can skip this step.
+
+   Under the Databricks account management portal, go to Workspaces, click on your workspace and then on "Update Workspace".
Then go to "Advanced Configurations", and under "Private Link", attach the setting created in the previous step.
+
+5. **Add a Databricks environment in Elementary**
+
+   After all the previous steps are completed, please reach out to the Elementary team to verify that your Databricks cluster is accessible via PrivateLink.
+
+   Once verified, please add a Databricks environment to Elementary by following [this guide](/cloud/integrations/dwh/databricks).
+
+   Under the *Add the Elementary IP to allowlist* section, please add the following private subnets *instead* of the IP mentioned there:
+   * 10.0.1.x
+   * 10.0.2.x
+   * 10.0.3.x
+
+### GitHub Enterprise Server
+
+GitHub Enterprise Server can be connected to Elementary Cloud via AWS PrivateLink. This setup requires creating a VPC endpoint service in your AWS account that exposes your GitHub Enterprise Server instance.
+
+**Prerequisites:**
+- Your GitHub Enterprise Server instance must be accessible from within your AWS VPC
+- You must have administrative access to your AWS account
+- Ensure you are working in the correct AWS region where your GitHub Enterprise Server is deployed
+
+In order to set up a PrivateLink connection with GitHub Enterprise Server, please follow the steps below:
+
+1. **Create a VPC Endpoint Service**
+   - Follow the detailed instructions in the [Creating a VPC Endpoint Service](#creating-a-vpc-endpoint-service) section below to set up the endpoint service for your GitHub Enterprise Server instance.
+
+2. **Add a GitHub integration in Elementary** - Once the VPC endpoint service setup is completed, please proceed to adding a [GitHub integration](https://docs.elementary-data.com/cloud/integrations/code-repo/github). Note:
+   * OAuth is not currently supported for GitHub Enterprise Server, so you should generate a fine-grained token.
+   * You should use the same GitHub hostname as you would internally (Elementary will resolve that host to the PrivateLink endpoint)
+
+## Creating a VPC Endpoint Service
+
+  The setup below is only required for services that are hosted in your own VPC (e.g. GitHub Enterprise Server). It is not required for Snowflake or Databricks, which manage the server-side PrivateLink setup themselves.
+
+  Each integration above explicitly states if it requires setting up a VPC endpoint service.
+
+In order to expose services that are hosted within your own VPC, it is required to create a VPC endpoint service. This is essentially the server-side component of PrivateLink, and the destination Elementary's VPC endpoint will connect to.
+
+> **Note:** You should create the VPC endpoint service in the same region your service is located in.
+
+You can follow the steps below to create an endpoint service via the AWS console.
+
+### Create a network load balancer to your service
+
+Before setting up the VPC endpoint service, it is required to set up an internal network load balancer pointing to the service you are exposing.
+Setting up a load balancer consists of the following sub-steps:
+
+**1. Create a target group**
+
+Under the EC2 page in AWS, navigate to the Target Groups menu and click "Create target group". Please follow the wizard and fill in the following details:
+
+1. **Target type** - You should select this based on the method you used to deploy your service - e.g. based on Instance or IP address.
+2. **Target group name** - Choose a name for the target group
+3. **Protocol : Port** - Choose this based on the service you are exposing (normally should be HTTP / HTTPS)
+4. **IP address type** - IPv4.
+5. **VPC** - The VPC your service is deployed in.
+
+Create target group
+Create target group
+
+**2. Create a Network Load Balancer (NLB)**
+
+Under the EC2 page in AWS, navigate to "Load Balancers" and click on "Create load balancer".
Choose **Network Load Balancer** and proceed with the creation. Please follow the wizard and fill in the following details:
+1. **Load balancer name** - Choose a name for the load balancer (e.g. "github-lb")
+2. **Scheme** - Internal.
+3. **Load balancer IP address type** - IPv4.
+4. **VPC** - Choose the same VPC as the one you used for the target group above.
+5. **Mappings** - Select one or more private subnets.
+6. **Security groups** - Select a security group with access to your service. Please grant access to the relevant ports for your service, to the following IP ranges (these are the internal IPs Elementary may connect to your service from):
+   * 10.0.1.x
+   * 10.0.2.x
+   * 10.0.3.x
+7. **Listeners** - Select the target group, protocol, and port from Step 1.
+
+Create Network Load Balancer
+Create Network Load Balancer
+Create Network Load Balancer
+
+**3. Verify the Target Group is Healthy**
+
+Once the load balancer from the previous step is ready, please navigate to the target group you created above. It should be listed as "Healthy".
+
+Ensure target group is healthy
+
+If it appears as "Unhealthy" for some reason, please ensure the security group provides access to the service, that the health check is configured correctly, and of course that the service itself is available.
+
+**4. Enable Cross-Zone Load Balancing**
+
+If you selected more than one subnet (availability zone) for the load balancer above:
+* Navigate back to the "Load Balancers" screen.
+* Choose the load balancer you created.
+* Click on Actions -> Edit load balancer attributes.
+* Enable the setting "Enable cross-zone load balancing" and save your changes.
+
+### Create a VPC Endpoint Service and approve access for Elementary
+
+**1.
Create a VPC Endpoint Service** + +Navigate to the VPC page in AWS, go to "Endpoint services," and select "Create endpoint service." Please follow the wizard and fill in the following details: + +1. **Name** - Choose a name for your VPC endpoint service. +2. **Load balancer type** - Network. +3. **Available load balancers** - Select the network load balancer (NLB) created above. +4. **Require acceptance for endpoint** - Yes (so new connections will require approval, see below). +5. **Enable private DNS name** - No. +6. **Supported IP address types** - IPv4. + +Create endpoint service +Create endpoint service + +Once the service is created, please go to the "Details" tab and save the "Service name" attribute; you will need to provide it to the Elementary team later. + +**2. Allow the Elementary Principal** + +Once the VPC endpoint service is successfully created, navigate to the "Allow principals" tab and click on "Allow Principals". Then add the following principal (the root of Elementary's AWS account, `743289191656`): +``` +arn:aws:iam::743289191656:root +``` + +Allow elementary principal + +**3. Contact the Elementary team to configure the PrivateLink connection** + +Please provide the Elementary team with the following details: +1. Your AWS account ID. +2. Your AWS region. +3. The VPC endpoint service name (from step 1). +4. The relevant service / integration (e.g. Github). +5. The hostname you use internally to connect to your service. + +**4. Accept the endpoint connection request** + +Once you get confirmation from the Elementary team that the PrivateLink connection is set up, you need to approve +the VPC endpoint connection from Elementary. You can do so with the following steps: +* In the VPC page in your AWS console, go to "Endpoint Services", and then choose the endpoint service that you created in step 1. 
+* Under the "Endpoint Connections" tab, you should see a pending connection; select it. +* Click on Actions -> Accept endpoint connection request to accept the connection. + +After a couple of minutes, the connection should change from "Pending" to "Available". + +Accept endpoint connection request + +**5. Notify the Elementary team** + +Once the endpoint connection is approved and shows as "Available", please reach out to the Elementary team so we can verify that the connection is ready and working. diff --git a/docs/cloud/integrations/security-and-connectivity/ms-entra.mdx b/docs/cloud/integrations/security-and-connectivity/ms-entra.mdx new file mode 100644 index 000000000..121631cf4 --- /dev/null +++ b/docs/cloud/integrations/security-and-connectivity/ms-entra.mdx @@ -0,0 +1,82 @@ +--- +title: "Microsoft Entra ID" +sidebarTitle: "Microsoft Entra ID" +--- + +## Enabling SAML + +To enable SAML using Microsoft Entra ID (previously Azure AD), take the following steps: + + + - Go to the [Microsoft Entra portal](https://entra.microsoft.com/) + - On the left, choose Applications → Enterprise Applications + + + + - Click on “New Application” + + + + - Click on “Create your own application” + + + + - Choose the last option in the side-window that opens and click “Create” + + + + - In the App window that opens, click on “Single Sign-On” + + + + - Choose SAML + + + + - Click on Edit on the “Basic SAML Configuration” section + + + + - Fill in the following entries: + - Identifier (Entity ID) - `elementary` + - Reply URL - [`https://elementary-data.frontegg.com/auth/saml/callback`](https://elementary-data.frontegg.com/auth/saml/callback) + - Download the Federation Metadata XML. 
+ + + + + + - Go to your account settings page in Elementary (Your avatar in the top right corner -> Account -> Settings) + - In the SSO section, click on "Configure connection" + + - Fill in the form with the following details: + - SAML Metadata: Choose "Upload file", upload the Federation Metadata XML you downloaded. + - Domains: Add the domains you want to allow access to Elementary. + - Click on "Save" to save the configuration. + + + Make sure to verify that login works in an incognito window or with another user before logging out. + If it does not, disable the SSO configuration immediately and contact the Elementary team. + + + +## Provisioning + +Elementary supports user provisioning via SCIM to automate user management. If you want to enable automatic provisioning, follow these steps: + + - Go to your account settings page in Elementary (Your avatar in the top right corner -> Account -> Settings) + - In the Provisioning section, click on "Configure" + - Choose "Azure AD", and click "Create" to create a new URL and token for provisioning + - DO NOT close this dialog until you have configured SCIM in Azure AD + + + + - In the **Microsoft Entra portal**, go to **Enterprise Applications** and select the newly created SAML application. + - Navigate to **Provisioning** and click **Get Started**. + - Set the **Provisioning Mode** to **Automatic**. + - Configure the **Tenant URL** and **Secret Token** - _value from Elementary Provisioning section_ + - Click **Test Connection** to validate the setup. + - Enable provisioning and save changes. + + +This setup ensures that users are automatically created, updated, and deactivated in Elementary based on their status in Microsoft Entra ID. You can always reach out if you need any help. 
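Under the hood, SCIM provisioning is a series of REST calls from Entra ID to the URL and token you configured above. Purely as an illustration (the field values here are examples, not your actual payload), creating a user sends a request shaped like the SCIM 2.0 core user schema:

```json
{
  "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
  "userName": "dana@example.com",
  "name": { "givenName": "Dana", "familyName": "Levi" },
  "emails": [{ "value": "dana@example.com", "primary": true }],
  "active": true
}
```

Deactivating the user in Entra ID later typically sends a PATCH that flips `active` to `false`, which is what revokes their access in Elementary.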
diff --git a/docs/cloud/integrations/security-and-connectivity/okta.mdx b/docs/cloud/integrations/security-and-connectivity/okta.mdx new file mode 100644 index 000000000..6f6a79bcf --- /dev/null +++ b/docs/cloud/integrations/security-and-connectivity/okta.mdx @@ -0,0 +1,145 @@ +--- +title: "Okta" +sidebarTitle: "Okta" +--- + +## Authentication & SSO Integration + +### Supported Authentication Protocols + +Elementary Cloud supports **Okta Single Sign-On (SSO)** via multiple authentication protocols: + +- **SAML 2.0** (Security Assertion Markup Language) +- **OIDC (OpenID Connect)** + +These protocols enable seamless authentication, reducing the need for manual credential management. + +### SCIM for Automated Provisioning + +Elementary Cloud supports **SCIM (System for Cross-domain Identity Management)** for automated user provisioning and deprovisioning: + +- **Automated User Creation**: Users added in Okta can be provisioned automatically in Elementary Cloud. +- **Deprovisioning Support**: When a user is removed from Okta, their access to Elementary Cloud is revoked automatically. +- **Group-Based Provisioning**: Okta groups can be mapped to roles in Elementary Cloud by the Elementary team. + +For more details on SCIM setup, refer to Okta’s SCIM integration guide: [Okta SCIM Guide](https://help.okta.com/en-us/content/topics/apps/apps_app_integration_wizard_scim.htm). + +## Security & Access Control + +### Multi-Factor Authentication (MFA) + +Elementary Cloud does not enforce MFA directly, but any MFA policies configured through Okta will automatically apply once Okta SSO is enabled. + +### Role-Based Access Control (RBAC) and Group Sync + +- Supports **RBAC with predefined roles** (**Admin, Can Write, Can Read**). +- **Role mappings for group names** can be pre-defined if you share them with the Elementary team in advance. +- **Role Assignment**: + - The account creator will have a default **Admin** role. + - For provisioned users, if no configuration is made, the **default role will be Can Read**. 
+ - Manually invited users will have the role defined during the invite process. + - **Custom roles** are currently not supported. + + + +## How to Set Up Okta SSO for Elementary Cloud +Please follow the steps below to configure an SSO connection in Elementary. + + + - Go to Applications → Applications, and click on **Create App Integration** + + Choose SAML 2.0 as the sign-in method: + + - Name the app “Elementary Data” and click “Next” + + - Under **SAML Settings**, please enter the following details: + - Single sign-on URL: **https://elementary-data.frontegg.com/auth/saml/callback** + - Audience URI (SP Entity ID): elementary + - Name ID Format: EmailAddress + - Application Username: Email + - Update application username on: Create and update + + - Click “Finish” on the next screen, and the app will be created! + - Now, let’s configure users / groups that have access to the app. To do so, please go to the “Assignments” tab and add relevant groups / users. + - Note: + - It is recommended to set up an “Elementary Users” group dedicated to this purpose, though you can also add access for individual users. + - The setting below is for **assignment** of users to your app, i.e. users who are permitted to log in to Elementary via Okta. This does not cover actually provisioning the user in Elementary (this is covered in the next section). + + - Go to the “Sign On” tab, and copy the link under Metadata URL: + + + + + - Go to your account settings page in Elementary (Your avatar in the top right corner -> Account -> Settings) + - In the SSO section, click on "Configure connection" + + - Fill in the form with the following details: + - SAML Metadata: Choose "URL", paste the link you copied from Okta into the URL field, and click "Fetch". + - Domains: Add the domains you want to allow access to Elementary. + - Click on "Save" to save the configuration. + + + Make sure to verify that login works in an incognito window or with another user before logging out. 
+ If it does not, disable the SSO configuration immediately and contact the Elementary team. + + + +## How to Set Up SCIM for Automated Provisioning + This section covers how to automatically provision users and groups from Okta in Elementary. If you prefer, it is also possible to set up the SSO part without provisioning. In that case, users can be invited to the platform via the Team page in Elementary. + +Please follow the steps below to configure SCIM provisioning within Elementary: + + + - Go to your account settings page in Elementary (Your avatar in the top right corner -> Account -> Settings) + - In the Provisioning section, click on "Configure" + - Choose "Okta", and click "Create" to create a new URL and token for provisioning + - DO NOT close this dialog until you have configured SCIM in Okta + + + + - Under the **Elementary Data** app, go to the **General** tab, and click **Edit**. Then modify the **Provisioning** setting to **SCIM** and click **Save**. + + A new Provisioning tab should appear; click it, then click Edit. + + - Please fill in the following details: + - **SCIM connector base URL** - _value from Elementary Provisioning section_ + - **Unique identifier field for users** - email + - **Supported provisioning actions** - mark all the “Push” settings (New users, Profile updates and Groups). + - **Authentication Mode -** HTTP Header + - **Authorization** - _access token from Elementary Provisioning section_ + + When you are done, click on **Test Connector Configuration** + + Ensure that all the marked provisioning actions were successful: + + - Click **Save** to update the provisioning configuration. 
+ - Click the **To App** section on the left and click **Edit**: + + - Please enable the settings: + - Create Users + - Update User Attributes + - Deactivate Users + + And click **Save.** + + If you already created an “Elementary Users” group under the Assignments tab in the previous section, you may want to remove and re-add it to ensure all the users there are created successfully in Elementary. + + +## **Pushing groups to Elementary** + +As a part of the provisioning setup for Elementary, you can also choose to provision **user groups** to control Elementary permissions from Okta. + +These can be mapped to roles within Elementary (such as **Can Edit** or **Admin**). + +To do so, under the Elementary Data app, do the following: + +- Click on **Push Groups** + +- Add the groups you would like to push: + +- If the **Push Status** appears as **Active**, the groups were successfully pushed to Elementary. +- Please ask the Elementary team to map the groups you pushed to roles within Elementary. For the example groups above, the mapping is: + - **Elementary Admins** - Admin. + - **Elementary Editors** - Can Edit. +- Once this is done, you should be able to see all the users and their correct roles in the **Team** page in Elementary. 
+ diff --git a/docs/cloud/integrations/transformation-and-orchestration/dbt-fusion.mdx b/docs/cloud/integrations/transformation-and-orchestration/dbt-fusion.mdx new file mode 100644 index 000000000..b1ec55e68 --- /dev/null +++ b/docs/cloud/integrations/transformation-and-orchestration/dbt-fusion.mdx @@ -0,0 +1,7 @@ +--- +title: "dbt Fusion (Beta)" +--- + +import DbtFusion from '/snippets/integrations/dbt-fusion.mdx'; + + \ No newline at end of file diff --git a/docs/cloud/introduction.mdx b/docs/cloud/introduction.mdx index e593ff0b6..945454190 100644 --- a/docs/cloud/introduction.mdx +++ b/docs/cloud/introduction.mdx @@ -6,39 +6,30 @@ icon: "cloud" **Elementary is a data observability platform tailored for dbt-first data organizations.** -The unique dbt-native architecture seamlessly integrates into engineers' workflows, ensuring ease of use and smooth adoption. -The platform provides out-of-the-box monitoring for critical issues, tools to effortlessly increase coverage, and integrations for end-to-end visibility across the data stack. +Elementary Cloud is a fully managed, AI-powered platform for organization-wide reliability. It includes smart alerting, incident management, column-level lineage, anomaly detection, AI agents, and enterprise features like RBAC and audit logs. It’s designed for teams that want to scale trust, automation, and collaboration across engineering and business. -Elementary promotes ownership and collaboration on incidents, and enables the whole data organization to take an active role in the data quality process. -By automatically measuring and tracking data health, it helps teams transition from reactive firefighting to proactively communicating data health to consumers and stakeholders. + + + + + + + + - - + + + + + -## Cloud Platform Features +## Get started with Elementary Cloud - + -## Architecture and Security - - - -Our product is designed with [Security and Privacy](/cloud/general/security-and-privacy) in mind. 
- - -**SOC 2 certification:** Elementary is SOC2 type II certified! - - - -## How to Start? - - - - \ No newline at end of file + + + + + diff --git a/docs/cloud/main_introduction.mdx b/docs/cloud/main_introduction.mdx new file mode 100644 index 000000000..68d5499d1 --- /dev/null +++ b/docs/cloud/main_introduction.mdx @@ -0,0 +1,60 @@ +--- +title: "Welcome to Elementary" +sidebarTitle: "Elementary" +description: "dbt-native data observability platform built for data and analytics engineers." +icon: "fire" +--- + +import QuickstartCards from '/snippets/quickstart/quickstart-cards.mdx'; +import QuickstartSteps from '/snippets/cloud/quickstart-steps.mdx'; + + + +
+ Elementary banner +
+ +Elementary includes two products: + + + +See the [detailed comparison](/cloud/cloud-vs-oss) between Elementary Cloud and Elementary OSS. + +## Get started with Elementary + + + +### Elementary OSS + +Alternatively, start with [Elementary OSS](/oss/oss-introduction), an open-source CLI tool you can deploy and orchestrate to send Slack alerts and self-host the Elementary report. + +## Why choose Elementary? + + + + Elementary configuration is managed in your dbt code. + Elementary Cloud syncs configuration changes from the UI back to the dbt project code repository. + + You won't need to duplicate configuration - all your existing tests, owners, tags, and descriptions are leveraged. + + + + Elementary Cloud can't access your data. + + The dbt package creates a schema for logs, results and metadata, and Elementary only requires access to the Elementary schema. + + [Read about Security and Privacy >>>](/cloud/general/security-and-privacy) + + + + The Elementary dbt package automatically collects results and artifacts from your dbt project. All of your Elementary configuration is managed in your dbt code. + + By combining the package and Elementary Cloud, you get full dbt observability. All your tests and results in one dashboard and interface. + + + +## Want to know more? + + + + diff --git a/docs/cloud/manage-team.mdx b/docs/cloud/manage-team.mdx index 5908799e6..082697f6e 100644 --- a/docs/cloud/manage-team.mdx +++ b/docs/cloud/manage-team.mdx @@ -1,7 +1,7 @@ --- title: "Invite users" sidebarTitle: "Invite users" -icon: "square-4" +icon: "square-3" --- ### Invite users @@ -9,6 +9,12 @@ icon: "square-4" You can invite team members to join you! 🎉 Click on your initials on the top right of the screen and select `Team` to invite users. +When you invite a user, you can assign them a role. The roles are: +1. "Admin" - has full access to all assets, including team management. +2. "Can Edit" - can manage configurations, but cannot manage team members and environments. +3. 
"Can View" - can view data assets, test results, incidents, and lineage. + +You can also add custom roles with specific permissions to suit your needs. Talk to our support team to set this up. Users you invite will receive an Email saying you invited them, and will need to accept and activate their account. @@ -21,4 +27,4 @@ Users you invite will receive an Email saying you invited them, and will need to 1. [Connect Slack or MS Teams](/cloud/guides/enable-slack-alerts) for alerting 🔔 2. [Connect your code repository](/cloud/integrations/code-repo/connect-code-repo) to add tests configuration from the cloud 🔌 -3. [Connect your BI tool](/cloud/integrations/bi/connect-bi-tool) to automatically extend the lineage to dashboards 🚀 \ No newline at end of file +3. [Connect your BI tool](/cloud/integrations/bi/connect-bi-tool) to automatically extend the lineage to dashboards 🚀 diff --git a/docs/cloud/mcp/intro.mdx b/docs/cloud/mcp/intro.mdx new file mode 100644 index 000000000..722a30850 --- /dev/null +++ b/docs/cloud/mcp/intro.mdx @@ -0,0 +1,25 @@ +--- +title: "Elementary MCP Server" +sidebarTitle: "MCP Server" +icon: "message-code" +--- + + + + + + + + + + + + +The Elementary MCP Server allows you to connect your Elementary environment to any client that supports MCP (Model Context Protocol). This lets AI agents, copilots, or natural language interfaces query and act on your data stack using live context. + + diff --git a/docs/cloud/mcp/mcp-tools.mdx b/docs/cloud/mcp/mcp-tools.mdx new file mode 100644 index 000000000..c61dae64c --- /dev/null +++ b/docs/cloud/mcp/mcp-tools.mdx @@ -0,0 +1,33 @@ +--- +title: "MCP Tools" +sidebarTitle: "MCP tools" +--- +This is the full list of tools currently exposed by the Elementary MCP Server. + +These tools allow agents and interfaces to retrieve information from your Elementary environment, such as models, tests, lineage, incidents, and coverage, in a structured and consistent way. + +You won’t need to use these tools directly. 
They’re used behind the scenes by any MCP-compatible client. + +| **Category** | **Tool Name** | **Purpose** | **Output** | +| --- | --- | --- | --- | +| **Discovery & Lineage** | `get_asset` | When you need info about any asset type, from source to BI (table assets, BI assets) | Complete asset details with general information | +| | `get_upstream_assets` | When investigating data lineage or debugging issues upstream | Asset dependencies with metadata (id, name, type, criticality, tags, owners, description, path) | +| | `get_downstream_assets` | When assessing impact of changes or understanding data consumers | Asset dependencies with metadata (id, name, type, criticality, tags, owners, description, path) | +| | `get_column_upstream_columns` | When tracing column-level lineage for specific data fields | Column dependencies with their metadata (id, name, type, table info) | +| | `get_column_downstream_columns` | When understanding column usage and impact analysis | Column dependencies with their metadata (id, name, type, table info) | +| | `get_table_asset` | When you need table-specific details like schema, columns, and tests | Table asset with columns, materialization, execution status, and associated tests | +| | `get_table_assets` | When searching for tables by name, tags, owners, or other criteria | List of table assets matching filter criteria | +| | `get_table_asset_compiled_query` | When you need to see the actual SQL code being executed | Compiled SQL query from the latest execution | +| | `get_table_asset_execution_history` | When investigating table execution patterns or failures over time | Historical execution records with timestamps and status | +| | `get_table_asset_execution` | When debugging a specific table execution failure | Detailed execution logs, timing, and error information | +| **Test & Coverage** | `get_assets_sorted_by_coverage` | When prioritizing which assets need more test coverage | Assets ranked by test coverage percentage | +| | 
`get_test` | When investigating test configuration or recent test results | Test details including config, execution status, and quality dimensions | +| | `get_tests` | When searching for tests by name, type, or asset association | List of tests matching filter criteria | +| | `get_test_execution_metrics` | When analyzing specific test failure metrics that caused anomaly | Test execution results, numerical metrics, and anomaly details | +| | `get_test_execution_history` | When reviewing test performance patterns over time | Historical test execution records and results | +| | `get_test_execution` | When debugging a specific test execution failure | Detailed test execution logs, results, and error information | +| | `get_tests_catalog` | When helping users add tests to their dbt project | Available test types that can be defined in the dbt project | +| **Incidents** | `get_incidents` | When searching for incidents by time range, status, severity, or assignee | List of incidents with timing, assignee, status, severity, tags, and ticket information | +| | `get_incident` | When investigating details of a specific incident | Complete incident details including source asset/test information and execution context | +| | `get_asset_incidents_history` | When reviewing past incident patterns for a specific asset | Historical incident records for an asset with status and timing | +| **Environment Management** | `get_environments` | When starting a session to discover available data environments | Dictionary mapping environment IDs to environment names | diff --git a/docs/cloud/mcp/overview.mdx b/docs/cloud/mcp/overview.mdx new file mode 100644 index 000000000..ab8a270f8 --- /dev/null +++ b/docs/cloud/mcp/overview.mdx @@ -0,0 +1,61 @@ +--- +title: "Elementary MCP Server" +sidebarTitle: "MCP overview" +--- + + + + + + + + + + +The Elementary MCP Server allows you to connect your Elementary environment to any client that supports MCP (Model Context Protocol). 
This lets AI agents, copilots, or natural language interfaces query and act on your data stack using live context. + +## What is MCP? + +MCP (Model Context Protocol) is an open protocol introduced by Anthropic. It defines a structured way for AI systems to retrieve context and trigger actions from external tools. + +The Elementary MCP Server exposes an interface to query key parts of your data environment — including: + +- Models +- Tests +- Incidents +- Test coverage +- Lineage (dbt + BI), including column-level + +This enables workflows like: + +- Asking “What’s the status of the model feeding the revenue dashboard?” +- Automatically creating a freshness test for a column +- Browsing lineage to find upstream causes of issues +- Triggering updates or syncs without opening a UI + +## How it works + +The MCP Server runs as a remote service and is exposed via a single authenticated endpoint. It is compatible with any MCP-enabled client, such as Claude, Cursor IDE, and custom agents and LLM copilots. 
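Under the hood, MCP messages are JSON-RPC 2.0 sent over that authenticated endpoint. Your client handles the handshake for you; purely as an illustration (the client name, version, and protocol version shown are placeholders), the first message a client sends looks roughly like:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": { "name": "example-client", "version": "0.1.0" }
  }
}
```

After the handshake, the client discovers the available tools with `tools/list` and invokes them with `tools/call`.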
+ +## Supported operations + +You can use the Elementary MCP Server for: + +- **Full asset details** - Metadata, column definitions, and test coverage +- **Lineage** - Explore table + column-level lineage across dbt and BI tools +- **Incidents -** View open incidents and their context +- **Tests -** Browse and add tests using the test catalog +- **Models -** Inspect model metadata and status +- **Execution History -** View historical runs and performance details + +## Coming soon + +We're expanding support to include: + +- Data health summaries +- Volume and freshness metrics +- Sync triggers +- Cloud test config updates + + diff --git a/docs/cloud/mcp/recommended-rules.mdx b/docs/cloud/mcp/recommended-rules.mdx new file mode 100644 index 000000000..d95ed63c3 --- /dev/null +++ b/docs/cloud/mcp/recommended-rules.mdx @@ -0,0 +1,150 @@ +--- +title: "Recommended Rules" +sidebarTitle: "Recommended rules" +--- + + + +asset-discovery-and-metadata.mdc + +```mdc +--- +description: "Use Elementary MCP for authoritative asset and metadata discovery." +alwaysApply: true +--- + +## When to prefer MCP +- You need to find or summarize assets (models, tables, columns) with owners, tags, subscribers, freshness, and incident history. +- You're answering "What is this asset?", "Who owns it?", or "What depends on it?" +- You want up-to-date, production metadata rather than local project assumptions. + +## Guidance +- Query Elementary MCP for full metadata and lineage pointers instead of scanning project files. +- Return a concise summary: asset name, type, owners, freshness, criticality, recent incidents, and related tests. +- If nothing matches, say so clearly and suggest refining filters (name, owner, tag, time window). + +### Examples +**Good** → "Fetching authoritative metadata for `customer_activity_daily` via observability service; summarizing owners, freshness, and downstream consumers." +**Avoid** → "Parsing local SQL files or repo comments to guess owners or status." 
+``` + + + + + +elementary-best-practices.mdc + +```mdc +--- +description: "General best practices when interacting with Elementary MCP." +alwaysApply: true +--- + +## Guidance +- Work in a context of a specific environment. If not provided in advance, ask the user for the environment they want to work in. +- Apply filters (name, tag, owner, time range) to narrow scope. +- Before applying any detailed filters, begin by retrieving a count of assets. This provides a sense of the dataset size, allowing you to iteratively apply filters (such as by name, tag, owner, or time window) to efficiently narrow your results. +- Use lightweight mode or limit fields for large result sets. +- Handle missing assets gracefully - return a clear message. + +### Examples +**Good** → "Query MCP with owner='Data Platform', time window=7d, lightweight=true." +**Avoid** → "Fetching entire environment data without filters." +``` + + + + + +impact-analysis-and-lineage.mdc + +```mdc +--- +description: "Leverage Elementary MCP to assess upstream and downstream impact — including cross-project dependencies, and to analyze column-level lineage for safe, comprehensive change management." +alwaysApply: true +--- + +## When to prefer MCP +- Before modifying, renaming, or removing a model/table/column. +- When asked "What breaks if we change X?" or "Where does this column come from?" +- When downstream consumers may exist **outside this repo** (e.g., dashboards, jobs in other projects). + +## Guidance +- Use MCP to map: + - **Downstream consumers** (including dashboards/assets external to this project). + - **Upstream sources** and transformations. + - **Column-level lineage** to explain how columns are populated and which downstream columns depend on them. +- Provide a **safe change plan**: sequence updates, notify owners, define rollback steps. + +### Examples +**Good** → "End-to-end lineage: `finance_orders.net_revenue` feeds an external revenue dashboard. coordinate before renaming." 
+**Avoid** → "Grepping this repo and assuming there are no external consumers." +``` + + + + + +incidents-and-data-health.mdc + +```mdc +--- +description: "Use Elementary MCP to investigate incidents, data-quality issues and assess asset health." +alwaysApply: true +--- + +## When to prefer MCP +- An incident or test failure is mentioned +- You need to check the current health score for an asset. +- You need **history** (when it started, status changes, recurrence) and **related context** (impacted assets/tests). +- You're doing daily or post-deploy health checks. + +## Guidance +- Use MCP to: + - List active or recent incidents (filter by severity, status, or time window). + - Retrieve details: affected downstream assets, related tests/monitors, incident timeline and status. +- Prefer MCP summaries over reading raw logs/stdout directly. +- If none found, state that and suggest adjusting filters/time window. + +### Examples +**Good** → "Using MCP: show all incidents in the last 24h with affected assets, related tests, and current resolution status." +**Avoid** → "Inspecting raw process logs to infer which asset failed." +``` + + + + + +pre-deploy-impact-guardrails.mdc + +```mdc +--- +description: "Before merge/deploy, use Elementary MCP to detect downstream breakage—including external dashboards." +alwaysApply: true +--- + +## When to prefer MCP +- During review or deployment to ensure changes won't break consumers. +- When verifying dashboards, jobs, or models that rely on changed assets. +- When downstream assets lie **outside this repo**. + +## Guidance +- Run a **downstream impact check** via MCP: + - Identify all dependent assets (dashboards, other models). + - Highlight high-risk paths where changes alter downstream **columns** or semantics. +- Use **column-level lineage** to describe how each change propagates. 
+- If risk exists, recommend: coordination with owners, temporary compatibility (dual-writing/shim columns), or deferring merge until dependents update. + +### Examples +**Good** → "Pre-deploy: MCP shows `orders_enriched.ltv` feeds 'Revenue Overview'—notify owners and add a temporary alias before merge." +**Avoid** → "Assuming safety because tests here pass; skipping checks for external dashboards." +``` + + + +## Paste into Cursor to apply the rules + +```placeholder +Add the rules from here to the project rules +https://docs.elementary-data.com/cloud/mcp/recommended-rules +``` diff --git a/docs/cloud/mcp/setup-guide.mdx b/docs/cloud/mcp/setup-guide.mdx new file mode 100644 index 000000000..452bb5543 --- /dev/null +++ b/docs/cloud/mcp/setup-guide.mdx @@ -0,0 +1,94 @@ +--- +title: "MCP Setup Guide" +sidebarTitle: "MCP setup guide" +--- + + +This document walks you through connecting any MCP‑compatible client (Cursor, Claude Desktop, and others) to **Elementary’s production remote MCP server**. +It covers prerequisites, installation of the `mcp-remote` helper, authentication, and per‑client configuration steps. + +## Prerequisites +| Requirement | Why it’s needed | +|-------------|-----------------| +| **Node.js 18 LTS or newer** | `mcp-remote` (and many editors) run on Node. Verify with `node --version`. | +| **npm / npx** | `npx` fetches `mcp-remote` on demand. Bundled with Node. 
| +> **Recommended:** If you don't have Node.js installed, we recommend using `nvm` (Node Version Manager) for easy installation and management: +> ```bash +> # Install nvm (if not already installed) +> curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.0/install.sh | bash +> # Restart your terminal or run: source ~/.bashrc +> +> # Install and use the latest LTS version +> nvm install --lts +> nvm use --lts +> ``` + +--- + +## 1 – Generate an access token +You can now generate tokens directly from the Elementary UI: go to [User → Personal Tokens](https://app.elementary-data.com/settings/user-tokens) or [Account → Account Tokens](https://app.elementary-data.com/settings/account-tokens). + +Quick steps: +1. Open the **User → Personal Tokens** or **Account → Account Tokens** page and click **Generate token**. +2. (Optional) Add a name/description. +3. Copy the token and store it securely — it is shown only once. Manage (revoke/rotate) anytime from the same page. + +### Security +User tokens are **user‑scoped bearer tokens** and inherit your workspace permissions. +Account tokens are **account‑scoped bearer tokens** and have "Can View" permissions. +Treat them like passwords — do not share or commit them. Keep them secret, rotate regularly, and revoke immediately if compromised. + + +--- + +## 2 – Install `mcp-remote` (optional) +You don't have to install anything globally; the configs below use `npx` to fetch the latest version automatically. If you prefer a global install, use: +```bash +# Optional: global install +npm install -g mcp-remote@latest +``` + +> **Important:** `mcp-remote` expects the **server URL as the first argument** (flags come **after** the URL). It does **not** implement `--help`; running `mcp-remote --help` will error because `--help` is treated as a URL. + +--- + +## 3 – Configure your client +Most MCP‑compatible clients read a JSON config that defines `mcpServers`. Use one of the following patterns. 
+```jsonc +{ + "mcpServers": { + "elementary-remote": { + "command": "npx", + "args": [ + "-y", + "mcp-remote@latest", + "https://prod.api.elementary-data.com/mcp/", + "--header", + "Authorization:${AUTH_HEADER}" + ], + "env": { + "AUTH_HEADER": "Bearer " + } + } + } +} +``` +**Why the `-y` flag?** +It makes `npx` skip the interactive “install this package?” prompt. + +### Client‑specific steps + +#### 3.1 Cursor IDE +1. **Settings → Model Context Protocol → Add Custom Server** + or edit `~/.cursor/mcp.json` (global) / `/.cursor/mcp.json` (workspace). +2. Paste the JSON above. +3. Save – Cursor auto‑restarts its MCP agent. + +#### 3.2 Claude Desktop +1. In the menu bar choose **Claude → Settings -> Developer**. +2. Under **Local MCP Servers**, click **Edit**. +3. Paste the JSON block +4. Restart Claude + + diff --git a/docs/cloud/onboarding/connect-data-warehouse.mdx b/docs/cloud/onboarding/connect-data-warehouse.mdx index b18cebf91..44198f8c0 100644 --- a/docs/cloud/onboarding/connect-data-warehouse.mdx +++ b/docs/cloud/onboarding/connect-data-warehouse.mdx @@ -1,8 +1,12 @@ --- title: "Connect data warehouse" -icon: "square-3" +icon: "square-2" --- +import ConnectDwhCards from '/snippets/cloud/integrations/cards-groups/connect-dwh-cards.mdx'; + + + ### Create your first environment When you first login to Elementary, it will not have any data. @@ -25,7 +29,7 @@ Elementary Cloud needs: Which data warehouse do you wish to connect? - + ### Allowlist Elementary IP diff --git a/docs/cloud/onboarding/quickstart-dbt-package.mdx b/docs/cloud/onboarding/quickstart-dbt-package.mdx index fe0c35188..c4b6496e6 100644 --- a/docs/cloud/onboarding/quickstart-dbt-package.mdx +++ b/docs/cloud/onboarding/quickstart-dbt-package.mdx @@ -1,12 +1,17 @@ --- title: "Install Elementary dbt package" sidebarTitle: "Install dbt package" -icon: "square-2" +icon: "square-1" --- - +import QuickstartPackageInstall from '/snippets/quickstart-package-install.mdx'; + + + + ## What's next? 
- [Connect your data warehouse](/cloud/onboarding/connect-data-warehouse)
+- Connect multiple dbt projects to your environment. [Learn more here](/cloud/features/multi-env)
- [invite team members](/cloud/manage-team) to join!
diff --git a/docs/cloud/python-sdk/overview.mdx b/docs/cloud/python-sdk/overview.mdx
new file mode 100644
index 000000000..b13293ce4
--- /dev/null
+++ b/docs/cloud/python-sdk/overview.mdx
@@ -0,0 +1,82 @@
+---
+title: "Python SDK Overview"
+sidebarTitle: "Python SDK"
+icon: "code"
+---
+
+
+The Python SDK is currently in beta. If you want early access or want to see how it fits your implementation, [reach out to the team](https://meetings-eu1.hubspot.com/joost-boonzajer-flaes/intro-call-docs?uuid=17a4a61f-d0d3-4cbc-9362-56e37483f6f5).
+
+
+More teams are shifting their data quality checks out of dashboards and into the transformation layer itself. It's obvious why: the transformation code is the first place that touches real data. If a check fails here, the pipeline stops before corrupted rows ever land downstream. No backfills, no detective work, no "how long has this been wrong?" scramble.
+
+And the value cuts both ways:
+
+- **You catch issues before they ever hit the data warehouse or lake**, right at the ingestion and preprocessing layers.
+- **You catch issues after the data warehouse too**, in the pipelines that stream data to downstream destinations, models, APIs, and operational systems.
+
+## Python: The Backbone of Modern Data Engineering
+
+Python has become the backbone of modern data engineering - especially in pipelines that go beyond SQL. It now drives:
+
+- Ingestion and storage of **unstructured data**
+- **Vectorization and embedding pipelines** for AI systems
+- **ML model training** and feature generation
+- Monitoring of **model inputs and outputs**
+- Hybrid pipelines that mix structured, semi-structured, and free-form data
+
+As these pipelines multiply, Python becomes the glue.
It runs wherever data flows — before the DWH, inside the DWH, and after the DWH — making it the natural place for data quality and observability to live.
+
+## Wrapping Existing Tools Instead of Inventing New Ones
+
+Engineers already have strong opinions about how they want to write tests. Some rely on Great Expectations, others on DQX, pytest-based workflows, or homegrown frameworks. Inventing a new test engine or DSL would just fragment the landscape - so we didn't.
+
+We focused on the simplest possible layer: **a lightweight Python SDK that captures any Python test result, from any framework**, and reports it to Elementary. You keep your code - we handle the metadata, structure, and visibility.
+
+This means full observability without dictating how you build.
+
+## Built for Teams That Treat Their Data Pipelines Like Software
+
+Elementary has always leaned into engineering-first workflows. Our deep integration with dbt set that foundation. Extending this into Python is the natural continuation of that approach.
+
+As more transformations shift into Python (PySpark, SQL generation, AI/ML pipelines, unstructured data processing), teams want the same capabilities they rely on when using Elementary with dbt:
+
+- Understand what ran
+- Track when it ran
+- Measure how long it took
+- Identify which upstream assets fed it
+- Trace which downstream assets it produced
+- Run data quality checks on the product and see the results
+- Get alerts on data issues as soon as they happen
+
+The SDK provides exactly that by wrapping the transformation code itself. You get execution metadata, lineage, run context, and full test surface — directly from inside your existing codebase.
+
+This unifies **Analytics Engineering, Data Science, and AI/ML Operations** into a single observability platform. Python + dbt + cloud tests now all land in one place.
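To make the wrapping idea concrete, here is a minimal, self-contained sketch of the pattern. The beta SDK's API is not yet public, so `track_run` and the local `RUNS` sink below are purely illustrative stand-ins for the SDK client; your transformation logic stays untouched either way:

```python
import time
import traceback
from functools import wraps

# Stand-in for the SDK client: in practice the captured record would be
# reported to Elementary; here we just collect it locally for illustration.
RUNS = []

def track_run(pipeline_name, environment="prod"):
    """Capture start time, duration, status, and errors around a step."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            record = {"pipeline": pipeline_name, "environment": environment,
                      "started_at": time.time()}
            try:
                result = func(*args, **kwargs)
                record["status"] = "success"
                return result
            except Exception:
                record["status"] = "failure"
                record["error"] = traceback.format_exc()
                raise
            finally:
                record["duration_s"] = time.time() - record["started_at"]
                RUNS.append(record)  # the real SDK would report this

        return wrapper
    return decorator

@track_run("orders_enrichment")
def enrich_orders(rows):
    # Existing transformation logic, unchanged by the wrapper.
    return [{**r, "ltv": r["amount"] * 12} for r in rows]

enriched = enrich_orders([{"order_id": 1, "amount": 10}])
```

The decorator never touches the function body, which is the point: observability is layered around the code you already have.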
+ +## What You'll See in Elementary Once You Report Through the SDK + +When a Python pipeline reports assets, test results, and execution metadata, everything shows up in Elementary unified with your dbt and cloud tests: + +- **All test results appear together** — Python validations, dbt tests, cloud tests — in a single, consistent interface. +- **Alerts fire through your existing channels** (Slack, PagerDuty, email), ensuring that pipeline-level issues trigger the same operational flow as warehouse-level ones. +- **Incidents are created automatically** for detected issues, including opening Jira tickets. Elementary's agentic tools then investigate root cause, assess downstream impact, and guide resolution. +- **Lineage becomes fully connected**, tying together Python assets, dbt models, warehouse tables, unstructured data, vectors, and ML outputs. +- **Every table, view, file, or vector store entity produced by Python becomes discoverable** through the Elementary catalog, data discovery agent, and MCP server — giving analysts, DS, and AI teams a shared understanding of the entire data ecosystem. + +This closes the gap between ingestion pipelines, warehouse transformations, ML prep code, and AI workloads — all observed in one place. + +## Next Steps + + + + Learn how to install and configure the Python SDK + + + See how to report assets and test results from your Python pipelines + + + +## Get Started + +The SDK is now in beta and already surfacing surprisingly rich insights from Python-based workflows. If you want early access or want to see how it fits your implementation, [reach out to the team](https://meetings-eu1.hubspot.com/joost-boonzajer-flaes/intro-call-docs?uuid=17a4a61f-d0d3-4cbc-9362-56e37483f6f5) — we're shaping this with real teams using it at scale. 
+ diff --git a/docs/cloud/python-sdk/setup-guide.mdx b/docs/cloud/python-sdk/setup-guide.mdx new file mode 100644 index 000000000..67109aa41 --- /dev/null +++ b/docs/cloud/python-sdk/setup-guide.mdx @@ -0,0 +1,65 @@ +--- +title: "Python SDK Setup Guide" +sidebarTitle: "Setup Guide" +--- + +This guide walks you through installing and configuring the Elementary Python SDK to report assets, test results, and execution metadata from your Python pipelines. + +## Installation + +The Python SDK is installed via pip. Installation instructions and package requirements will be provided when you get access to the beta. + +## Configuration + +The Python SDK requires configuration to connect to your Elementary Cloud environment. You'll need: + +1. **Elementary Cloud API Key** - Get this from your Elementary Cloud account settings +2. **Environment ID** - Your Elementary environment identifier + +Configuration can be done via environment variables or a configuration file, similar to how the Elementary CLI is configured. + +## Basic Concepts + +The SDK is designed to work with any Python testing framework. The core functionality includes: + +- **Reporting assets** - Tables, files, vector stores, or any data entity produced by your Python pipeline +- **Reporting test results** - Results from any testing framework (Great Expectations, pytest, custom frameworks, etc.) +- **Tracking execution metadata** - Pipeline runs, timing, status, and errors +- **Reporting lineage** - Connecting Python assets to upstream and downstream dependencies + +## Integration with Existing Frameworks + +The SDK works with any Python testing framework. 
You can wrap your existing tests to report results to Elementary without changing your test logic: + +- **Great Expectations** - Report GE validation results +- **Pytest** - Report pytest test outcomes +- **Custom frameworks** - Report results from any homegrown testing solution +- **DQX and other tools** - The SDK is framework-agnostic + +## Reporting Execution Metadata + +Track pipeline execution details including: +- Pipeline name and environment +- Start and end times +- Duration +- Success/failure status +- Error messages and stack traces + +## Reporting Lineage + +Connect your Python assets to upstream and downstream dependencies, creating a complete lineage graph that includes: +- Python pipeline outputs +- dbt models +- Warehouse tables +- Unstructured data sources +- Vector stores +- ML model outputs + +## Next Steps + +- See [Usage Examples](/cloud/python-sdk/usage-examples) for conceptual examples +- Learn about the [SDK Overview](/cloud/python-sdk/overview) + +## Need Help? + +If you need assistance with setup or have questions about the SDK, [reach out to the team](https://meetings-eu1.hubspot.com/joost-boonzajer-flaes/intro-call-docs?uuid=17a4a61f-d0d3-4cbc-9362-56e37483f6f5). diff --git a/docs/cloud/python-sdk/usage-examples.mdx b/docs/cloud/python-sdk/usage-examples.mdx new file mode 100644 index 000000000..d7f7b6d5e --- /dev/null +++ b/docs/cloud/python-sdk/usage-examples.mdx @@ -0,0 +1,84 @@ +--- +title: "Python SDK Usage Examples" +sidebarTitle: "Usage Examples" +--- + +This page provides conceptual examples of how the Elementary Python SDK can be used in different scenarios. + +## Reporting Assets + +### Tables and Views + +Report tables or views created by your Python pipeline. Include metadata like schema, database, and description to make them discoverable in the Elementary catalog. + +### Files and Unstructured Data + +Report files, blobs, or unstructured data stored in object storage (S3, GCS, Azure Blob, etc.). 
Include location, format, and other relevant metadata. + +### Vector Stores + +Report vector stores used in AI/ML pipelines. Include information about the store type (Pinecone, Weaviate, etc.), index names, and dimensions. + +## Reporting Test Results + +### Basic Test Results + +Report simple test outcomes - whether a test passed or failed, along with the test name and type. + +### Detailed Test Results + +Report comprehensive test information including: +- Test name and type +- Pass/fail status +- Actual vs expected values +- Column-level details +- Failed row counts +- Sample data from failed rows + +### Framework Integration + +Report test results from any framework: +- Wrap Great Expectations validations +- Report pytest outcomes +- Capture results from custom test frameworks +- Integrate with DQX or other data quality tools + +## Complete Pipeline Example + +A typical Python pipeline using the SDK would: + +1. **Start tracking** - Begin a pipeline run with metadata (name, environment) +2. **Report input assets** - Document what data sources the pipeline consumes +3. **Execute transformations** - Run your existing Python code +4. **Report output assets** - Document what the pipeline produces +5. **Run and report tests** - Execute data quality checks and report results +6. **End tracking** - Complete the run with success/failure status and timing + +This creates a complete observability record in Elementary, unified with your dbt and cloud tests. 
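The six steps above can be sketched as follows. The beta SDK's client API is not yet public, so the `report` helper below is a hypothetical stand-in that collects events locally; in a real pipeline it would send them to Elementary:

```python
import datetime

# Stand-in event sink: the real SDK client (API not yet public) would
# send each of these events to Elementary Cloud instead.
events = []

def report(event_type, **payload):
    events.append({"type": event_type, **payload})

# 1. Start tracking the pipeline run
report("run_started", pipeline="daily_orders", environment="prod",
       started_at=datetime.datetime.now(datetime.timezone.utc).isoformat())

# 2. Report the input assets the pipeline consumes
report("asset", name="raw.orders", kind="table", direction="input")

# 3. Execute the existing transformation (unchanged business logic)
orders = [{"id": 1, "amount": 25.0}, {"id": 2, "amount": 40.0}]
total = sum(o["amount"] for o in orders)

# 4. Report the output asset the pipeline produces
report("asset", name="analytics.daily_revenue", kind="table",
       direction="output")

# 5. Run a data quality check and report its result
check_passed = all(o["amount"] > 0 for o in orders)
report("test_result", name="positive_amounts",
       status="pass" if check_passed else "fail")

# 6. End tracking with the final status
report("run_finished", pipeline="daily_orders", status="success")
```

Whatever names the final API uses, the shape of the flow stays the same: one start event, asset and test events in between, one end event.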
+ +## Integration with Orchestrators + +The SDK can be integrated with any orchestrator: + +- **Airflow** - Wrap your Python tasks to report execution and test results +- **Prefect** - Use the SDK in Prefect flows and tasks +- **Dagster** - Report assets and tests from Dagster ops +- **Custom orchestrators** - Works with any Python-based orchestration system + +## ML Pipeline Example + +For ML pipelines, you can: + +- Report training data assets +- Report model artifacts +- Report test/validation datasets +- Report model performance metrics as test results +- Track model training runs +- Connect models to their training data and downstream consumers + +This provides full observability for ML workflows alongside your data engineering pipelines. + +## Next Steps + +- Review the [Setup Guide](/cloud/python-sdk/setup-guide) for installation and configuration +- Learn about the [SDK Overview](/cloud/python-sdk/overview) diff --git a/docs/cloud/quickstart.mdx b/docs/cloud/quickstart.mdx new file mode 100644 index 000000000..c1fb8dc9e --- /dev/null +++ b/docs/cloud/quickstart.mdx @@ -0,0 +1,33 @@ +--- +title: "Quickstart the Elementary Cloud Platform" +sidebarTitle: "Quickstart" +icon: "circle-play" +--- + +import QuickstartSteps from '/snippets/cloud/quickstart-steps.mdx'; + +Welcome to Elementary! + +You’re moments away from source-to-BI observability that helps you spot issues early, trust your data, and enforce governance effortlessly. +AI agents will automate coverage, triage, and documentation so you can focus on higher-impact work. + +There are two ways to get started, depending on what works best for your team: + +- **Start a 30-day free trial** and set everything up yourself — perfect for smaller teams or hands-on users. +- **For larger teams or more tailored needs, [reach out to the team](https://meetings-eu1.hubspot.com/joost-boonzajer-flaes/intro-call-docs?uuid=17a4a61f-d0d3-4cbc-9362-56e37483f6f5)** and we'll be happy to help you get started. 
+ +To use Elementary, you must be a dbt user. We will soon release a Python SDK for any Python pipeline and test. + +### Self-Serve Setup (Free Trial) + + + +### Get the most out of Elementary +To explore everything the platform has to offer: +- [Take our product tour](https://www.elementary-data.com/product-tour-signup) for a guided walkthrough of key capabilities +- Browse the [Elementary Cloud feature overview](/cloud/features) + + +### Need help with onboarding? + +We can provide [support on Slack](https://elementary-data.com/community) or hop on a [guided onboarding call](https://meetings-eu1.hubspot.com/joost-boonzajer-flaes/intro-call-docs?uuid=17a4a61f-d0d3-4cbc-9362-56e37483f6f5)! diff --git a/docs/cloud/resources/ai-agents.mdx b/docs/cloud/resources/ai-agents.mdx new file mode 100644 index 000000000..73af47bb1 --- /dev/null +++ b/docs/cloud/resources/ai-agents.mdx @@ -0,0 +1,64 @@ +--- +title: "Meet our new hires." +mode: "wide" + +--- + + +### 🔥 **Welcome Blaze Fixer – Incident Triage & Resolution** + +> Let’s give a warm welcome (pun intended) to Blaze Fixer (pronouns: fix/fixes). +> +> +>**Career highlight**: Resolved a critical pipeline outage 6 minutes before it happened. +> +>**Specialty**: Incident triage, root cause tracing, and putting out fires before they hit Slack. +> +>**Outside of work**: Volunteers as a chaos engineer in simulated war rooms. +> +> Welcome, Blaze. We feel safer already. +> + +### ✅ **Welcome Val E. Dator – Test Coverage & Recommendations** + +> Say hello to Val E. Dator, joining as our AI-powered test strategist (pronouns: test/tests). +> +> +>**Career highlight**: Once added 142 missing tests during a lunch break. +> +>**Specialty**: Auto-generating test coverage plans, recommending the right tests for every edge case, and proving you forgot something—politely. +> +>**Outside of work**: Writes dbt tests for her Netflix queue. Flags genre drift, inconsistent ratings, and suspicious rewatch patterns. 
+> +> We’re thrilled to have you, Val. Our models are already sleeping better at night. +> + +### 🗂 **Welcome Cat A. Logue – Metadata Governance & Discovery** + +> Please welcome Cat A. Logue to the team (pronouns: tag/tags) +> +> +>**Career highlight**: Tagged and documented 18,492 data assets across 7 environments before her morning coffee. +> +>**Specialty**: Organizing chaos. Tags, owners, descriptions, policies—if it exists, she governs it. +> +>**Outside of work**: Adds metadata tags to everything she owns. The toaster is now “device_type: appliance, heat_source: electric, owner: Cat.” +> +> Welcome aboard, Cat. Our catalog already looks cleaner. +> + + +### 🪓 **Welcome Bill Cutter - Performance & Cost** + +> Please welcome Bill Cutter to the team! (pronouns: cost/cuts) +> +> +> **Career highlight**: Saved $73K/month by killing off nightly model runs that hadn’t been queried in 6 months. +> +> **Specialty**: Finds the jobs no one remembers, the models no one needs, and the compute they’re silently burning. Then cuts them. +> +> **Outside of work**: Writes a performance blog called *Drop Table Everything*. Posts are short. Just like your idle query lifespan. +> +> Welcome, Bill. We’re already feeling...lighter. +> +> diff --git a/docs/cloud/resources/business-case-data-observability-platform.mdx b/docs/cloud/resources/business-case-data-observability-platform.mdx new file mode 100644 index 000000000..a78f11580 --- /dev/null +++ b/docs/cloud/resources/business-case-data-observability-platform.mdx @@ -0,0 +1,25 @@ +--- +title: "When do I need a data observability platform?" +sidebarTitle: "When to add data observability" +--- + + +### If the consequences of data issues are high +If you are running performance marketing budgets of $millions, a data issue can result in a loss of hundreds of thousands of dollars. +In these cases, the ability to detect and resolve issues fast is business-critical. 
It typically involves multiple teams and the ability to measure, track, and report on data quality. + +### If data is scaling faster than the data team +The scale and complexity of modern data environments make it impossible for teams to manually manage quality without expanding the team. A data observability platform enables automation and collaboration, ensuring data quality is maintained as data continues to grow, without impacting team efficiency. + +### Common use cases +If your data is being used in one of the following use cases, you should consider adding a data observability platform: +- Self-service analytics +- Data activation +- Powering AI & ML products +- Embedded analytics +- Performance marketing +- Regulatory reporting +- A/B testing and experiments + +## Why isn't the open-source package enough? +The open-source package was designed for engineers that want to monitor their dbt project. The Cloud Platform was designed to support the complex, multifaceted requirements of larger teams and organizations, providing a holistic observability solution. \ No newline at end of file diff --git a/docs/cloud/resources/community.mdx b/docs/cloud/resources/community.mdx new file mode 100644 index 000000000..fc79fac9a --- /dev/null +++ b/docs/cloud/resources/community.mdx @@ -0,0 +1,9 @@ +--- +title: "Community" +url: "https://www.elementary-data.com/community" +--- + + diff --git a/docs/cloud/resources/how-does-elementary-work.mdx b/docs/cloud/resources/how-does-elementary-work.mdx new file mode 100644 index 000000000..d1586d3b4 --- /dev/null +++ b/docs/cloud/resources/how-does-elementary-work.mdx @@ -0,0 +1,28 @@ +--- +title: "How does Elementary work" +sidebarTitle: "Elementary Cloud Platform" +--- +## Cloud platform architecture +The Elementary open-source package creates a schema that collects the test results and the models from your dbt projects. 
The package runs in your dbt pipeline and writes to its own dataset in the data warehouse; the platform then syncs that dataset to the cloud. The platform also integrates directly with your data warehouse, so it has access to the information schema, the query history and the metadata.
+
+We also integrate with your dbt code repository - so we understand how it’s built, including tags, owners, which tables are part of your dbt project and which are not - and we see daily usage by connecting to your BI.
+
+
+ Elementary Cloud Platform Architecture
+
+
+
+## How does it work?
+1. You install the Elementary dbt package in your dbt project and configure it to write to its own schema, the Elementary schema.
+2. The package writes test results, run results, logs and metadata to the Elementary schema.
+3. The cloud service only requires `read access` to the Elementary schema, not to schemas where your sensitive data is stored.
+4. The cloud service connects to sync the Elementary schema using an **encrypted connection** and a **static IP address** that you will need to add to your allowlist.
+
+
+[Read about Security and Privacy](/cloud/general/security-and-privacy)
diff --git a/docs/cloud/resources/pricing.mdx b/docs/cloud/resources/pricing.mdx
new file mode 100644
index 000000000..197ecc538
--- /dev/null
+++ b/docs/cloud/resources/pricing.mdx
@@ -0,0 +1,9 @@
+---
+title: "Pricing"
+url: "https://www.elementary-data.com/pricing"
+---
+
+
\ No newline at end of file
diff --git a/docs/cloud/what-is-elementary.mdx b/docs/cloud/what-is-elementary.mdx
new file mode 100644
index 000000000..0ca442e88
--- /dev/null
+++ b/docs/cloud/what-is-elementary.mdx
@@ -0,0 +1,80 @@
+---
+title: "What is Elementary?"
+mode: "wide"
+
+---
+Elementary transforms data reliability from an engineering focus into a shared foundation of trust for the entire organization.
As analytics and AI pipelines grow in complexity, teams wrestle with test maintenance, incident triage, and data governance—diverting time and attention from innovation.
+
+Elementary bridges the gap by offering:
+
+- **Developer-first workflows** that integrate seamlessly with dbt, Git, and CI/CD
+- **Business-ready visibility** that makes data health accessible to analysts and leaders
+- **AI-powered reliability**, turning observability into proactive, automated action.
+
+## How it works
+
+
+
+Elementary offers a modular observability stack built for dbt-based data workflows. At the foundation is an open-source dbt package that captures key metadata and test results from your dbt project. This package powers both the self-hosted CLI and the fully managed Cloud Platform, which is built on top of it to provide wider visibility, automations, scale and advanced reports.
+
+## Products
+
+Whether you're starting small or scaling observability across your organization, you can choose the setup that fits your team—while unlocking more value as your needs evolve.
+
+### [**Elementary dbt package (OSS)**](/data-tests/dbt/dbt-package)
+
+The Elementary dbt package is open source and runs in your existing dbt workflows. It collects metadata, test results, and lineage from your dbt runs, and also enables Elementary’s anomaly detection tests.
+
+Use it to:
+- Save dbt artifacts, test and run results in your data warehouse
+- Run pre-built anomaly detection and schema change tests
+
+### [Elementary CLI (OSS)](/oss/oss-introduction)
+
+The Elementary CLI helps teams adopt foundational data observability within dbt. It centralizes metadata and test results into a self-hosted report, and provides a starting point for alerts and monitoring. It’s a great way to introduce reliability into engineering workflows.
+
+Use it to:
+
+- Present test results and metadata from your dbt project in a visual report
+- Send basic alerts on test and schema failures
+
+Set up Elementary OSS [here](https://docs.elementary-data.com/oss/quickstart/quickstart-cli-package).
+
+### [**Elementary Cloud Platform**](/cloud/introduction)
+
+Elementary Cloud builds on the dbt package by adding a fully managed, AI-powered platform for organization-wide reliability. It includes smart alerting, incident management, column-level lineage, anomaly detection, AI agents, and enterprise features like RBAC and audit logs. It’s designed for teams that want to scale trust, automation, and collaboration across engineering and business.
+
+Use it to:
+
+- Integrate with your entire stack - from DWH to BI, AI tools and ticketing systems
+- Give engineers, analysts, and stakeholders shared visibility into data health
+- Scale reliability efforts across large teams and complex environments
+- Reduce alert fatigue - group and route alerts based on criticality, ownership, and impact
+- Automate test recommendations, metadata coverage, and root cause analysis with AI agents
+- Track progress over time, enforce governance, and support AI readiness
+
+Learn more about Elementary Cloud’s features and integrations [here](https://docs.elementary-data.com/cloud/introduction), and get started with a 30-day free trial [here](https://docs.elementary-data.com/quickstart).
+
+To further understand the differences, check out the full [**OSS vs. Cloud comparison**](/cloud/cloud-vs-oss).
+
+## **Security and Privacy**
+
+Elementary is designed with security and privacy in mind.
+
+- Elementary Cloud does not have read access to raw data in your data warehouse.
+- Elementary Cloud only extracts and stores metadata, logs and aggregated metrics.
+- All data is encrypted at rest and in transit using industry standard protocols.
+- Elementary uses service accounts or authentication tokens with granular and minimal permissions.
+- Elementary offers deployment options with no direct access from Elementary Cloud to your data warehouse and third-party tools.
+- Advanced authentication options such as MFA, Okta SSO, and Microsoft AD are available upon request.
+
+**SOC 2 certification:** Elementary Cloud is SOC 2 Type II certified!
+
+Learn more about how Elementary ensures security and privacy [here](https://docs.elementary-data.com/cloud/general/security-and-privacy).
+
+**Want to know more?**
+
+- [Book a call with our team](https://meetings-eu1.hubspot.com/joost-boonzajer-flaes/intro-call-docs)
+- [Join the community and reach out on Slack](https://elementary-community.slack.com/ssb/redirect)
diff --git a/docs/culture.mdx b/docs/culture.mdx
new file mode 100644
index 000000000..34902b2bb
--- /dev/null
+++ b/docs/culture.mdx
@@ -0,0 +1,80 @@
+---
+title: Culture
+description: Last updated July 21, 2025
+---
+These are the standards we adhere to as a team.
+
+The process of building Elementary is similar for both product and company. Each one of us is responsible for implementing these principles daily, as well as demanding and helping others to do so.
+
+
+### Extreme ownership
+
+Building Elementary is a team effort. However, each team member can have a major impact on our success - we measure your personal performance accordingly. In a sense, we expect each of you to think of yourself as if you are solely responsible for achieving our goals.
+
+This requires you to understand what the goals are, recognize our biggest blockers and challenges, and think of the most efficient ways to make progress. The best you can do is focus your time and energy only on tasks that move the needle.
+
+You have full ownership of your work, so pick the most impactful tasks and communicate to the team how they can help you succeed. As long as you do that, we will never limit you.
+
+We measure you according to the impact you make on every aspect of our mission - building the team, the business, our product and our community. This also means that we expect you not to limit yourself to your job title. If it is impactful and within your ability - do it. If you see it as outside your scope, you are probably not a good fit for our culture.
+
+We aim high. We expect our people to step up.
+
+### Transparency
+
+We are committed to being as transparent as possible.
+
+We expect you to focus on what makes the most impact. The best way to achieve that is to provide you with the full context. You have to understand the big picture, and get access to all the details. We believe that this is the best way to empower people to make the right decisions.
+
+This is also the best way to create alignment within the team. To work well together, we can’t afford information gaps. This means that everything is public and shared by default. The only exceptions are personal private issues and matters that can put the company at risk.
+
+We talk openly and honestly about our challenges, goals, decision making, board meetings, fundraising and financials.
+
+We also believe that being open and honest enables us to build strong relationships with the community, our customers and our investors.
+
+### Communication and feedback
+
+Open and honest communication is essential for our success as a team. Open and honest feedback is essential for the success of each team member.
+
+As a company founded by Israeli founders, we call this talking “[dugri](https://www.thejc.com/judaism/jewish-words/dugri-1.13684)”. We communicate in a direct, straightforward and informal manner. We also expect unapologetic directness in feedback we get, and hence in feedback we give.
+
+This kind of communication only works if we trust each other. We have common goals, and we want our team members to perform at their best. This is why we give “dugri” feedback, to help each other succeed.
When you get feedback, always assume best intentions.
+
+### Constantly adapting
+
+You can think of startups as a learning contest. You need to learn fast, change accordingly, and learn the next thing (even faster). What was right and worked for us last month might not work in the next one. What was right a year ago will definitely not work now.
+
+We are constantly learning and adapting, so we must stay open-minded. People are naturally intimidated by change, but in a startup we must fear stagnation and embrace change. It sounds scary at first, but this is actually what makes us confident. As long as we respond fast and adapt - we are on the right track.
+
+### Community driven
+
+When we started Elementary, we assumed we could build it better if we built a community around it. Today we are convinced that we won’t be able to succeed without it. Every data team that tries Elementary contributes to our efforts, and the learning and progress we make is thanks to the feedback and involvement of these teams.
+
+We are committed to this community. We will continue to invest in the open source project alongside commercial offerings, we provide the best support and user experience we can to all our users, and we value and show gratitude to our contributors.
+
+### People above all
+
+We have high demands, and we expect you to be highly committed to our mission. However, your top commitment should be your well-being and family. This is not only the right thing to do, it’s the only way to build a healthy and thriving team.
+
+If you don’t feel at your best, physically or emotionally, focus on healing. There is no point in working when you can’t be at the top of your game anyway.
+
+There is always more work to do, so don’t wait for a good time to take a vacation. Just take it. We will manage, and we know you will come back charged with energy and creativity.
+
+It is essential to our success that Elementary will be a workplace that people enjoy being part of.
We have zero tolerance for team members who don’t respect others.
+
+## Code of conduct
+
+At Elementary, we’re committed to fostering a respectful and inclusive environment where everyone can thrive. To support this, we expect all team members, partners, and contractors to follow these principles:
+
+#### Respect and Inclusion
+- Treat everyone with dignity, fairness, and respect.
+- Zero tolerance for harassment, discrimination, bullying, or offensive behavior.
+- Value diversity and create a safe space for open communication.
+
+#### Integrity and Ethics
+- Be honest and transparent in all your work and communications.
+- Avoid conflicts of interest and disclose them if they arise.
+- Protect confidential information of Elementary, our customers, and partners.
+
+#### Legal and Professional Standards
+- Follow all applicable laws, regulations, and company policies.
+- Uphold our commitment to privacy, security, and ethical use of data.
+
+#### Speaking Up
+
+If you see behavior that violates these principles, report it to your manager or People Ops. We will handle concerns respectfully and without retaliation.
\ No newline at end of file
diff --git a/docs/data-tests/add-elementary-tests.mdx b/docs/data-tests/add-elementary-tests.mdx
index 3c6b4d524..777c40a41 100644
--- a/docs/data-tests/add-elementary-tests.mdx
+++ b/docs/data-tests/add-elementary-tests.mdx
@@ -2,7 +2,7 @@
 title: "Add anomaly detection tests"
 ---
 
-After you [install the dbt package](/quickstart#install-the-dbt-package), you can add Elementary data anomaly detection tests.
+After you [install the dbt package](/cloud/quickstart#install-the-dbt-package), you can add Elementary data anomaly detection tests.
## Data anomaly detection dbt tests @@ -86,35 +86,39 @@ models: config: elementary: timestamp_column: < timestamp column > - tests: + data_tests: - elementary.freshness_anomalies: - # optional - configure different freshness column than timestamp column - where_expression: < sql expression > - time_bucket: - period: < time period > - count: < number of periods > + arguments: + # optional - configure different freshness column than timestamp column + where_expression: < sql expression > + time_bucket: + period: < time period > + count: < number of periods > - elementary.all_columns_anomalies: - column_anomalies: < specific monitors, all if null > - where_expression: < sql expression > - time_bucket: - period: < time period > - count: < number of periods > + arguments: + column_anomalies: < specific monitors, all if null > + where_expression: < sql expression > + time_bucket: + period: < time period > + count: < number of periods > - elementary.schema_changes - elementary.dimension_anomalies: - dimensions: < columns or sql expressions of columns > - # optional - configure a where a expression to accurate the dimension monitoring - where_expression: < sql expression > - time_bucket: - period: < time period > - count: < number of periods > + arguments: + dimensions: < columns or sql expressions of columns > + # optional - configure a where a expression to accurate the dimension monitoring + where_expression: < sql expression > + time_bucket: + period: < time period > + count: < number of periods > - name: < model name > ## if no timestamp is configured, elementary will monitor without time filtering columns: - name: < column name > - tests: + data_tests: - elementary.column_anomalies: - column_anomalies: < specific monitors, all if null > + arguments: + column_anomalies: < specific monitors, all if null > ``` ```yml Models example @@ -125,51 +129,59 @@ models: config: elementary: timestamp_column: 'loaded_at' - tests: + data_tests: - elementary.volume_anomalies: - # 
optional - use tags to run elementary tests on a dedicated run - tags: ['elementary'] config: - # optional - change severity + # optional - use tags to run elementary tests on a dedicated run + tags: ['elementary'] + # optional - change severity severity: warn - elementary.all_columns_anomalies: - tags: ['elementary'] - # optional - change global sensitivity - anomaly_sensitivity: 3.5 - timestamp_column: 'updated_at' + config: + tags: ['elementary'] + arguments: + # optional - change global sensitivity + anomaly_sensitivity: 3.5 + timestamp_column: 'updated_at' - elementary.schema_changes: - tags: ['elementary'] config: + tags: ['elementary'] severity: warn - elementary.dimension_anomalies: - dimensions: - - event_type - - country_name - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - # optional - use tags to run elementary tests on a dedicated run - tags: ['elementary'] + arguments: + dimensions: + - event_type + - country_name + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" config: - # optional - change severity + # optional - use tags to run elementary tests on a dedicated run + tags: ['elementary'] + # optional - change severity severity: warn - name: users ## if no timestamp is configured, elementary will monitor without time filtering - tests: + data_tests: elementary.volume_anomalies - tags: ['elementary'] + config: + tags: ['elementary'] columns: - name: user_id - tests: + data_tests: - elementary.column_anomalies: - tags: ['elementary'] - timestamp_column: 'updated_at' + config: + tags: ['elementary'] + arguments: + timestamp_column: 'updated_at' - name: user_name - tests: + data_tests: - elementary.column_anomalies: - column_anomalies: - - missing_count - - min_length - tags: ['elementary'] + arguments: + column_anomalies: + - missing_count + - min_length + config: + tags: ['elementary'] ``` ```yml Sources @@ -179,11 +191,10 @@ sources: schema: < schema > 
tables: - name: < table_name > - ## sources don't have config, so elementary config is placed under 'meta' - meta: + config: elementary: timestamp_column: < source timestamp column > - tests: + data_tests: ``` ```yml Sources example @@ -193,25 +204,26 @@ sources: schema: "product" tables: - name: "raw_product_login_events" - ## sources don't have config, so elementary config is placed under 'meta' - meta: + config: elementary: timestamp_column: "loaded_at" - tests: + data_tests: - elementary.freshness_anomalies - elementary.dimension_anomalies: - dimensions: - - event_type + arguments: + dimensions: + - event_type - elementary.all_columns_anomalies: - column_anomalies: - - null_count - - missing_count - - zero_count + arguments: + column_anomalies: + - null_count + - missing_count + - zero_count - elementary.schema_changes_from_baseline columns: - name: user_id data_type: text - tests: + data_tests: - elementary.column_anomalies - name: event_name data_type: text diff --git a/docs/data-tests/ai-data-tests/ai_data_validations.mdx b/docs/data-tests/ai-data-tests/ai_data_validations.mdx new file mode 100644 index 000000000..999a5fa9f --- /dev/null +++ b/docs/data-tests/ai-data-tests/ai_data_validations.mdx @@ -0,0 +1,131 @@ +--- +title: "AI Data Validations" +--- + + + **Beta Feature**: AI data validation tests is currently in beta. The functionality and interface may change in future releases. + + **Version Requirement**: This feature requires Elementary dbt package version 0.18.0 or above. + + +# AI Data Validation with Elementary + +## What is AI Data Validation? + +Elementary's `elementary.ai_data_validation` test allows you to validate any data column using AI and LLM language models. This test is more flexible than traditional tests as it can be applied to any column type and uses natural language to define validation rules. 
+ +With `ai_data_validation`, you can simply describe what you expect from your data in plain English, and Elementary will check if your data meets those expectations. This is particularly useful for complex validation rules that would be difficult to express with traditional SQL or dbt tests. + +## How It Works + +Elementary leverages the AI and LLM capabilities built directly into your data warehouse. When you run a validation test: + +1. Your data stays within your data warehouse +2. The warehouse's built-in AI and LLM functions analyze the data +3. Elementary reports whether each value meets your expectations based on the prompt + +## Required Setup for Each Data Warehouse + +Before you can use Elementary's AI data validations, you need to set up AI and LLM capabilities in your data warehouse: + +### Snowflake +- **Prerequisite**: Enable Snowflake Cortex AI LLM functions +- **Recommended Model**: `claude-3-5-sonnet` +- [View Snowflake's Guide](/data-tests/ai-data-tests/supported-platforms/snowflake) + +### Databricks +- **Prerequisite**: Ensure Databricks AI Functions are available +- **Recommended Model**: `databricks-meta-llama-3-3-70b-instruct` +- [View Databrick's Setup Guide](/data-tests/ai-data-tests/supported-platforms/databricks) + +### BigQuery +- **Prerequisite**: Configure BigQuery to use Vertex AI models +- **Recommended Model**: `gemini-1.5-pro` +- [View BigQuery's Setup Guide](/data-tests/ai-data-tests/supported-platforms/bigquery) + +### Redshift +- Support coming soon + +### Data Lakes +- Currently supported through Snowflake, Databricks, or BigQuery external object tables +- [View Data Lakes Information](/data-tests/ai-data-tests/supported-platforms/data-lakes) + +## Using the AI Data Validation Test + +The test requires one main parameter: +- `expectation_prompt`: Describe what you expect from the data in plain English + +Optionally, you can also specify: +- `llm_model_name`: Specify which AI model to use (see recommendations above for each 
warehouse) + + + This test works with any column type, as the data will be converted to a string format for validation. This enables natural language data validations for dates, numbers, and other structured data types. + + + + +```yml Models +version: 2 + +models: + - name: < model name > + columns: + - name: < column name > + data_tests: + - elementary.ai_data_validation: + arguments: + expectation_prompt: "Description of what the data should satisfy" + llm_model_name: "model_name" # Optional +``` + +```yml Example - Date Validation +version: 2 + +models: + - name: crm + description: "A table containing contract details." + columns: + - name: contract_date + description: "The date when the contract was signed." + data_tests: + - elementary.ai_data_validation: + arguments: + expectation_prompt: "There should be no contract date in the future" +``` + +```yml Example - Numeric Validation +version: 2 + +models: + - name: sales + description: "A table containing sales data." + columns: + - name: discount_percentage + description: "The discount percentage applied to the sale." + data_tests: + - elementary.ai_data_validation: + arguments: + expectation_prompt: "The discount percentage should be between 0 and 50, and should only be a whole number." + llm_model_name: "claude-3-5-sonnet" + config: + severity: warn +``` + +```yml Example - Complex Validation +version: 2 + +models: + - name: customer_accounts + description: "A table containing customer account information." + columns: + - name: account_status + description: "The current status of the customer account." + data_tests: + - elementary.ai_data_validation: + arguments: + expectation_prompt: "The account status should be one of: 'active', 'inactive', 'suspended', or 'pending'. If the account is 'suspended', there should be a reason code in the suspension_reason column." 
+ llm_model_name: "gemini-1.5-pro" +``` + + + diff --git a/docs/data-tests/ai-data-tests/supported-platforms/bigquery.mdx b/docs/data-tests/ai-data-tests/supported-platforms/bigquery.mdx new file mode 100644 index 000000000..0eb9464ef --- /dev/null +++ b/docs/data-tests/ai-data-tests/supported-platforms/bigquery.mdx @@ -0,0 +1,107 @@ +--- +title: "BigQuery Vertex AI" +description: "Learn how to configure BigQuery to use Vertex AI models for unstructured data validation tests" +--- + +# BigQuery Setup for Unstructured Data Tests + +Elementary's unstructured data validation tests leverage BigQuery ML and Vertex AI models to perform advanced AI-powered validations. This guide will walk you through the setup process. + +## Prerequisites + +Before you begin, ensure you have: +- A Google Cloud account with appropriate permissions +- Access to BigQuery and Vertex AI services +- A BigQuery dataset where you'll create your model, that will be used by Elementary's data validation tests. This is the dataset where you have unstructured data stored and that you want to apply validations on. + +## Step 1: Enable the Vertex AI API + +1. Navigate to the Google Cloud Console +2. Go to **APIs & Services** > **API Library** +3. Search for "Vertex AI API" +4. Click on the API and select **Enable** + +## Step 2: Create a Remote Connection to Vertex AI + +Elementary's unstructured data validation tests use BigQuery ML to access pre-trained Vertex AI models. To establish this connection: + +1. Navigate to the Google Cloud Console > **BigQuery** +2. In the Explorer panel, click the **+** button +3. Select **Connections to external data sources** +4. Change the connection type to **Vertex AI remote models, remote functions and BigLake (Cloud Resource)** +5. Select the appropriate region: + - If your model and dataset are in the same region, select that specific region + - Otherwise, select multi-region + +After creating the connection: +1. 
In the BigQuery Explorer, navigate to **External Connections**
+2. Find and click on your newly created connection
+3. Copy the **Service Account ID** for the next step
+
+## Step 3: Grant Vertex AI Access Permissions
+
+Now you need to give the connection's service account permission to access Vertex AI:
+
+1. In the Google Cloud Console, go to **IAM & Admin**
+2. Click **+ Grant Access**
+3. Under "New principals", paste the service account ID you copied
+4. Assign the **Vertex AI User** role
+5. Click **Save**
+
+## Step 4: Create an LLM Model Interface in BigQuery
+
+1. In the BigQuery Explorer, navigate to **External Connections**
+2. Find the connection you created in the previous step and click on it
+3. Copy the **Connection ID** (format: `projects/<project-id>/locations/<location>/connections/<connection-id>`)
+4. [Select a model endpoint](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-remote-model#gemini-api-multimodal-models). You can use `gemini-1.5-pro-002` as a default endpoint.
+5. Run the following SQL query to create a model in your dataset:
+
+```sql
+CREATE OR REPLACE MODEL
+  `<project-id>.<dataset>.<model-name>`
+REMOTE WITH CONNECTION
+  `<connection-id>`
+OPTIONS (
+  endpoint = '<endpoint>'
+);
+```
+
+### Example
+
+```sql
+CREATE OR REPLACE MODEL
+  `my-project.my-dataset.gemini-1.5-pro`
+REMOTE WITH CONNECTION
+  `projects/my-project/locations/us/connections/my-remote-connection-model-name`
+OPTIONS (
+  endpoint = 'gemini-1.5-pro-002'
+);
+```
+
+> **Note:** During development, we used `gemini-1.5-pro` and recommend it as the default model for unstructured data tests in BigQuery.
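Before pointing Elementary at the model, it can help to sanity-check the setup with a one-off `ML.GENERATE_TEXT` call from the BigQuery console. This is only a sketch: the project, dataset, and model names are the hypothetical ones from the example above.

```sql
-- Sanity check: call the remote model once through BigQuery ML.
-- The project/dataset/model names are the hypothetical ones from the example above.
SELECT ml_generate_text_result
FROM ML.GENERATE_TEXT(
  MODEL `my-project.my-dataset.gemini-1.5-pro`,
  (SELECT 'Reply with a short greeting.' AS prompt),
  STRUCT(0.0 AS temperature, 32 AS max_output_tokens)
);
```

If the call succeeds and returns a JSON result, the connection, permissions, and endpoint are wired up correctly.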
+ +### Additional Resources + +- [Available models and endpoints](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-remote-model#gemini-api-multimodal-models) +- [Documentation on creating remote models](https://cloud.google.com/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-remote-model) + +## Step 5: Running an Unstructured Data Test + +Once your model is set up, you can reference it in your Elementary tests: + +```yaml +models: + - name: table_with_unstructured_data + description: "A table containing unstructured text data." + columns: + - name: text_data + description: "Unstructured text data stored as a string." + data_tests: + - elementary.validate_unstructured_data: + arguments: + expectation_prompt: "The text data should represent an example of unstructured data." + llm_model_name: "gemini-1.5-pro" +``` + + + diff --git a/docs/data-tests/ai-data-tests/supported-platforms/data-lakes.mdx b/docs/data-tests/ai-data-tests/supported-platforms/data-lakes.mdx new file mode 100644 index 000000000..7d7035c5f --- /dev/null +++ b/docs/data-tests/ai-data-tests/supported-platforms/data-lakes.mdx @@ -0,0 +1,7 @@ +--- +title: "Data lakes" +--- + +Currently, you can apply Elementary's unstructured data validation tests on data lakes using Snowflake, Databricks, or BigQuery external object tables. + +Native and direct support for data lakes is coming soon. Please reach out if you would like to discuss this integration and use case. 
\ No newline at end of file
diff --git a/docs/data-tests/ai-data-tests/supported-platforms/databricks.mdx b/docs/data-tests/ai-data-tests/supported-platforms/databricks.mdx
new file mode 100644
index 000000000..41211db58
--- /dev/null
+++ b/docs/data-tests/ai-data-tests/supported-platforms/databricks.mdx
@@ -0,0 +1,35 @@
+---
+title: "Databricks AI Functions"
+---
+
+# Setting Up Databricks AI Functions
+
+For Databricks users, Elementary's unstructured data validation tests run on top of Databricks AI Functions.
+This guide covers the prerequisites for using Databricks AI Functions.
+
+## What are Databricks AI Functions?
+
+Databricks AI Functions are built-in SQL functions that allow you to apply AI capabilities directly to your data using SQL. They let you leverage large language models and other AI capabilities without complex setup or external dependencies, making them ideal for data validation tests.
+
+## Availability and Prerequisites
+
+To use Databricks AI Functions, your environment must meet the following requirements:
+
+### Runtime Requirements
+- **Recommended**: Databricks Runtime 15.3 or above for optimal performance
+
+### Environment Requirements
+- Your workspace must be in a supported Model Serving region.
+- For Pro SQL warehouses, AWS PrivateLink must be enabled.
+- Databricks SQL supports AI Functions, but Databricks SQL Classic does not.
+
+### Models
+Databricks AI Functions can run on foundation models hosted in Databricks, external foundation models (like OpenAI's models), and custom models.
+Currently, Elementary's unstructured data validations support only foundation models hosted in Databricks. Support for external and custom models is coming soon.
+> **Note**: While developing the tests we worked with `databricks-meta-llama-3-3-70b-instruct`, so we recommend using this model as a default when running unstructured data validation tests in Databricks.
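As a quick availability check before running the Elementary tests, you can call an AI Function directly from a SQL editor. This is a minimal sketch, assuming the recommended Databricks-hosted foundation model is served in your workspace:

```sql
-- Minimal check that AI Functions are available in this workspace.
-- Assumes the recommended Databricks-hosted foundation model is served in your region.
SELECT ai_query(
  'databricks-meta-llama-3-3-70b-instruct',
  'Reply with a short greeting.'
) AS response;
```

If this returns a text response, the prerequisites below are met and the validation tests can use the same model.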
+
+
+## Region Considerations
+
+When using AI Functions, be aware that some models are limited to specific regions (US and EU). Make sure your Databricks workspace is in a region supported by the Databricks AI Functions.
+
diff --git a/docs/data-tests/ai-data-tests/supported-platforms/redshift.mdx b/docs/data-tests/ai-data-tests/supported-platforms/redshift.mdx
new file mode 100644
index 000000000..25392a4b2
--- /dev/null
+++ b/docs/data-tests/ai-data-tests/supported-platforms/redshift.mdx
@@ -0,0 +1,7 @@
+---
+title: "Redshift"
+---
+
+Elementary's unstructured data validation tests do not currently support Redshift.
+
+On Redshift, setting up LLM functions is more complex and requires deploying a Lambda function to call external LLM models. Documentation and support for this integration are coming soon. Please reach out if you'd like to discuss this use case and integration options.
\ No newline at end of file
diff --git a/docs/data-tests/ai-data-tests/supported-platforms/snowflake.mdx b/docs/data-tests/ai-data-tests/supported-platforms/snowflake.mdx
new file mode 100644
index 000000000..c93b1b669
--- /dev/null
+++ b/docs/data-tests/ai-data-tests/supported-platforms/snowflake.mdx
@@ -0,0 +1,70 @@
+---
+title: "Snowflake Cortex AI"
+---
+
+# Snowflake Cortex AI LLM Functions
+
+This guide provides instructions on how to enable Snowflake Cortex AI LLM functions, a prerequisite for running Elementary unstructured data validation tests on Snowflake.
+
+## What is Snowflake Cortex?
+
+Snowflake Cortex is a fully managed service that brings cutting-edge AI and ML solutions directly into your Snowflake environment. It allows you to leverage the power of large language models (LLMs) without any complex setup or external dependencies.
+The LLMs are fully hosted and managed by Snowflake; using them requires no setup, and your data stays within Snowflake.
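A quick way to confirm that Cortex LLM functions are enabled for your account (and that a given model is reachable from your region) is a single `COMPLETE` call from a worksheet. A minimal sketch:

```sql
-- Minimal check that Cortex LLM functions are available in this account/region.
SELECT SNOWFLAKE.CORTEX.COMPLETE(
  'claude-3-5-sonnet',
  'Reply with a short greeting.'
) AS response;
```

If this returns a text response, no further setup is needed; a "model not found" error usually means a region issue (see below).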
+ + +## Cross-Region Model Usage + +> **Important**: It is always better to use models in the same region as your dataset to avoid errors and optimize performance. + +To learn where each model is located we recommend checking this [models list](https://docs.snowflake.com/en/user-guide/snowflake-cortex/llm-functions#availability). +If you encounter a "model not found" error, it may be because the model you're trying to use is not available in your current region. In such cases, you can enable cross-region model access with the following command (requires ACCOUNTADMIN privileges): + +```sql +-- Enable access to models in any region +ALTER ACCOUNT SET CORTEX_ENABLED_CROSS_REGION = 'ANY_REGION'; +``` + +This setting allows your account to use models from any region, which can be helpful when the model you need is not available in your current region. However, be aware that cross-region access may impact performance and could have additional cost implications. + + +## Supported LLM Models + +Snowflake Cortex provides access to various industry-leading LLM models with different capabilities and context lengths. Here are the key models available: + +### Native Snowflake Models + +* **Snowflake Arctic**: An open enterprise-grade model developed by Snowflake, optimized for business use cases. + +### External Models (Hosted within Snowflake) + +* **Claude Models (Anthropic)**: High-capability models for complex reasoning tasks. +* **Mistral Models**: Including mistral-large, mixtral-8x7b, and mistral-7b for various use cases. +* **Llama Models (Meta)**: Including llama3.2-1b, llama3.2-3b, llama3.1-8b, and llama2-70b-chat. +* **Gemma Models (Google)**: Including gemma-7b for code and text completion tasks. + +> **Note**: While developing the tests we worked with `claude-3-5-sonnet` so we recommend using this model as a default when running unstructured data tests in Snowflake. 
+
+## Permissions
+
+> **Note**: By default, all users in your Snowflake account already have access to Cortex AI LLM functions through the PUBLIC role. In most cases, you don't need to do anything to enable access.
+
+The `CORTEX_USER` database role in the SNOWFLAKE database includes all the privileges needed to call Snowflake Cortex LLM functions. This role is automatically granted to the PUBLIC role, which all users have by default.
+
+The following commands are **only needed if** your administrator has revoked the default access from the PUBLIC role or if you need to set up specific access controls. If you can already use Cortex functions, you can skip this section.
+
+```sql
+-- Run as ACCOUNTADMIN
+USE ROLE ACCOUNTADMIN;
+
+-- Create a dedicated role for Cortex users
+CREATE ROLE cortex_user_role;
+
+-- Grant the database role to the custom role
+GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE cortex_user_role;
+
+-- Grant the role to specific users
+GRANT ROLE cortex_user_role TO USER <username>;
+
+-- Optionally, grant warehouse access to the role
+GRANT USAGE ON WAREHOUSE <warehouse_name> TO ROLE cortex_user_role;
+```
\ No newline at end of file
diff --git a/docs/data-tests/ai-data-tests/unstructured_data_validations.mdx b/docs/data-tests/ai-data-tests/unstructured_data_validations.mdx
new file mode 100644
index 000000000..e6192524d
--- /dev/null
+++ b/docs/data-tests/ai-data-tests/unstructured_data_validations.mdx
@@ -0,0 +1,259 @@
+---
+title: "Unstructured Data Validations"
+---
+
+
+ **Beta Feature**: Unstructured data validation tests are currently in beta. The functionality and interface may change in future releases.
+
+ **Version Requirement**: This feature requires Elementary dbt package version 0.18.0 or above.
+
+
+# Validating Unstructured Data with Elementary
+
+## What is Unstructured Data Validation?
+
+Elementary's `elementary.unstructured_data_validation` test allows you to validate unstructured data using AI and large language models (LLMs).
Instead of writing complex code, you can simply describe what you expect from your data in plain English, and Elementary will check if your data meets those expectations. + +For example, you can verify that customer feedback comments are in English, product descriptions contain required information, or support tickets follow a specific format or a sentiment. + +## How It Works + +Elementary leverages the AI and LLM capabilities built directly into your data warehouse. When you run a validation test: + +1. Your unstructured data stays within your data warehouse +2. The warehouse's built-in AI and LLM functions analyze the data +3. Elementary reports whether each text value meets your expectations + +## Required Setup for Each Data Warehouse + +Before you can use Elementary's unstructured data validations, you need to set up AI and LLM capabilities in your data warehouse: + +### Snowflake +- **Prerequisite**: Enable Snowflake Cortex AI LLM functions +- **Recommended Model**: `claude-3-5-sonnet` +- [View Snowflake's Guide](/data-tests/ai-data-tests/supported-platforms/snowflake) + +### Databricks +- **Prerequisite**: Ensure Databricks AI Functions are available +- **Recommended Model**: `databricks-meta-llama-3-3-70b-instruct` +- [View Databrick's Setup Guide](/data-tests/ai-data-tests/supported-platforms/databricks) + +### BigQuery +- **Prerequisite**: Configure BigQuery to use Vertex AI models +- **Recommended Model**: `gemini-1.5-pro` +- [View BigQuery's Setup Guide](/data-tests/ai-data-tests/supported-platforms/bigquery) + +### Redshift +- Support coming soon + +### Data Lakes +- Currently supported through Snowflake, Databricks, or BigQuery external object tables +- [View Data Lakes Information](/data-tests/ai-data-tests/supported-platforms/data-lakes) + + +## Using the Validation Test + +The test requires two main parameters: +- `expectation_prompt`: Describe what you expect from the text in plain English +- `llm_model_name`: Specify which AI model to use (see 
recommendations above for each warehouse) + + + This test works with any column containing unstructured text data such as descriptions, comments, or other free-form text fields. It can also be applied to structured columns that can be converted to strings, enabling natural language data validations. + + + + +```yml Models +version: 2 + +models: + - name: < model name > + columns: + - name: < column name > + data_tests: + - elementary.unstructured_data_validation: + arguments: + expectation_prompt: "Description of what the text should contain or represent" + llm_model_name: "model_name" +``` + +```yml Example +version: 2 + +models: + - name: table_with_unstructured_data + description: "A table containing unstructured text data." + columns: + - name: text_data + description: "Unstructured text data stored as a string." + data_tests: + - elementary.unstructured_data_validation: + arguments: + expectation_prompt: "The text data should represent an example of unstructured data." + llm_model_name: "test_model" +``` + +```yml Example - Validating Customer Feedback +version: 2 + +models: + - name: customer_feedback + description: "A table containing customer feedback comments." + columns: + - name: feedback_text + description: "Customer feedback in free text format." + data_tests: + - elementary.unstructured_data_validation: + arguments: + expectation_prompt: "The text should be a customer feedback comment in English, it should describe only a bug or a feature request." + llm_model_name: "claude-3-5-sonnet" + config: + severity: warn +``` + + + + +## Usage Examples + +Here are some powerful ways you can apply unstructured data validations: + +### Validating Structure + +```yml +models: + - name: medicine_prescriptions + description: "A table containing medicine prescriptions." 
+ columns: + - name: doctor_notes + description: "A column containing the doctor notes on the prescription" + data_tests: + - elementary.unstructured_data_validation: + arguments: + expectation_prompt: "The prescription has to include a limited time period and recommendations to the patient" + llm_model_name: "claude-3-5-sonnet" +``` + +Test fails if: A doctor's note does not specify a time period or lacks recommendations for the patient. + +### Validating Sentiment + +```yml +models: + - name: customer_feedback + description: "A table containing customer feedback." + columns: + - name: negative_feedbacks + description: "A column containing negative feedbacks about our product." + data_tests: + - elementary.unstructured_data_validation: + arguments: + expectation_prompt: "The customer feedback's sentiment has to be negative" + llm_model_name: "claude-3-5-sonnet" +``` + +Test fails if: Any feedback in `negative_feedbacks` is not actually negative. + +### Validating Similarities Coming Soon + +```yml +models: + - name: summarized_pdfs + description: "A table containing a summary of our ingested PDFs." + columns: + - name: pdf_summary + description: "A column containing the main PDF's content summary." + data_tests: + - elementary.validate_similarity: + arguments: + to: ref('pdf_source_table') + column: pdf_content + match_by: pdf_name +``` + +Test fails if: A PDF summary does not accurately represent the original PDF's content. The validation will use the pdf name as the key to match a summary from the pdf_summary table to the pdf_content in the pdf_source_table. + +```yml +models: + - name: jobs + columns: + - name: job_title + data_tests: + - elementary.validate_similarity: + arguments: + column: job_description +``` + +Test fails if: The job title does not align with the job description. + +### Accepted Categories Coming Soon + +```yml +models: + - name: support_tickets + description: "A table containing customer support tickets." 
+ columns: + - name: issue_description + description: "A column containing customer-reported issues." + data_tests: + - elementary.accepted_categories: + arguments: + categories: ['billing', 'technical_support', 'account_access', 'other'] +``` + +Test fails if: A support ticket does not fall within the predefined categories. + +### Accepted Entities Coming Soon + +```yml +models: + - name: news_articles + description: "A table containing news articles." + columns: + - name: article_text + description: "A column containing full article text." + data_tests: + - elementary.extract_and_validate_entities: + arguments: + entities: + organization: + required: true + accepted_values: ['Google', 'Amazon', 'Microsoft', 'Apple'] + location: + required: false + accepted_values: {{ run_query('select zip_code from locations') }} +``` + +Test fails if: +- The required entity (e.g., `organization`) is missing. +- Extracted entities do not match the expected values. + +### Compare Numeric Values Coming Soon + +```yml +models: + - name: board_meeting_summaries + description: "A table containing board meeting summary texts." + columns: + - name: meeting_notes + description: "A column containing the full summary of the board meeting." 
+ data_tests: + - elementary.extract_and_validate_numbers: + arguments: + entities: + revenue: + compare_with: ref('crm_financials') + column: sum(revenue) + required: true + net_profit: + compare_with: ref('crm_financials') + column: sum(net_profit) + customer_count: + compare_with: ref('crm_customers') + column: count(customers) + required: true +``` + +Test fails if: +- Required entities are missing +- The numerical entities do not match the structured CRM data \ No newline at end of file diff --git a/docs/data-tests/anomaly-detection-configuration/anomaly-direction.mdx b/docs/data-tests/anomaly-detection-configuration/anomaly-direction.mdx index d15870626..315e05348 100644 --- a/docs/data-tests/anomaly-detection-configuration/anomaly-direction.mdx +++ b/docs/data-tests/anomaly-detection-configuration/anomaly-direction.mdx @@ -3,8 +3,14 @@ title: "anomaly_direction" sidebarTitle: "anomaly_direction" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + `anomaly_direction: both | spike | drop` + By default, data points are compared to the expected range and check if these are below or above it. For some data monitors, you might only want to flag anomalies if they are above the range and not under it, and vice versa. For example - when monitoring for freshness, we only want to detect data delays and not data that is “early”. 
@@ -26,16 +32,18 @@ The anomaly_direction configuration is used to configure the direction of the ex ```yml test models: - name: this_is_a_model - tests: + data_tests: - elementary.volume_anomalies: - anomaly_direction: drop + arguments: + anomaly_direction: drop - elementary.all_columns_anomalies: - column_anomalies: - - null_count - - missing_count - - zero_count - anomaly_direction: spike + arguments: + column_anomalies: + - null_count + - missing_count + - zero_count + anomaly_direction: spike ``` ```yml model diff --git a/docs/data-tests/anomaly-detection-configuration/anomaly-exclude-metrics.mdx b/docs/data-tests/anomaly-detection-configuration/anomaly-exclude-metrics.mdx index 8ccad9c8e..58ffaebee 100644 --- a/docs/data-tests/anomaly-detection-configuration/anomaly-exclude-metrics.mdx +++ b/docs/data-tests/anomaly-detection-configuration/anomaly-exclude-metrics.mdx @@ -3,6 +3,12 @@ title: "anomaly_exclude_metrics" sidebarTitle: "anomaly_exclude_metrics" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + + `anomaly_exclude_metrics: [SQL where expression on fields metric_date / metric_time_bucket / metric_value]` By default, data points are compared to the all data points in the training set. diff --git a/docs/data-tests/anomaly-detection-configuration/anomaly-params.mdx b/docs/data-tests/anomaly-detection-configuration/anomaly-params.mdx index 4a9951d8b..c8c4eb1f3 100644 --- a/docs/data-tests/anomaly-detection-configuration/anomaly-params.mdx +++ b/docs/data-tests/anomaly-detection-configuration/anomaly-params.mdx @@ -3,6 +3,12 @@ title: Anomaly Tests configuration params sidebarTitle: "All configuration params" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + + If your data set has a timestamp column that represents the creation time of a field, it is highly recommended configuring it as a `timestamp_column`. 
@@ -28,6 +34,7 @@ sidebarTitle: "All configuration params" -- detection_period: int     period: [hour | day | week | month]     count: int + -- exclude_detection_period_from_training: [true | false] -- time_bucket:     period: [hour | day | week | month]     count: int @@ -48,7 +55,7 @@ sidebarTitle: "All configuration params" -- exclude_regexp: regex dimension_anomalies test: - -- exclude_final_results: [SQL where expression on fields value / average] + -- exclude_final_results: [SQL where expression on fields value / average] event_freshness_anomalies: -- event_timestamp_column: column name @@ -60,7 +67,7 @@ sidebarTitle: "All configuration params" -```yml properties.yml +```yml Models version: 2 models: @@ -68,14 +75,14 @@ models: config: elementary: timestamp_column: < model timestamp column > - tests: < here you will add elementary monitors as tests > + data_tests: < here you will add elementary monitors as tests > - name: ## if no timestamp is configured, elementary will monitor without time filtering - tests: + data_tests: ``` -```yml Example +```yml Models example version: 2 models: @@ -83,54 +90,60 @@ models: config: elementary: timestamp_column: updated_at - tests: + data_tests: - elementary.freshness_anomalies: - tags: ["elementary"] + config: + tags: ["elementary"] - elementary.all_columns_anomalies: - tags: ["elementary"] + config: + tags: ["elementary"] - name: users ## if no timestamp is configured, elementary will monitor without time filtering - tests: + data_tests: - elementary.volume_anomalies: - tags: ["elementary"] + config: + tags: ["elementary"] ``` -```yml sources_properties.yml +```yml Sources +version: 2 + sources: - name: < some name > database: < database > schema: < schema > tables: - name: < table_name > - ## sources don't have config, so elementary config is placed under 'meta' - meta: + config: elementary: timestamp_column: < source timestamp column > - tests: + data_tests: ``` -```yml Example +```yml Sources example +version: 2 + 
sources: - name: "my_non_dbt_table" database: "raw_events" schema: "product" tables: - name: "raw_product_login_events" - ## sources don't have config, so elementary config is placed under 'meta' - meta: + config: elementary: timestamp_column: "loaded_at" - tests: + data_tests: - elementary.volume_anomalies - elementary.all_columns_anomalies: - column_anomalies: - - null_count - - missing_count - - zero_count + arguments: + column_anomalies: + - null_count + - missing_count + - zero_count columns: - name: user_id - tests: + data_tests: - elementary.column_anomalies ``` diff --git a/docs/data-tests/anomaly-detection-configuration/anomaly-sensitivity.mdx b/docs/data-tests/anomaly-detection-configuration/anomaly-sensitivity.mdx index 3c4b1eb2b..304007167 100644 --- a/docs/data-tests/anomaly-detection-configuration/anomaly-sensitivity.mdx +++ b/docs/data-tests/anomaly-detection-configuration/anomaly-sensitivity.mdx @@ -3,7 +3,12 @@ title: "anomaly_sensitivity" sidebarTitle: "anomaly_sensitivity" --- -`anomaly_sensitivity: [int]` +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + +`anomaly_sensitivity: [float]` Configuration to define how the expected range is calculated. A sensitivity of 3 means that the expected range is within 3 standard deviations from the average of the training set. 
@@ -25,16 +30,18 @@ Larger values will have the opposite effect and will reduce the number of anomal ```yml test models: - name: this_is_a_model - tests: + data_tests: - elementary.volume_anomalies: - anomaly_sensitivity: 2.5 + arguments: + anomaly_sensitivity: 2.5 - elementary.all_columns_anomalies: - column_anomalies: - - null_count - - missing_count - - zero_count - anomaly_sensitivity: 4 + arguments: + column_anomalies: + - null_count + - missing_count + - zero_count + anomaly_sensitivity: 4 ``` ```yml model diff --git a/docs/data-tests/anomaly-detection-configuration/column-anomalies.mdx b/docs/data-tests/anomaly-detection-configuration/column-anomalies.mdx index 0eb68820e..00ff63d03 100644 --- a/docs/data-tests/anomaly-detection-configuration/column-anomalies.mdx +++ b/docs/data-tests/anomaly-detection-configuration/column-anomalies.mdx @@ -3,6 +3,12 @@ title: "column_anomalies" sidebarTitle: "column_anomalies" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; +import ColumnMetrics from '/snippets/column-metrics.mdx'; + + + + `column_anomalies: [column monitors list]` Select which monitors to activate as part of the test. @@ -16,16 +22,17 @@ Select which monitors to activate as part of the test. 
```yml test models: - name: this_is_a_model - tests: + data_tests: - elementary.column_anomalies: - column_anomalies: - - null_count - - missing_count - - average + arguments: + column_anomalies: + - null_count + - missing_count + - average ``` #### Supported column monitors - + diff --git a/docs/data-tests/anomaly-detection-configuration/detection-delay.mdx b/docs/data-tests/anomaly-detection-configuration/detection-delay.mdx index 0818c11bd..bee53f0b3 100644 --- a/docs/data-tests/anomaly-detection-configuration/detection-delay.mdx +++ b/docs/data-tests/anomaly-detection-configuration/detection-delay.mdx @@ -3,6 +3,11 @@ title: "detection_delay" sidebarTitle: "detection_delay" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + ``` detection_delay: period: < time period > # supported periods: hour, day, week, month @@ -22,11 +27,12 @@ That's useful in cases which the latest data should be excluded from the test. F ```yml test models: - name: this_is_a_model - tests: + data_tests: - elementary.volume_anomalies: - detection_delay: - period: day - count: 1 + arguments: + detection_delay: + period: day + count: 1 ``` ```yml model diff --git a/docs/data-tests/anomaly-detection-configuration/detection-period.mdx b/docs/data-tests/anomaly-detection-configuration/detection-period.mdx index ecd79171c..c3ffa9564 100644 --- a/docs/data-tests/anomaly-detection-configuration/detection-period.mdx +++ b/docs/data-tests/anomaly-detection-configuration/detection-period.mdx @@ -3,6 +3,11 @@ title: "detection_period" sidebarTitle: "detection_period" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + ``` detection_period: period: < time period > # supported periods: day, week, month @@ -27,11 +32,12 @@ This configuration should be changed according to your data delays. 
```yml test models: - name: this_is_a_model - tests: + data_tests: - elementary.volume_anomalies: - detection_period: - period: day - count: 30 + arguments: + detection_period: + period: day + count: 30 ``` ```yml model @@ -61,3 +67,7 @@ It works differently according to the table materialization: - **Regular tables and views** - `detection_period` defines the detection period. - **Incremental models and sources** - `detection_period` defines the detection period, and the period for which metrics will be re-calculated. + +**Overlap with training period:** + +When `detection_period` spans multiple time buckets (e.g., 7 days), it can overlap with the training period. By default, values in the detection period are included in the training calculation, which can lead to false negatives. To prevent this overlap, use [`exclude_detection_period_from_training`](/data-tests/anomaly-detection-configuration/exclude_detection_period_from_training) set to `true`. diff --git a/docs/data-tests/anomaly-detection-configuration/dimensions.mdx b/docs/data-tests/anomaly-detection-configuration/dimensions.mdx index 748ae262f..1c94ad13a 100644 --- a/docs/data-tests/anomaly-detection-configuration/dimensions.mdx +++ b/docs/data-tests/anomaly-detection-configuration/dimensions.mdx @@ -3,6 +3,11 @@ title: "dimensions" sidebarTitle: "dimensions" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + `dimensions: [list of SQL expressions]` The test will group the results by a given column / columns / valid select sql expression. 
@@ -27,11 +32,12 @@ models: config: elementary: timestamp_column: updated_at - tests: + data_tests: - elementary.dimension_anomalies: - dimensions: - - device_os - - device_browser + arguments: + dimensions: + - device_os + - device_browser ``` diff --git a/docs/data-tests/anomaly-detection-configuration/event_timestamp_column.mdx b/docs/data-tests/anomaly-detection-configuration/event_timestamp_column.mdx index f5b33353d..3720ee2fc 100644 --- a/docs/data-tests/anomaly-detection-configuration/event_timestamp_column.mdx +++ b/docs/data-tests/anomaly-detection-configuration/event_timestamp_column.mdx @@ -3,6 +3,11 @@ title: "event_timestamp_column" sidebarTitle: "event_timestamp_column" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + `event_timestamp_column: [column name]` Configuration for the test `event_freshness_anomalies`. @@ -22,10 +27,11 @@ The test can work in a couple of modes: ```yml test models: - name: this_is_a_model - tests: + data_tests: - elementary.event_timestamp_column: - event_timestamp_column: "event_timestamp" - update_timestamp_column: "created_at" + arguments: + event_timestamp_column: "event_timestamp" + update_timestamp_column: "created_at" ``` diff --git a/docs/data-tests/anomaly-detection-configuration/exclude-final-results.mdx b/docs/data-tests/anomaly-detection-configuration/exclude-final-results.mdx index 3152cf031..741d3d3f0 100644 --- a/docs/data-tests/anomaly-detection-configuration/exclude-final-results.mdx +++ b/docs/data-tests/anomaly-detection-configuration/exclude-final-results.mdx @@ -3,6 +3,11 @@ title: "exclude_final_results" sidebarTitle: "exclude_final_results" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + `exclude_final_results: [SQL where expression on fields value / average]` Failures in dimension anomaly tests consist of outliers in row count of each dimension. 
@@ -23,12 +28,13 @@ models:
     config:
       elementary:
         timestamp_column: updated_at
-    tests:
+    data_tests:
       - elementary.dimension_anomalies:
-          dimensions:
-            - device_os
-            - device_browser
-          exclude_final_results: 'value > 1000 or average > 10'
+          arguments:
+            dimensions:
+              - device_os
+              - device_browser
+            exclude_final_results: 'value > 1000 or average > 10'
 ```
 
diff --git a/docs/data-tests/anomaly-detection-configuration/exclude_detection_period_from_training.mdx b/docs/data-tests/anomaly-detection-configuration/exclude_detection_period_from_training.mdx
new file mode 100644
index 000000000..cf73cbd11
--- /dev/null
+++ b/docs/data-tests/anomaly-detection-configuration/exclude_detection_period_from_training.mdx
@@ -0,0 +1,55 @@
+---
+title: "exclude_detection_period_from_training"
+sidebarTitle: "exclude_detection_period_from_training"
+---
+
+import AiGenerateTest from '/snippets/ai-generate-test.mdx';
+
+
+
+`exclude_detection_period_from_training: true | false`
+
+When the detection period spans multiple time buckets, there can be overlap between the training period and the detection period. By default, values in the detection period are included in the training calculation, which can lead to false negatives because the detection period values influence the expected range used to evaluate those same values.
+
+Setting `exclude_detection_period_from_training: true` ensures that no values from the detection period are used in the training calculation, preventing this overlap and improving anomaly detection accuracy.
+
+**Example use case:**
+
+When `detection_period` is set to more than 1 time bucket (e.g., `detection_period: 7 days`), the detection period overlaps with the training period. Without excluding the detection period from training, values being evaluated for anomalies are also contributing to the expected range calculation, which can mask actual anomalies and result in false negatives.
+
+- _Default: false_
+- _Supported values: `true`, `false`_
+- _Relevant tests: Anomaly detection tests with `timestamp_column` and `detection_period` greater than 1 time bucket_
+
+#### How it works?
+
+- When `exclude_detection_period_from_training: false` (default), all values within both the training period and detection period are used to calculate the expected range.
+- When `exclude_detection_period_from_training: true`, values within the detection period are excluded from the training calculation, ensuring the expected range is based solely on historical data that is not being evaluated.
+
+
+
+```yml test
+models:
+  - name: this_is_a_model
+    data_tests:
+      - elementary.volume_anomalies:
+          arguments:
+            exclude_detection_period_from_training: true
+```
+
+```yml model
+models:
+  - name: this_is_a_model
+    config:
+      elementary:
+        exclude_detection_period_from_training: true
+```
+
+```yml dbt_project.yml
+vars:
+  exclude_detection_period_from_training: true
+```
+
+
+
diff --git a/docs/data-tests/anomaly-detection-configuration/exclude_prefix.mdx b/docs/data-tests/anomaly-detection-configuration/exclude_prefix.mdx
index 56109822b..d2a6ae7a2 100644
--- a/docs/data-tests/anomaly-detection-configuration/exclude_prefix.mdx
+++ b/docs/data-tests/anomaly-detection-configuration/exclude_prefix.mdx
@@ -3,6 +3,11 @@ title: "exclude_prefix"
 sidebarTitle: "exclude_prefix"
 ---
 
+import AiGenerateTest from '/snippets/ai-generate-test.mdx';
+
+
+
+
 `exclude_prefix: [string]`
 
 Param for the `all_columns_anomalies` test only, which enables to exclude a column from the tests based on prefix match.
@@ -16,9 +21,10 @@ Param for the `all_columns_anomalies` test only, which enables to exclude a colu ```yml test models: - name: this_is_a_model - tests: + data_tests: - elementary.column_anomalies: - exclude_prefix: "id_" + arguments: + exclude_prefix: "id_" ``` diff --git a/docs/data-tests/anomaly-detection-configuration/exclude_regexp.mdx b/docs/data-tests/anomaly-detection-configuration/exclude_regexp.mdx index 02f27769b..ed12b7516 100644 --- a/docs/data-tests/anomaly-detection-configuration/exclude_regexp.mdx +++ b/docs/data-tests/anomaly-detection-configuration/exclude_regexp.mdx @@ -3,6 +3,11 @@ title: "exclude_regexp" sidebarTitle: "exclude_regexp" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + `exclude_regexp: [regex]` Param for the `all_columns_anomalies` test only, which enables to exclude a column from the tests based on regular expression match. @@ -16,9 +21,10 @@ Param for the `all_columns_anomalies` test only, which enables to exclude a colu ```yml test models: - name: this_is_a_model - tests: + data_tests: - elementary.column_anomalies: - exclude_regexp: ".*SDC$" + arguments: + exclude_regexp: ".*SDC$" ``` diff --git a/docs/data-tests/anomaly-detection-configuration/fail_on_zero.mdx b/docs/data-tests/anomaly-detection-configuration/fail_on_zero.mdx index 6c99a99aa..c7dc3d536 100644 --- a/docs/data-tests/anomaly-detection-configuration/fail_on_zero.mdx +++ b/docs/data-tests/anomaly-detection-configuration/fail_on_zero.mdx @@ -3,6 +3,11 @@ title: "fail_on_zero" sidebarTitle: "fail_on_zero" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + `fail_on_zero: true/false` Elementary anomaly detection tests will fail if there is a zero metric value within the detection period. @@ -16,9 +21,10 @@ If undefined, default is false. 
```yml test models: - name: this_is_a_model - tests: + data_tests: - elementary.volume_anomalies: - fail_on_zero: true + arguments: + fail_on_zero: true ``` ```yml model diff --git a/docs/data-tests/anomaly-detection-configuration/ignore_small_changes.mdx b/docs/data-tests/anomaly-detection-configuration/ignore_small_changes.mdx index 1b2200560..81a4a23eb 100644 --- a/docs/data-tests/anomaly-detection-configuration/ignore_small_changes.mdx +++ b/docs/data-tests/anomaly-detection-configuration/ignore_small_changes.mdx @@ -3,6 +3,11 @@ title: "ignore_small_changes" sidebarTitle: "ignore_small_changes" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + ``` ignore_small_changes: spike_failure_percent_threshold: [int] @@ -28,11 +33,12 @@ If undefined, default is null for both spike and drop. ```yml test models: - name: this_is_a_model - tests: + data_tests: - elementary.volume_anomalies: - ignore_small_changes: - spike_failure_percent_threshold: 2 - drop_failure_percent_threshold: 50 + arguments: + ignore_small_changes: + spike_failure_percent_threshold: 2 + drop_failure_percent_threshold: 50 ``` ```yml model @@ -50,7 +56,7 @@ sources: schema: raw tables: - name: source_table - meta: + config: elementary: ignore_small_changes: drop_failure_percent_threshold: 50 diff --git a/docs/data-tests/anomaly-detection-configuration/seasonality.mdx b/docs/data-tests/anomaly-detection-configuration/seasonality.mdx index 446a24107..0f0014e83 100644 --- a/docs/data-tests/anomaly-detection-configuration/seasonality.mdx +++ b/docs/data-tests/anomaly-detection-configuration/seasonality.mdx @@ -3,6 +3,11 @@ title: "seasonality" sidebarTitle: "seasonality" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + `seasonality: day_of_week | hour_of_day | hour_of_week` Some data sets have patterns that repeat over a time period, and are expected. 
@@ -33,7 +38,7 @@ The expected range for Monday will be based on a training set of previous Monday #### How it works? - The test will compare the value of a bucket to previous bucket with the same seasonality attribute, and not to the adjacent previous data points. -- The `training_period` of the test will be changed by default to assure a minimal training set. When `seasonality: day_of_week` is configured, `training_period` is by default multiplied by 7. +- **The `training_period` of the test will be changed by default to assure a minimal training set. When `seasonality: day_of_week` is configured, `training_period` is by default multiplied by 7.** @@ -48,9 +53,10 @@ The expected range for Monday will be based on a training set of previous Monday ```yml test models: - name: this_is_a_model - tests: + data_tests: - elementary.volume_anomalies: - seasonality: day_of_week + arguments: + seasonality: day_of_week ``` ```yml model diff --git a/docs/data-tests/anomaly-detection-configuration/time-bucket.mdx b/docs/data-tests/anomaly-detection-configuration/time-bucket.mdx index 0981d6676..51762e3de 100644 --- a/docs/data-tests/anomaly-detection-configuration/time-bucket.mdx +++ b/docs/data-tests/anomaly-detection-configuration/time-bucket.mdx @@ -3,6 +3,11 @@ title: "time_bucket" sidebarTitle: "time_bucket" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + ``` time_bucket: period: < time period > # supported periods: hour, day, week, month @@ -32,11 +37,12 @@ For example, if you want to detect volume anomalies in an hourly resolution, you ```yml test models: - name: this_is_a_model - tests: + data_tests: - elementary.volume_anomalies: - time_bucket: - period: day - count: 2 + arguments: + time_bucket: + period: day + count: 2 ``` ```yml model diff --git a/docs/data-tests/anomaly-detection-configuration/timestamp-column.mdx b/docs/data-tests/anomaly-detection-configuration/timestamp-column.mdx index 5b2431220..4e3f80136 100644 --- 
a/docs/data-tests/anomaly-detection-configuration/timestamp-column.mdx +++ b/docs/data-tests/anomaly-detection-configuration/timestamp-column.mdx @@ -3,6 +3,11 @@ title: "timestamp_column" sidebarTitle: "timestamp_column" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + `timestamp_column: [column name]` @@ -29,9 +34,10 @@ If undefined, default is null (no time buckets). ```yml test models: - name: this_is_a_model - tests: + data_tests: - elementary.volume_anomalies: - timestamp_column: created_at + arguments: + timestamp_column: created_at ``` ```yml model @@ -48,7 +54,7 @@ sources: schema: raw tables: - name: source_table - meta: + config: elementary: timestamp_column: loaded_at ``` diff --git a/docs/data-tests/anomaly-detection-configuration/training-period.mdx b/docs/data-tests/anomaly-detection-configuration/training-period.mdx index 395d85a07..a43724458 100644 --- a/docs/data-tests/anomaly-detection-configuration/training-period.mdx +++ b/docs/data-tests/anomaly-detection-configuration/training-period.mdx @@ -3,6 +3,11 @@ title: "training_period" sidebarTitle: "training_period" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + ``` training_period: period: < time period > # supported periods: day, week, month @@ -22,11 +27,12 @@ This timeframe includes the training period and detection period. If a detection ```yml test models: - name: this_is_a_model - tests: + data_tests: - elementary.volume_anomalies: - training_period: - period: day - count: 30 + arguments: + training_period: + period: day + count: 30 ``` ```yml model @@ -62,6 +68,10 @@ It works differently according to the table materialization: - **Full time buckets** - Elementary will increase the `training_period` automatically to insure full time buckets. For example if the `time_bucket` of the test is `period: week`, and 14 days `training_period` result in Tuesday, the test will collect 2 more days back to complete a week (starting on Sunday). 
- **Seasonality training set** - If seasonality is configured, Elementary will increase the `training_period` automatically to ensure there are enough training set values to calculate an anomaly. For example if the `seasonality` of the test is `day_of_week`, `training_period` will be increased to ensure enough Sundays, Mondays, Tuesdays, etc. to calculate an anomaly for each. +**Overlap with detection period:** + +When the `detection_period` spans multiple time buckets, it can overlap with the training period. By default, values in the detection period are included in the training calculation, which can lead to false negatives because detection period values influence the expected range used to evaluate those same values. To prevent this overlap and improve anomaly detection accuracy, use [`exclude_detection_period_from_training`](/data-tests/anomaly-detection-configuration/exclude_detection_period_from_training) set to `true`. + #### The impact of changing `training_period` If you **increase `training_period`** your test training set will be larger. This means a larger sample size for calculating the expected range, which should make the test less sensitive to outliers. This means less chance of false positive anomalies, but also less sensitivity so anomalies have a higher threshold. diff --git a/docs/data-tests/anomaly-detection-configuration/update_timestamp_column.mdx b/docs/data-tests/anomaly-detection-configuration/update_timestamp_column.mdx index e7068b1af..f4bc0ee79 100644 --- a/docs/data-tests/anomaly-detection-configuration/update_timestamp_column.mdx +++ b/docs/data-tests/anomaly-detection-configuration/update_timestamp_column.mdx @@ -3,6 +3,11 @@ title: "update_timestamp_column" sidebarTitle: "update_timestamp_column" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + `update_timestamp_column: [column name]` Configuration for the test `event_freshness_anomalies`. 
@@ -22,10 +27,11 @@ The test can work in a couple of modes:
 
 ```yml test
 models:
   - name: this_is_a_model
-    tests:
+    data_tests:
       - elementary.event_timestamp_column:
-          event_timestamp_column: "event_timestamp"
-          update_timestamp_column: "created_at"
+          arguments:
+            event_timestamp_column: "event_timestamp"
+            update_timestamp_column: "created_at"
 ```
 
diff --git a/docs/data-tests/anomaly-detection-configuration/where-expression.mdx b/docs/data-tests/anomaly-detection-configuration/where-expression.mdx
index b3400bf8a..470f68e4a 100644
--- a/docs/data-tests/anomaly-detection-configuration/where-expression.mdx
+++ b/docs/data-tests/anomaly-detection-configuration/where-expression.mdx
@@ -3,6 +3,11 @@ title: "where_expression"
 sidebarTitle: "where_expression"
 ---
 
+import AiGenerateTest from '/snippets/ai-generate-test.mdx';
+
+
+
+
 `where_expression: [sql expression]`
 
 Filter the tested data using a valid sql expression.
@@ -10,14 +15,21 @@ Filter the tested data using a valid sql expression.
 
 - _Default: None_
 - _Relevant tests: All anomaly detection tests_
 
+Use `where_expression` to filter the data you want to test. For example, to test only weekdays for anomalies, filter on the day of the week:
+```yml
+where_expression: EXTRACT(DOW FROM timestamp_column) BETWEEN 1 AND 5
+# Test Monday (1) through Friday (5); assumes DOW returns 0 for Sunday
+```
+
 
 ```yml test
 models:
   - name: this_is_a_model
-    tests:
+    data_tests:
       - elementary.volume_anomalies:
-          where_expression: "user_name != 'test'"
+          arguments:
+            where_expression: "user_name != 'test'"
 ```
 
 ```yml model
diff --git a/docs/data-tests/anomaly-detection-tests-oss-vs-cloud.mdx b/docs/data-tests/anomaly-detection-tests-oss-vs-cloud.mdx
new file mode 100644
index 000000000..d5a12b10e
--- /dev/null
+++ b/docs/data-tests/anomaly-detection-tests-oss-vs-cloud.mdx
@@ -0,0 +1,37 @@
+---
+title: "OSS vs Cloud Anomaly Detection"
+---
+
+Elementary OSS and Elementary Cloud Platform both offer data anomaly detection. However, there are significant differences in implementation.
+
+There are two types of anomaly detection tests:
+
+* **Pipeline health monitors** - Monitor the pipeline runs, ensuring timely and complete data ingestion and transformation. They analyze metadata to detect volume and freshness issues.
+
+* **Data quality metrics tests** - Run as part of the pipeline and collect metrics by querying the data itself. These include various data quality metrics such as nullness, cardinality, average, length, etc.
+
+Here is a comparison of how these tests are implemented in Elementary Cloud and OSS:
+
+## Pipeline Health Monitors - Freshness and Volume
+
+|                         | OSS                                           | Cloud                                             |
+| ----------------------- | --------------------------------------------- | ------------------------------------------------- |
+| **Implementation**      | dbt tests                                     | Elementary Cloud monitors                         |
+| **Tests execution**     | Run in dbt                                    | Run in Cloud                                      |
+| **Coverage**            | Manually added in code                        | Automated, out-of-the-box full coverage           |
+| **Configuration**       | Manual, many parameters required for accuracy | No configuration, automated ML models             |
+| **Detection mechanism** | Z-score, statistical                          | ML anomaly detection, various models              |
+| **What is monitored?**  | Data                                          | Metadata (query history, information schema)      |
+| **Time to detection**   | Only when dbt runs                            | As soon as the problem happens, including sources |
+| **Cost**                | DWH compute                                   | No cost, only metadata is leveraged               |
+
+## Data Quality Metrics
+
+|                         | OSS                                           | Cloud                                                |
+| ----------------------- | --------------------------------------------- | ---------------------------------------------------- |
+| **Implementation**      | dbt tests                                     | Metrics collection in dbt, Elementary Cloud monitors |
+| **Tests execution**     | Run in dbt                                    | Metrics collection in dbt, detection in Cloud        |
+| **Coverage**            | Manually added in code                        | Opt-in, can be added in bulk in Cloud                |
+| **Configuration**       | Manual, many parameters required for accuracy | Automated ML models                                  |
+| **Detection mechanism** | Z-score, statistical                          | ML anomaly detection, various models                 |
+| **What is monitored?**  | Data                                          | Data                                                 |
diff --git a/docs/data-tests/anomaly-detection-tests/Anomaly-troubleshooting-guide.mdx b/docs/data-tests/anomaly-detection-tests/Anomaly-troubleshooting-guide.mdx
new file mode 100644
index 000000000..200b5f884
--- /dev/null
+++ b/docs/data-tests/anomaly-detection-tests/Anomaly-troubleshooting-guide.mdx
@@ -0,0 +1,166 @@
+---
+title: "Anomaly Tests Troubleshooting"
+sidebarTitle: "Anomaly tests troubleshooting"
+---
+
+import AiGenerateTest from '/snippets/ai-generate-test.mdx';
+
+
+
+
+
+First, check if your test uses a timestamp column:
+
+```yaml
+# In your YAML configuration
+data_tests:
+  - elementary.volume_anomalies:
+      arguments:
+        timestamp_column: created_at # If this is configured, you have a timestamp-based test
+```
+
+
+
+  - Metrics are calculated by grouping data into time buckets (default: 'day')
+  - Detection period (default: 2 days) determines how many buckets are being tested
+  - Training period data (default: 14 days) comes from historical buckets, allowing immediate anomaly detection with sufficient history
+
+  Verify data collection:
+
+  ```sql
+-- Check if metrics are being collected in time buckets
+SELECT
+    bucket_end,
+    metric_value,
+    COUNT(*) as metrics_per_bucket
+FROM your_schema.data_monitoring_metrics
+WHERE full_table_name = 'your_table'
+GROUP BY bucket_end, metric_value
+ORDER BY bucket_end DESC;
+  ```
+
+  - Each bucket should represent one time bucket (e.g., daily metrics)
+  - Gaps in `metric_timestamp` might indicate data collection issues
+  - Training uses historical buckets for anomaly detection
+  - The format for full_table_name is DATABASE.SCHEMA.TABLE_NAME
+
+  **Common collection issues:**
+
+  - Missing or null values in timestamp column
+  - Timestamp column not in expected format
+  - No data in specified training period
+
+
+
+
+  - Training period data builds up over multiple test runs, using the test run time as its timestamp column. This requires time to collect enough points; for a 14-day training period, the test would need 14 different runs on different days to have a full training set.
+  - Metrics are calculated for the entire table in each test run
+  - Detection period (default: 2 days) determines how many buckets are being tested
+
+  Check metric collection across test runs:
+
+  ```sql
+-- Check metrics from different test runs
+SELECT
+    updated_at,
+    metric_value
+FROM your_schema.data_monitoring_metrics
+WHERE full_table_name = 'your_table'
+ORDER BY updated_at DESC;
+  ```
+
+  - You should see one metric per test run and per dimension
+  - Training requires multiple test runs over time
+  - Each new test run creates the training point for a time bucket. A second test run within the same bucket will overwrite the first one.
+  - The format for full_table_name is DATABASE.SCHEMA.TABLE_NAME
+
+  **Common collection issues:**
+
+  - Test hasn't run enough times
+  - Previous test runs failed
+  - Metrics not being saved between runs
+
+
+
+
+Anomaly detection is influenced by:
+
+- Detection period (default: 2 days) - the time window being tested
+- Sensitivity (default: 3.0) - how many standard deviations from normal before flagging
+- Training data from previous periods/runs
+- `metrics_anomaly_score` calculates the anomaly based on the data in `data_monitoring_metrics`.
+ +Check calculations in `metrics_anomaly_score`: + +```sql +-- Check how anomalies are being calculated +SELECT + metric_name, + latest_metric_value, + training_avg, + training_stddev, + anomaly_score, + is_anomaly +FROM your_schema.metrics_anomaly_score +WHERE full_table_name = 'your_table' +ORDER BY bucket_end DESC; +``` +- `anomaly_score`: The standardized score that measures how many standard deviations a data point is from the mean +- `is_anomaly`: A boolean field that indicates whether the anomaly score exceeds the configured threshold + + + + + +This occurs when there are fewer than 7 training data points. To resolve: + +### For timestamp-based tests: + +- Check if your timestamp column has enough historical data +- Verify time buckets are being created correctly in `data_monitoring_metrics` +- Look for gaps in your data that might affect bucket creation + +### For non-timestamp tests: + +- Run your tests multiple times to build up training data. +- Check `data_monitoring_metrics` to verify the data collection. The test will need data for at least 7 time buckets (e.g., 7 days) to calculate the anomaly. + + + + + +If your test isn't appearing in `data_monitoring_metrics`: + +Verify test configuration: + +```yaml +data_tests: + - elementary.volume_anomalies: + arguments: + timestamp_column: created_at # Check if specified correctly +``` + +### Common causes: + +- Incorrect timestamp column name +- Timestamp column contains null values or is not of type timestamp or date +- For non-timestamp tests: Test hasn't run successfully +- Incorrect test syntax + + + + +If you change **`training_period`** after executing elementary tests, you will need to run a full refresh of the collected metrics. This will make the next tests collect data for the new **`training_period`** timeframe. The steps are: + +1. Change var **`training_period`** in your **`dbt_project.yml`**. +2. 
Run a full refresh of the `data_monitoring_metrics` model by running **`dbt run --select data_monitoring_metrics --full-refresh`**. +3. Run the elementary tests again. + +If you want the Elementary UI to show data for a longer period of time, use the days-back option of the CLI: **`edr report --days-back 45`**. + diff --git a/docs/data-tests/anomaly-detection-tests/all-columns-anomalies.mdx b/docs/data-tests/anomaly-detection-tests/all-columns-anomalies.mdx index 546ea6ebb..3aadbb9a3 100644 --- a/docs/data-tests/anomaly-detection-tests/all-columns-anomalies.mdx +++ b/docs/data-tests/anomaly-detection-tests/all-columns-anomalies.mdx @@ -3,6 +3,13 @@ sidebarTitle: "All columns anomalies" title: "all_columns_anomalies" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; +import ColumnMetrics from '/snippets/column-metrics.mdx'; + + + + + `elementary.all_columns_anomalies` Executes column level monitors and anomaly detection on all the columns of the table. @@ -11,7 +18,7 @@ Specific monitors are detailed in the table below and can be configured using th The test checks the data type of each column and only executes monitors that are relevant to it. You can use `column_anomalies` param to override the default monitors, and `exclude_prefix` / `exclude_regexp` to exclude columns from the test. - + ### Test configuration @@ -20,33 +27,34 @@ No mandatory configuration, however it is highly recommended to configure a `tim {/* prettier-ignore */} <pre>
  
-  tests:
+  data_tests:
       -- elementary.all_columns_anomalies:
-          timestamp_column: column name
-          column_anomalies: column monitors list
-          dimensions: sql expression
-          exclude_prefix: string
-          exclude_regexp: regex
-          where_expression: sql expression
-          anomaly_sensitivity: int
-          anomaly_direction: [both | spike | drop]
-          detection_period:
-            period: [hour | day | week | month]
-            count: int
-          training_period:
-            period: [hour | day | week | month]
-            count: int
-          time_bucket:
-            period: [hour | day | week | month]
-            count: int
-          seasonality: day_of_week
-          detection_delay:
-            period: [hour | day | week | month]
-            count: int
-          ignore_small_changes:
-            spike_failure_percent_threshold: int
-            drop_failure_percent_threshold: int
-          anomaly_exclude_metrics: [SQL expression]
+          arguments:
+            timestamp_column: column name
+            column_anomalies: column monitors list
+            dimensions: sql expression
+            exclude_prefix: string
+            exclude_regexp: regex
+            where_expression: sql expression
+            anomaly_sensitivity: int
+            anomaly_direction: [both | spike | drop]
+            detection_period:
+              period: [hour | day | week | month]
+              count: int
+            training_period:
+              period: [hour | day | week | month]
+              count: int
+            time_bucket:
+              period: [hour | day | week | month]
+              count: int
+            seasonality: day_of_week
+            detection_delay:
+              period: [hour | day | week | month]
+              count: int
+            ignore_small_changes:
+              spike_failure_percent_threshold: int
+              drop_failure_percent_threshold: int
+            anomaly_exclude_metrics: [SQL expression]
  
 
@@ -58,13 +66,14 @@ models: config: elementary: timestamp_column: < timestamp column > - tests: + data_tests: - elementary.all_columns_anomalies: - column_anomalies: < specific monitors, all if null > - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + arguments: + column_anomalies: < specific monitors, all if null > + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > ``` ```yml Models example @@ -73,15 +82,18 @@ models: config: elementary: timestamp_column: "loaded_at" - tests: + data_tests: - elementary.all_columns_anomalies: - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - period: day - count: 1 - tags: ["elementary"] - # optional - change global sensitivity - anomaly_sensitivity: 3.5 + config: + tags: ["elementary"] + arguments: + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: day + count: 1 + + # optional - change global sensitivity + anomaly_sensitivity: 3.5 ```
diff --git a/docs/data-tests/anomaly-detection-tests/column-anomalies.mdx b/docs/data-tests/anomaly-detection-tests/column-anomalies.mdx index fd88dbab9..b352ae037 100644 --- a/docs/data-tests/anomaly-detection-tests/column-anomalies.mdx +++ b/docs/data-tests/anomaly-detection-tests/column-anomalies.mdx @@ -3,6 +3,12 @@ title: "column_anomalies" sidebarTitle: "Column anomalies" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; +import ColumnMetrics from '/snippets/column-metrics.mdx'; + + + + `elementary.column_anomalies` Executes column level monitors and anomaly detection on the column. @@ -10,7 +16,7 @@ Specific monitors are detailed in the table below and can be configured using th The test checks the data type of the column and only executes monitors that are relevant to it. - + ### Test configuration @@ -19,31 +25,32 @@ No mandatory configuration, however it is highly recommended to configure a `tim {/* prettier-ignore */}
  
-  tests:
+  data_tests:
       -- elementary.column_anomalies:
-          column_anomalies: column monitors list
-          dimensions: sql expression
-          timestamp_column: column name
-          where_expression: sql expression
-          anomaly_sensitivity: int
-          anomaly_direction: [both | spike | drop]
-          detection_period:
-            period: [hour | day | week | month]
-            count: int
-          training_period:
-            period: [hour | day | week | month]
-            count: int
-          time_bucket:
-            period: [hour | day | week | month]
-            count: int
-          seasonality: day_of_week
-          detection_delay:
-            period: [hour | day | week | month]
-            count: int
-          ignore_small_changes:
-            spike_failure_percent_threshold: int
-            drop_failure_percent_threshold: int
-          anomaly_exclude_metrics: [SQL expression]
+          arguments:
+              column_anomalies: column monitors list
+              dimensions: sql expression
+              timestamp_column: column name
+              where_expression: sql expression
+              anomaly_sensitivity: int
+              anomaly_direction: [both | spike | drop]
+              detection_period:
+                period: [hour | day | week | month]
+                count: int
+              training_period:
+                period: [hour | day | week | month]
+                count: int
+              time_bucket:
+                period: [hour | day | week | month]
+                count: int
+              seasonality: day_of_week
+              detection_delay:
+                period: [hour | day | week | month]
+                count: int
+              ignore_small_changes:
+                spike_failure_percent_threshold: int
+                drop_failure_percent_threshold: int
+              anomaly_exclude_metrics: [SQL expression]
  
 
@@ -57,22 +64,24 @@ models: timestamp_column: < timestamp column > columns: - name: < column name > - tests: + data_tests: - elementary.column_anomalies: - column_anomalies: < specific monitors, all if null > - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + arguments: + column_anomalies: < specific monitors, all if null > + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > - name: < model name > ## if no timestamp is configured, elementary will monitor without time filtering columns: - name: < column name > - tests: + data_tests: - elementary.column_anomalies: - column_anomalies: < specific monitors, all if null > - where_expression: < sql expression > + arguments: + column_anomalies: < specific monitors, all if null > + where_expression: < sql expression > ``` ```yml Models example @@ -84,39 +93,46 @@ models: timestamp_column: 'loaded_at' columns: - name: user_name - tests: + data_tests: - elementary.column_anomalies: - column_anomalies: - - missing_count - - min_length - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - period: day - count: 1 - tags: ['elementary'] + arguments: + column_anomalies: + - missing_count + - min_length + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: day + count: 1 + config: + tags: ['elementary'] - name: users ## if no timestamp is configured, elementary will monitor without time filtering - tests: + data_tests: elementary.volume_anomalies - tags: ['elementary'] + config: + tags: ['elementary'] columns: - name: user_id - tests: + data_tests: - elementary.column_anomalies: - tags: ['elementary'] - timestamp_column: 'updated_at' - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - 
period: < time period > - count: < number of periods > + config: + tags: ['elementary'] + arguments: + timestamp_column: 'updated_at' + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: < time period > + count: < number of periods > - name: user_name - tests: + data_tests: - elementary.column_anomalies: - column_anomalies: - - missing_count - - min_length - tags: ['elementary'] + arguments: + column_anomalies: + - missing_count + - min_length + config: + tags: ['elementary'] ``` diff --git a/docs/data-tests/anomaly-detection-tests/dimension-anomalies.mdx b/docs/data-tests/anomaly-detection-tests/dimension-anomalies.mdx index b3de4c39c..bc1ab4700 100644 --- a/docs/data-tests/anomaly-detection-tests/dimension-anomalies.mdx +++ b/docs/data-tests/anomaly-detection-tests/dimension-anomalies.mdx @@ -3,6 +3,12 @@ sidebarTitle: "Dimension anomalies" title: "dimension_anomalies" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + + `elementary.dimension_anomalies` The test counts rows grouped by given `dimensions` (columns/expressions). @@ -19,31 +25,32 @@ _Required configuration: `dimensions`_ {/* prettier-ignore */}
  
-  tests:
+  data_tests:
       -- elementary.dimension_anomalies:
-          dimensions: sql expression
-          timestamp_column: column name
-          where_expression: sql expression
-          anomaly_sensitivity: int
-          anomaly_direction: [both | spike | drop]
-          detection_period:
-            period: [hour | day | week | month]
-            count: int
-          training_period:
-            period: [hour | day | week | month]
-            count: int
-          time_bucket:
-            period: [hour | day | week | month]
-            count: int
-          seasonality: day_of_week
-          detection_delay:
-            period: [hour | day | week | month]
-            count: int
-          ignore_small_changes:
-            spike_failure_percent_threshold: int
-            drop_failure_percent_threshold: int
-          anomaly_exclude_metrics: [SQL expression]
-          exclude_final_results: [SQL expression]
+          arguments:
+              dimensions: sql expression
+              timestamp_column: column name
+              where_expression: sql expression
+              anomaly_sensitivity: int
+              anomaly_direction: [both | spike | drop]
+              detection_period:
+                period: [hour | day | week | month]
+                count: int
+              training_period:
+                period: [hour | day | week | month]
+                count: int
+              time_bucket:
+                period: [hour | day | week | month]
+                count: int
+              seasonality: day_of_week
+              detection_delay:
+                period: [hour | day | week | month]
+                count: int
+              ignore_small_changes:
+                spike_failure_percent_threshold: int
+                drop_failure_percent_threshold: int
+              anomaly_exclude_metrics: [SQL expression]
+              exclude_final_results: [SQL expression]
  
 
@@ -55,14 +62,15 @@ models: config: elementary: timestamp_column: < timestamp column > - tests: + data_tests: - elementary.dimension_anomalies: - dimensions: < columns or sql expressions of columns > - # optional - configure a where a expression to accurate the dimension monitoring - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + arguments: + dimensions: < columns or sql expressions of columns > + # optional - configure a where expression to make the dimension monitoring more accurate + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > ``` ```yml Models example @@ -71,28 +79,31 @@ models: config: elementary: timestamp_column: "loaded_at" - tests: + data_tests: - elementary.dimension_anomalies: - dimensions: - - event_type - - country_name - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - period: hour - count: 4 - # optional - use tags to run elementary tests on a dedicated run - tags: ["elementary"] + arguments: + dimensions: + - event_type + - country_name + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: hour + count: 4 config: + # optional - use tags to run elementary tests on a dedicated run + tags: ["elementary"] # optional - change severity severity: warn - name: users # if no timestamp is configured, elementary will monitor without time filtering - tests: + data_tests: - elementary.dimension_anomalies: - dimensions: - - event_type - tags: ["elementary"] + arguments: + dimensions: + - event_type + config: + tags: ["elementary"] ``` diff --git a/docs/data-tests/anomaly-detection-tests/event-freshness-anomalies.mdx b/docs/data-tests/anomaly-detection-tests/event-freshness-anomalies.mdx index 52f026239..675a347c0 100644 --- 
a/docs/data-tests/anomaly-detection-tests/event-freshness-anomalies.mdx +++ b/docs/data-tests/anomaly-detection-tests/event-freshness-anomalies.mdx @@ -3,6 +3,11 @@ sidebarTitle: "Event freshness anomalies" title: "event_freshness_anomalies" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + `elementary.event_freshness_anomalies` Monitors the freshness of event data over time, as the expected time it takes each event to load - @@ -26,29 +31,30 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._ {/* prettier-ignore */}
  
-  tests:
+  data_tests:
       -- elementary.event_freshness_anomalies:
-          event_timestamp_column: column name
-          update_timestamp_column: column name
-          where_expression: sql expression
-          anomaly_sensitivity: int
-          detection_period:
-            period: [hour | day | week | month]
-            count: int
-          training_period:
-            period: [hour | day | week | month]
-            count: int
-          time_bucket:
-            period: [hour | day | week | month]
-            count: int
-          seasonality: day_of_week
-          detection_delay:
-            period: [hour | day | week | month]
-            count: int
-          ignore_small_changes:
-            spike_failure_percent_threshold: int
-            drop_failure_percent_threshold: int
-          anomaly_exclude_metrics: [SQL expression]
+          arguments:
+              event_timestamp_column: column name
+              update_timestamp_column: column name
+              where_expression: sql expression
+              anomaly_sensitivity: int
+              detection_period:
+                period: [hour | day | week | month]
+                count: int
+              training_period:
+                period: [hour | day | week | month]
+                count: int
+              time_bucket:
+                period: [hour | day | week | month]
+                count: int
+              seasonality: day_of_week
+              detection_delay:
+                period: [hour | day | week | month]
+                count: int
+              ignore_small_changes:
+                spike_failure_percent_threshold: int
+                drop_failure_percent_threshold: int
+              anomaly_exclude_metrics: [SQL expression]
  
 
@@ -57,26 +63,28 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._ ```yml Models models: - name: < model name > - tests: + data_tests: - elementary.event_freshness_anomalies: - event_timestamp_column: < timestamp column > # Mandatory - update_timestamp_column: < timestamp column > # Optional - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + arguments: + event_timestamp_column: < timestamp column > # Mandatory + update_timestamp_column: < timestamp column > # Optional + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > ``` ```yml Models example models: - name: login_events - tests: + data_tests: - elementary.event_freshness_anomalies: - event_timestamp_column: "occurred_at" - update_timestamp_column: "updated_at" - # optional - use tags to run elementary tests on a dedicated run - tags: ["elementary"] + arguments: + event_timestamp_column: "occurred_at" + update_timestamp_column: "updated_at" config: + # optional - use tags to run elementary tests on a dedicated run + tags: ["elementary"] # optional - change severity severity: warn ``` diff --git a/docs/data-tests/anomaly-detection-tests/freshness-anomalies.mdx b/docs/data-tests/anomaly-detection-tests/freshness-anomalies.mdx index 8215ebfa8..108f0b5ef 100644 --- a/docs/data-tests/anomaly-detection-tests/freshness-anomalies.mdx +++ b/docs/data-tests/anomaly-detection-tests/freshness-anomalies.mdx @@ -3,12 +3,17 @@ title: "freshness_anomalies" sidebarTitle: "Freshness anomalies" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + `elementary.freshness_anomalies` Monitors the freshness of your table over time, as the expected time between data updates. 
Upon running the test, your data is split into time buckets (daily by default, configurable with the `time_bucket` field), -and then we compute the maximum freshness value per bucket for the last `training_period` (by default 14 days). +and then we compute the maximum time without updates (in seconds) as the freshness value per bucket for the last `training_period` (by default 14 days). The test then takes the freshness of each bucket within the detection period (last 2 days by default, controlled by the `detection_period` var) and compares it to the freshness of the previous time buckets. @@ -22,27 +27,28 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._ {/* prettier-ignore */} <pre>
  
-  tests:
+  data_tests:
       -- elementary.freshness_anomalies:
-          timestamp_column: column name
-          where_expression: sql expression
-          anomaly_sensitivity: int
-          detection_period:
-            period: [hour | day | week | month]
-            count: int
-          training_period:
-            period: [hour | day | week | month]
-            count: int
-          time_bucket:
-            period: [hour | day | week | month]
-            count: int
-          detection_delay:
-            period: [hour | day | week | month]
-            count: int
-          ignore_small_changes:
-            spike_failure_percent_threshold: int
-            drop_failure_percent_threshold: int
-          anomaly_exclude_metrics: [SQL expression]
+          arguments:
+              timestamp_column: column name
+              where_expression: sql expression
+              anomaly_sensitivity: int
+              detection_period:
+                period: [hour | day | week | month]
+                count: int
+              training_period:
+                period: [hour | day | week | month]
+                count: int
+              time_bucket:
+                period: [hour | day | week | month]
+                count: int
+              detection_delay:
+                period: [hour | day | week | month]
+                count: int
+              ignore_small_changes:
+                spike_failure_percent_threshold: int
+                drop_failure_percent_threshold: int
+              anomaly_exclude_metrics: [SQL expression]
  
 
@@ -51,24 +57,26 @@ _Default configuration: `anomaly_direction: spike` to alert only on delays._ ```yml Models models: - name: < model name > - tests: + data_tests: - elementary.freshness_anomalies: - timestamp_column: < timestamp column > # Mandatory - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + arguments: + timestamp_column: < timestamp column > # Mandatory + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > ``` ```yml Models example models: - name: login_events - tests: + data_tests: - elementary.freshness_anomalies: - timestamp_column: "updated_at" - # optional - use tags to run elementary tests on a dedicated run - tags: ["elementary"] + arguments: + timestamp_column: "updated_at" config: + # optional - use tags to run elementary tests on a dedicated run + tags: ["elementary"] # optional - change severity severity: warn ``` diff --git a/docs/data-tests/anomaly-detection-tests/volume-anomalies.mdx b/docs/data-tests/anomaly-detection-tests/volume-anomalies.mdx index b564514fc..724d564e4 100644 --- a/docs/data-tests/anomaly-detection-tests/volume-anomalies.mdx +++ b/docs/data-tests/anomaly-detection-tests/volume-anomalies.mdx @@ -3,6 +3,11 @@ title: "volume_anomalies" sidebarTitle: "Volume anomalies" --- +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + `elementary.volume_anomalies` Monitors the row count of your table over time per time bucket (if configured without `timestamp_column`, will count table total rows). @@ -23,30 +28,31 @@ No mandatory configuration, however it is highly recommended to configure a `tim {/* prettier-ignore */}
  
-  tests:
+  data_tests:
       - elementary.volume_anomalies:
-          timestamp_column: column name
-          where_expression: sql expression
-          anomaly_sensitivity: int
-          anomaly_direction: [both | spike | drop]
-          detection_period:
-            period: [hour | day | week | month]
-            count: int
-          training_period:
-            period: [hour | day | week | month]
-            count: int
-          time_bucket:
-            period: [hour | day | week | month]
-            count: int
-          seasonality: day_of_week
-          fail_on_zero: [true | false]
-          ignore_small_changes:
-            spike_failure_percent_threshold: int
-            drop_failure_percent_threshold: int
-          detection_delay:
-            period: [hour | day | week | month]
-            count: int
-          anomaly_exclude_metrics: [SQL expression]
+          arguments:
+              timestamp_column: column name
+              where_expression: sql expression
+              anomaly_sensitivity: int
+              anomaly_direction: [both | spike | drop]
+              detection_period:
+                period: [hour | day | week | month]
+                count: int
+              training_period:
+                period: [hour | day | week | month]
+                count: int
+              time_bucket:
+                period: [hour | day | week | month]
+                count: int
+              seasonality: day_of_week
+              fail_on_zero: [true | false]
+              ignore_small_changes:
+                spike_failure_percent_threshold: int
+                drop_failure_percent_threshold: int
+              detection_delay:
+                period: [hour | day | week | month]
+                count: int
+              anomaly_exclude_metrics: [SQL expression]
  
 
@@ -55,13 +61,14 @@ No mandatory configuration, however it is highly recommended to configure a `tim ```yml Models models: - name: < model name > - tests: + data_tests: - elementary.volume_anomalies: - timestamp_column: < timestamp column > - where_expression: < sql expression > - time_bucket: # Daily by default - period: < time period > - count: < number of periods > + arguments: + timestamp_column: < timestamp column > + where_expression: < sql expression > + time_bucket: # Daily by default + period: < time period > + count: < number of periods > ``` ```yml Models example @@ -70,23 +77,25 @@ models: config: elementary: timestamp_column: "loaded_at" - tests: + data_tests: - elementary.volume_anomalies: - where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" - time_bucket: - period: day - count: 1 - # optional - use tags to run elementary tests on a dedicated run - tags: ["elementary"] + arguments: + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: day + count: 1 config: + # optional - use tags to run elementary tests on a dedicated run + tags: ["elementary"] # optional - change severity severity: warn - name: users # if no timestamp is configured, elementary will monitor without time filtering - tests: + data_tests: - elementary.volume_anomalies: - tags: ["elementary"] + config: + tags: ["elementary"] ``` diff --git a/docs/data-tests/data-freshness-sla.mdx b/docs/data-tests/data-freshness-sla.mdx new file mode 100644 index 000000000..41c1c2574 --- /dev/null +++ b/docs/data-tests/data-freshness-sla.mdx @@ -0,0 +1,145 @@ +--- +title: "data_freshness_sla" +sidebarTitle: "Data Freshness SLA" +--- + +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + +`elementary.data_freshness_sla` + +Verifies that data in a model was updated before a specified SLA deadline time. 
+ +This test checks the maximum timestamp value of a specified column in your data to determine whether the data was actually refreshed before your deadline. Unlike `freshness_anomalies` (which uses z-score based anomaly detection as a dbt test, or ML-based detection in Elementary Cloud), this test validates against a fixed, explicit SLA time, making it ideal when you have a concrete contractual or operational deadline. + +Unlike `execution_sla` (which only checks if the dbt model _ran_ on time), `data_freshness_sla` checks whether the actual _data_ is fresh. A pipeline can run successfully but still serve stale data if, for example, an upstream source didn't update. This test catches that. + +### Use Case + +"Was the data in my model updated before 7 AM Pacific today?" + +### Test Logic + +1. If today is not a scheduled check day → **PASS** (skip) +2. Query the model for the maximum value of `timestamp_column` +3. If the max timestamp is from today → **PASS** (data is fresh) +4. If the SLA deadline hasn't passed yet → **PASS** (still time) +5. If the max timestamp is from a previous day → **FAIL** (DATA_STALE) +6. If no data exists in the table → **FAIL** (NO_DATA) + +### Test configuration + +_Required configuration: `timestamp_column`, `sla_time`, `timezone`_ + +{/* prettier-ignore */} +
+ 
+  data_tests:
+      - elementary.data_freshness_sla:
+          arguments:
+              timestamp_column: column name # Required - timestamp column to check for freshness
+              sla_time: string # Required - e.g., "07:00", "7am", "2:30pm", "14:30"
+              timezone: string # Required - IANA timezone name, e.g., "America/Los_Angeles"
+              day_of_week: string | array # Optional - Day(s) to check: "Monday" or ["Monday", "Wednesday"]
+              day_of_month: int | array # Optional - Day(s) of month to check: 1 or [1, 15]
+              where_expression: sql expression # Optional - filter the data before checking
+ 
+
+ + + +```yml Models +models: + - name: < model name > + data_tests: + - elementary.data_freshness_sla: + arguments: + timestamp_column: < column name > # Required + sla_time: < deadline time > # Required - e.g., "07:00", "7am", "2:30pm" + timezone: < IANA timezone > # Required - e.g., "America/Los_Angeles" + day_of_week: < day or array > # Optional + day_of_month: < day or array > # Optional + where_expression: < sql expression > # Optional +``` + +```yml Daily check +models: + - name: daily_revenue + data_tests: + - elementary.data_freshness_sla: + arguments: + timestamp_column: updated_at + sla_time: "07:00" + timezone: "America/Los_Angeles" + config: + tags: ["elementary"] + severity: error +``` + +```yml With filter expression +models: + - name: daily_events + data_tests: + - elementary.data_freshness_sla: + arguments: + timestamp_column: event_timestamp + sla_time: "6am" + timezone: "Europe/Amsterdam" + where_expression: "event_type = 'completed'" + config: + tags: ["elementary"] +``` + +```yml Weekly - only Mondays +models: + - name: weekly_report_data + data_tests: + - elementary.data_freshness_sla: + arguments: + timestamp_column: report_date + sla_time: "09:00" + timezone: "Asia/Tokyo" + day_of_week: ["Monday"] + config: + tags: ["elementary"] +``` + + + +### Features + +- **Data-level freshness**: Checks actual data timestamps, not just pipeline execution time +- **Flexible time formats**: Supports `"07:00"`, `"7am"`, `"2:30pm"`, `"14:30"`, and other common formats +- **IANA timezone support**: Uses standard timezone names like `"America/Los_Angeles"`, `"Europe/Amsterdam"`, etc. 
+- **Automatic DST handling**: Uses `pytz` for timezone conversions with automatic daylight saving time handling +- **Database-agnostic**: All timezone logic happens at compile time +- **Schedule filters**: Optional `day_of_week` and `day_of_month` parameters to check only specific days +- **Filter support**: Use `where_expression` to check freshness of a specific subset of data + +### Parameters + +| Parameter | Required | Description | +| ------------------ | -------- | -------------------------------------------------------------- | +| `timestamp_column` | Yes | Column name containing timestamps to check for freshness | +| `sla_time` | Yes | Deadline time (e.g., `"07:00"`, `"7am"`, `"2:30pm"`) | +| `timezone` | Yes | IANA timezone name (e.g., `"America/Los_Angeles"`) | +| `day_of_week` | No | Day(s) to check: `"Monday"` or `["Monday", "Wednesday"]` | +| `day_of_month` | No | Day(s) of month to check: `1` or `[1, 15]` | +| `where_expression` | No | SQL expression to filter the data before checking | + +### Comparison with other freshness tests + +| Feature | `data_freshness_sla` | `freshness_anomalies` | `execution_sla` | +| --- | --- | --- | --- | +| What it checks | Actual data freshness (timestamps in the data) | Actual data freshness (timestamps in the data) | Pipeline execution (did the model run?) | +| Detection method | Fixed SLA deadline | Z-score (dbt test) / ML (Cloud) | Fixed SLA deadline | +| Best for | Contractual/operational deadlines on data | Detecting unexpected delays in data updates | Ensuring the pipeline itself ran on time | +| Works with sources | Yes | Yes | No (models only) | + +### Notes + +- The `timestamp_column` values are assumed to be in **UTC** (or timezone-naive timestamps that represent UTC). If your data stores local timestamps, the comparison may be incorrect. 
+- If both `day_of_week` and `day_of_month` are set, the test uses OR logic (checks if either matches)
+- The test passes if the SLA deadline hasn't been reached yet, giving your data time to be updated
diff --git a/docs/data-tests/dbt/dbt-artifacts.mdx b/docs/data-tests/dbt/dbt-artifacts.mdx
new file mode 100644
index 000000000..e3c8bfdbd
--- /dev/null
+++ b/docs/data-tests/dbt/dbt-artifacts.mdx
@@ -0,0 +1,44 @@
+---
+title: "dbt artifacts"
+---
+
+The [Elementary dbt package](/cloud/quickstart#install-the-dbt-package) includes uploading and modeling of dbt artifacts.
+
+Each dbt invocation generates artifacts with details about the project resources, configuration and execution. Most are in JSON format. In effect, the artifacts are the metadata and logs of the dbt project.
+
+
+## How Elementary uploads dbt artifacts
+
+The Elementary dbt package includes macros that extract specific fields from the artifacts during execution and insert them into tables. These tables are defined as models of the package.
+
+Some artifacts are updated by an [on-run-end](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end) hook when you run `dbt run / test / build`, and some only when you run the models:
+
+- **On run end:**
+  - `elementary_test_results` - Results of all dbt tests (including elementary and other packages, such as dbt_expectations and dbt_utils).
+  - `dbt_run_results` - Results of all dbt executions.
+- **`dbt run` that includes elementary models:**
+  - `dbt_models`, `dbt_tests`, `dbt_sources`, `dbt_exposures`, `dbt_metrics` - Metadata and configuration. It is recommended to run these models when you make changes in your project.
+
+Read more about [Elementary on-run-end hooks](/data-tests/dbt/on-run-end_hooks).
+
+## Which artifacts are uploaded and modeled?
+
+Elementary uploads fields from the run results and the graph object (which includes data from the manifest, catalog and sources).
+
+You can disable or enable the relevant models in the Elementary package if you want to limit the artifacts that are loaded to your data warehouse (under `elementary/models/edr/dbt_artifacts`).
+
+## dbt artifacts models
+
+Elementary loads data from dbt artifacts into models that can be found under the `dbt_artifacts` folder of the package.
+
+For `dbt_run_results`, data is inserted by an on-run-end hook; for the rest of the models, by a post-hook on the model itself.
+
+The dbt artifacts models include:
+
+- **Metadata tables** - `dbt_models`, `dbt_tests`, `dbt_sources`, `dbt_exposures`, `dbt_metrics`, `dbt_snapshots`, `dbt_seeds`, `dbt_columns`, `dbt_groups` - These provide a comprehensive view of your dbt project structure and configurations. Each time these models are executed, the data is replaced with data from the project's current graph.
+
+- **Run results tables** - `dbt_run_results`, `dbt_invocations`, `elementary_test_results`, `model_run_results`, `snapshot_run_results`, `seed_run_results`, `job_run_results`, `dbt_source_freshness_results` - These track execution details, test outcomes, and performance metrics from your dbt runs.
+
+For detailed schema documentation including all columns and their descriptions, see the [package models documentation](/data-tests/dbt/package-models).
diff --git a/docs/data-tests/dbt/dbt-package.mdx b/docs/data-tests/dbt/dbt-package.mdx
new file mode 100644
index 000000000..dd868994b
--- /dev/null
+++ b/docs/data-tests/dbt/dbt-package.mdx
@@ -0,0 +1,58 @@
+---
+title: "Elementary dbt package"
+sidebarTitle: "Introduction"
+---
+
+import QuestionSchemaNoAccordion from '/snippets/faq/question-schema-no-accordion.mdx';
+
+The Elementary dbt package serves as a collector of logs and metadata from your dbt project and offers a set of data anomaly detection and schema tests.
+To gain the most value from the dbt package, we recommend using it with the [Elementary Cloud Platform](/cloud/introduction) or with the [Elementary open-source CLI tool](/oss/oss-introduction).
+
+Get started with the Elementary dbt package in minutes with the [quickstart](/data-tests/dbt/quickstart-package). The repository and source code of the package can be found [here](https://github.com/elementary-data/dbt-data-reliability).
+
+## Package Features
+
+The Elementary dbt package is designed to power data observability use cases for dbt pipelines.
+The package uploads logs and metadata generated from your runs as dbt artifacts into tables in your data warehouse.
+Additionally, it offers a wide range of tests, including detection of anomalies in volume, freshness, columns and other dimensions of your data.
+
+The impact of the package on **`dbt run`** is minimal, and most of the processing happens as part of the data tests that are executed on **`dbt test`**.
+
+  A dbt package is additional Jinja and SQL code that is added to your project for extra functionality. In fact, each package is a dbt project in itself. By adding a package to your project, its code becomes part of your project: you can reference its macros, execute its models, and so on.

+  Add packages to your project by creating a `packages.yml` file under the main project directory (where your `dbt_project.yml` is), and adding the relevant package. After you add a new package, run `dbt deps` to actually pull its code into your project. This is also how you update packages.
+  Some packages we recommend you check out: [dbt\_utils](https://github.com/dbt-labs/dbt-utils/tree/0.8.2/), [dbt\_date](https://github.com/calogica/dbt-date/tree/0.5.4/), [codegen](https://github.com/dbt-labs/dbt-codegen/tree/0.5.0/).
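+
+  For example, a minimal `packages.yml` that adds the Elementary package might look like this (the version shown is illustrative; pin the latest release from the package repository):
+
+  ```yaml packages.yml
+  # Version below is illustrative - check the package repo for the latest release.
+  packages:
+    - package: elementary-data/elementary
+      version: 0.16.1
+  ```
+
+  After editing `packages.yml`, run `dbt deps` to pull the package code into your project.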
+
+
+After you deploy the dbt package, you can use Elementary tests, and your dbt artifacts will be uploaded automatically with on-run-end hooks:
+
+
+
+## Elementary schema
+
+
+
+## Package full refresh
+
+Elementary incremental models are not full-refreshed by default. This is because these models contain
+information such as historical runs and test results, which you'd typically want to keep even when full-refreshing
+the models of your main dbt project.
+
+However, if you wish to change this behavior and include Elementary models as part of the full refresh, you can set
+the following var:
+
+```yaml
+vars:
+  elementary_full_refresh: true
+```
diff --git a/docs/data-tests/dbt/on-run-end_hooks.mdx b/docs/data-tests/dbt/on-run-end_hooks.mdx
new file mode 100644
index 000000000..f72276e03
--- /dev/null
+++ b/docs/data-tests/dbt/on-run-end_hooks.mdx
@@ -0,0 +1,119 @@
+---
+title: "Elementary dbt package on-run-end hooks"
+sidebarTitle: "on-run-end hooks"
+---
+
+The Elementary dbt package uses `on-run-end` [hooks](https://docs.getdbt.com/reference/project-configs/on-run-start-on-run-end) to log results and metadata to tables in the Elementary schema.
+
+## Why does Elementary use `on-run-end` hooks?
+
+The Elementary report and alerts are generated from the data in the Elementary schema.
+The solution relies on the Elementary schema being up-to-date and complete to be able to provide reliable and accurate observability.
+
+By leveraging `on-run-end` hooks, we add built-in collection of the latest results and metadata as part of your runs.
+This means the results you see in the Elementary report and the alerts you receive are complete, up-to-date and accurate.
+
+We strongly recommend not disabling the hooks for environments you want to monitor using Elementary.
+
+## What happens in the `on-run-end` hooks?
+
+In the `on-run-end` hooks, Elementary extracts data from the dbt `results` and `graph` objects, and runs SQL queries to load this data into the Elementary models.
+
+There are two types of models that Elementary updates:
+
+1. **Metadata models** - Such as `dbt_models`, `dbt_tests`, `dbt_sources`.
+2. **Result models** - Such as `dbt_run_results`, `elementary_test_results`, `dbt_invocations`.
+
+#### Updates of metadata models
+
+These models store the current resources and configuration in your dbt project (models, snapshots, sources, tests, etc.).
+The metadata in these models represents only the project state as of the latest run, so it is replaced whenever the project changes.
+The `on-run-end` hook runs SQL queries with the new metadata and updates the relevant tables.
+
+#### Updates of result models
+
+These models store a log of the results of dbt invocations, and of the specific resources that were executed.
+The `on-run-end` hook runs SQL queries with the run results and invocation details.
+
+## Performance impact of `on-run-end` hooks
+
+We put a lot of thought and effort into making Elementary efficient in both cost and performance.
+We only run the hooks that are relevant to each run, and each hook runs as few queries as possible.
+
+#### Metadata models
+
+**For `dbt 1.4.0` and above**, we maintain a metadata cache. This means each of these models is updated only when your project changes (new model, change in config, etc.).
+The first time you execute Elementary the initial update might take a while, but subsequent updates should be quick.
+
+**For `dbt 1.3.0` and lower**, these models are fully updated on each run.
+The performance impact depends on the size of your dbt project.
+
+**We strongly recommend using dbt version 1.4.0 or above.**
+
+
+
+Elementary implemented an "artifacts cache" to improve performance drastically.
+A change to dbt-core was required to achieve this, and it was only included in the 1.4.0 release.
+This means that if you upgrade to dbt 1.4.0 or above, you will see a significant improvement in the runtime of the Elementary hooks.
+
+**If you can't upgrade, the alternative is to:**
+
+1. Disable the automatic uploading of artifacts.
+2. Upload the artifacts yourself any time you make a change to the project (e.g., merge a PR).
+
+**How?**
+
+1. Add this to dbt_project.yml:
+
+```yaml dbt_project.yml
+vars:
+  disable_dbt_artifacts_autoupload: true
+```
+
+2. Make sure to run `dbt run --select edr.dbt_artifacts` upon merging PRs.
+
+
+
+#### Result models
+
+The size of the queries depends on the number of models and tests executed in the run.
+The time this adds to the invocation shouldn't be significant.
+
+## Can I disable the `on-run-end` hooks?
+
+Yes, but note that this may cause missing results and/or outdated metadata in the Elementary report and alerts.
+
+#### Disable metadata models updates
+
+Configure the following var:
+
+```yaml dbt_project.yml
+vars:
+  disable_dbt_artifacts_autoupload: true
+```
+
+If you only want to disable `dbt_columns` but keep all other artifacts, configure the following:
+
+```yaml dbt_project.yml
+vars:
+  disable_dbt_columns_autoupload: true
+```
+
+
+  If you disable the artifacts autoupload, we recommend you run `dbt run
+  --select elementary.edr.dbt_artifacts` every time you deploy changes to your
+  project.
+
+
+#### Disable result models updates
+
+Configure the following vars (you can also disable with conditions):
+
+```yaml dbt_project.yml
+vars:
+  disable_run_results: true
+  disable_tests_results: true
+  disable_dbt_invocation_autoupload: "{{ target.name != 'prod' }}"
+```
+
+  Note that you can also pass these vars on the command line instead of setting them in the `dbt_project.yml` file, by adding `--vars '{"disable_dbt_artifacts_autoupload": true}'` to your dbt command.
diff --git a/docs/data-tests/dbt/package-models.mdx b/docs/data-tests/dbt/package-models.mdx new file mode 100644 index 000000000..65f204688 --- /dev/null +++ b/docs/data-tests/dbt/package-models.mdx @@ -0,0 +1,521 @@ +--- +title: "Elementary dbt package models" +sidebarTitle: "Package Models" +--- + +## Run Results Tables + +These tables track execution details, test outcomes, and performance metrics from your dbt runs. + +### dbt_run_results + +_Incremental model_ + +Run results of dbt invocations, inserted at the end of each invocation. +Each row is the invocation result of a single resource (model, test, snapshot, etc). +New data is loaded to this model on an on-run-end hook named `elementary.upload_run_results` from each invocation that produces a result object. + + +- `model_execution_id` (string) - Execution id generated by joining the unique_id of the resource and the invocation_id. This is the unique key of each row. +- `unique_id` (string) - The unique id of the resource (would be similar for all executions of the same resource). +- `invocation_id` (string) - The unique id of the invocation (would be similar for all resources executed on the same invocation). Foreign key to `dbt_invocations`. +- `name` (string) - Resource name. +- `status` (string) - Execution result status (success, error, pass, fail). +- `resource_type` (string) - Resource type (model, test, snapshot, seed, etc). +- `execution_time` (float) - Resource execution duration in seconds. +- `execute_started_at` (string) - Start time of the execution. +- `execute_completed_at` (string) - End time of the execution. +- `compile_started_at` (string) - Start time of resource compile action. +- `compile_completed_at` (string) - End time of resource compile action. +- `rows_affected` (int) - Number of rows affected by the execution. +- `full_refresh` (boolean) - Whether this was a full refresh execution. +- `compiled_code` (string) - The compiled code (SQL / Python) executed against the database. 
+- `failures` (int) - Number of failures in this run.
+- `query_id` (string) - Query ID in the data warehouse, if returned by the adapter (currently only supported in Snowflake; null for any other adapter).
+- `thread_id` (string) - ID of the thread of this resource run.
+- `adapter_response` (string) - Response returned by the adapter (fields differ for each adapter).
+- `message` (string) - Execution results message returned by dbt.
+- `generated_at` (string) - Timestamp when the result was generated.
+
+
+### dbt_invocations
+
+_Incremental model_
+
+Attributes associated with each dbt invocation, inserted at the end of each invocation.
+Each row is the result of a single invocation (dbt run, dbt test, dbt build, etc).
+New data is loaded to this model on an on-run-end hook named `elementary.upload_dbt_invocation`.
+It also contains information about your job or what triggered the invocation, such as `pull_request_id`, `git_sha`, or `cause`.
+If you're using an orchestrator that Elementary natively supports, such as dbt Cloud or GitHub Actions,
+this data is populated automatically. Otherwise, you can populate it by using environment variables prefixed with `DBT_`.
+For instance, adding `DBT_JOB_NAME` will populate `dbt_invocations.job_name` with the value of the environment variable.
+
+
+- `invocation_id` (string) - Primary key of this table.
+- `run_started_at` (string) - Timestamp the invocation was started.
+- `run_completed_at` (string) - Timestamp the invocation was completed.
+- `generated_at` (string) - The time this invocation was uploaded to the database.
+- `command` (string) - dbt command that was used (e.g., run, test, build).
+- `dbt_version` (string) - Version of dbt that was used in this invocation.
+- `elementary_version` (string) - Version of the elementary package that was used in this invocation.
+- `full_refresh` (boolean) - Whether or not this invocation was executed as a full-refresh.
+- `target_name` (string) - Name of the target used in this invocation. +- `target_database` (string) - Name of the target database that was used in this invocation. +- `target_schema` (string) - Name of the target schema that was used in this invocation. +- `target_profile_name` (string) - Name of the dbt profile that was used in this invocation. +- `threads` (integer) - Number of threads that were used to run this dbt invocation. +- `selected` (string) - The selected resources in the dbt command. While this is a string in the database, this can easily be converted to an array. +- `yaml_selector` (string) - The yaml selector that was passed in this invocation. +- `job_id` (string) - The ID of a job, defined in the `job_id` var or in the `JOB_ID` env var or by the orchestrator. +- `job_name` (string) - The name of a job, defined in the `job_name` var or in the `JOB_NAME` env var. +- `job_run_id` (string) - The run ID of a job, defined in the `job_run_id` var or in the `DBT_JOB_RUN_ID` env var or by the orchestrator. +- `env` (string) - The environment's name, defined in the `DBT_ENV` env var. +- `env_id` (string) - The ID of an environment, defined in the `DBT_ENV_ID` env var. +- `project_id` (string) - The ID of a project, defined in the `DBT_PROJECT_ID` env var or by the orchestrator. +- `cause_category` (string) - The category of the cause of the invocation (e.g., schedule, manual). +- `cause` (string) - The cause of the invocation (e.g., "Kicked off by Joe"). +- `pull_request_id` (string) - The ID of a pull request, defined in the `DBT_PULL_REQUEST_ID` env var or by the orchestrator. +- `git_sha` (string) - The git SHA of the commit that was used in this invocation. +- `orchestrator` (string) - The orchestrator that was used to run this invocation (e.g., dbt Cloud, GitHub Actions). +- `job_url` (string) - The URL of the job, defined in the `job_url` var or in the `JOB_URL` env var or by the orchestrator. 
+- `account_id` (string) - The ID of the account, defined in the `account_id` var or in the `ACCOUNT_ID` env var or by the orchestrator. +- `invocation_vars` (string) - Dictionary of the variables (and values) that were declared in the invocation. +- `vars` (string) - Dictionary of all variables (and values) in the dbt project. + + +### model_run_results + +_View_ + +Run results of dbt models, enriched with models metadata. +Each row is the result of a single model. +This is a view that joins data from `dbt_run_results` and `dbt_models`. + +**Columns:** Combines all columns from `dbt_run_results` with metadata from `dbt_models` such as `database_name`, `schema_name`, `tags`, `owner`, `materialization`, `package_name`, `path`, `original_path`, and `alias`. + +### snapshot_run_results + +_View_ + +Run results of dbt snapshots, enriched with snapshots metadata. +Each row is the result of a single snapshot. +This is a view that joins data from `dbt_run_results` and `dbt_snapshots`. + +**Columns:** Combines all columns from `dbt_run_results` with metadata from `dbt_snapshots` such as `database_name`, `schema_name`, `tags`, `owner`, `materialization`, `package_name`, `path`, `original_path`, and `alias`. + +### seed_run_results + +_View_ + +Run results of dbt seeds, enriched with seeds metadata. +Each row is the result of a single seed. +This is a view that joins data from `dbt_run_results` and `dbt_seeds`. + +**Columns:** Combines all columns from `dbt_run_results` with metadata from `dbt_seeds` such as `database_name`, `schema_name`, `tags`, `owner`, `package_name`, `path`, `original_path`, and `alias`. + +### job_run_results + +_View_ + +Run results of dbt invocations, enriched with jobs metadata. +Each row is the result of a single job. +This is a view on `dbt_invocations`. + +**Columns:** All columns from `dbt_invocations` table. 
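+As described for `dbt_invocations` above, job metadata can be supplied through vars or `DBT_`-prefixed environment variables when no natively supported orchestrator populates it. A hypothetical CI fragment (the workflow shape and values are illustrative):
+
+```yaml
+# Illustrative GitHub Actions fragment - the DBT_ env vars below would feed
+# dbt_invocations.job_name and dbt_invocations.env, per the descriptions above.
+env:
+  DBT_JOB_NAME: "daily-marts"
+  DBT_ENV: "prod"
+
+steps:
+  - name: Run dbt
+    run: dbt build
+```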
+
+### elementary_test_results
+
+_Incremental model_
+
+Run results of all dbt tests, with the fields and metadata needed to produce
+the [Elementary report](/cloud/features/collaboration-and-communication/data-observability-dashboard).
+Each row is the result of a single test, including native dbt tests, package tests and elementary tests.
+New data is loaded to this model on an on-run-end hook named `elementary.handle_tests_results`.
+
+
+- `id` (string) - Unique identifier for the test result.
+- `test_unique_id` (string) - The unique id of the test.
+- `invocation_id` (string) - The unique id of the invocation. Foreign key to `dbt_invocations`.
+- `detected_at` (timestamp) - When the test result was detected.
+- `created_at` (timestamp) - When the test result record was created.
+- `status` (string) - Test result status (pass, fail, error, warn).
+- `result_rows` (int) - Number of rows that failed the test (for failing tests).
+- `failures` (int) - Number of failures.
+- Additional columns include test metadata, execution details, and test-specific information.
+
+
+### test_result_rows
+
+_Incremental model_
+
+Failed test row samples. Each row contains a sample of data that caused a test to fail, including the test result ID that links to the parent test result, the actual sample data stored as JSON, and timestamps for detection and creation. By default, up to 5 sample rows are stored per failed test (configurable via [test_sample_row_count](https://docs.elementary-data.com/oss/general/faq#can-i-see-more-result-samples-in-the-report)).
+
+
+- `id` (string) - Unique identifier for the test result row.
+- `test_result_id` (string) - Links to the parent test result in `elementary_test_results`.
+- `result_row` (string/JSON) - Sample data that caused the test to fail, stored as JSON.
+- `detected_at` (timestamp) - When the failing row was detected.
+- `created_at` (timestamp) - When the record was created.
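+To keep more than the default 5 sample rows per failed test, the `test_sample_row_count` var mentioned above can be set in `dbt_project.yml`, for example:
+
+```yaml dbt_project.yml
+vars:
+  test_sample_row_count: 10
+```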
+ + +### dbt_source_freshness_results + +_Incremental model_ + +Results from dbt source freshness checks. Tracks when source data was last updated and whether it meets freshness thresholds. + + +- `source_freshness_execution_id` (string) - Unique identifier for the freshness check execution. +- `unique_id` (string) - The unique id of the source. +- `max_loaded_at` (string) - The maximum loaded_at timestamp found in the source data. +- `snapshotted_at` (string) - When the freshness check was performed. +- `generated_at` (string) - When the result was generated. +- `created_at` (timestamp) - When the record was created. +- `max_loaded_at_time_ago_in_s` (float) - How many seconds ago the max_loaded_at timestamp was. +- `status` (string) - Freshness check status (pass, warn, error). +- `error` (string) - Error message if the check failed. +- `compile_started_at` (string) - Start time of compilation. +- `compile_completed_at` (string) - End time of compilation. +- `execute_started_at` (string) - Start time of execution. +- `execute_completed_at` (string) - End time of execution. +- `invocation_id` (string) - The unique id of the invocation. Foreign key to `dbt_invocations`. +- `warn_after` (string) - Freshness warning threshold. +- `error_after` (string) - Freshness error threshold. +- `filter` (string) - Filter expression used in the freshness check. + + + +## Metadata Tables - dbt Artifacts + +These tables provide a comprehensive view of your dbt project structure and configurations. + +The dbt artifacts models are created as empty tables, and a post-hook macro inserts data from the dbt graph object to +the table. +**Each time the model is executed, the data is replaced with the project's current graph.** +It is recommended to execute these models every time a change is merged to the project. + +### dbt_models + +_Table_ + +Metadata about all the models in the project and project packages. 
+Each row contains information about the properties of a single model, including columns like tags, owner, +materialization, depends_on, and description. + + +- `unique_id` (string) - The unique id of the model. +- `name` (string) - Model name. +- `alias` (string) - Model alias. +- `database_name` (string) - The model database name. +- `schema_name` (string) - The model schema name. +- `materialization` (string) - The model materialization config (e.g., table, view, incremental). +- `tags` (string) - Model tags property (stored as JSON array string). +- `meta` (string) - The content of 'meta' property key (stored as JSON string). +- `owner` (string) - Model owner property (configured under 'meta' key). +- `description` (string) - Model description. +- `package_name` (string) - Package name of the model. +- `path` (string) - Short path of the model file. +- `original_path` (string) - Full path of the model file. +- `checksum` (string) - Model file checksum. +- `depends_on_macros` (string) - The macros the model directly depends on (stored as JSON array string). +- `depends_on_nodes` (string) - The nodes the model directly depends on (stored as JSON array string). +- `generated_at` (string) - Update time of the table. + + +### dbt_tests + +_Table_ + +Metadata about all the tests in the project and project packages. +Each row contains information about the properties of a single test, including columns like severity, parent model +unique id, tags and owner of the parent model, test params, and the test compiled query. + + +- `unique_id` (string) - The unique id of the test. +- `name` (string) - The test name. +- `short_name` (string) - Short name of the test. +- `alias` (string) - Test alias. +- `type` (string) - Test type (e.g., singular, generic). +- `test_namespace` (string) - Namespace of the test (e.g., elementary, dbt_utils). +- `database_name` (string) - The tested model database name. +- `schema_name` (string) - The tested model schema name. 
+- `test_column_name` (string) - The name of the tested column (null for table-level tests). +- `parent_model_unique_id` (string) - The unique id of the model this test is attached to. +- `severity` (string) - Test severity (error, warn). +- `warn_if` (string) - Warning condition. +- `error_if` (string) - Error condition. +- `test_params` (string) - Test parameters (stored as JSON string). +- `tags` (string) - Test tags (stored as JSON array string). +- `model_tags` (string) - Tags from the parent model (stored as JSON array string). +- `model_owners` (string) - Owners from the parent model (stored as JSON array string). +- `meta` (string) - Test metadata (stored as JSON string). +- `description` (string) - Test description. +- `package_name` (string) - Package name of the test. +- `path` (string) - Short path of the test file. +- `original_path` (string) - Full path of the test file. +- `depends_on_macros` (string) - The macros the test directly depends on (stored as JSON array string). +- `depends_on_nodes` (string) - The nodes the test directly depends on (stored as JSON array string). +- `generated_at` (string) - Update time of the table. + + +### dbt_sources + +_Table_ + +Metadata about the sources configured in the project and project packages. +Each row contains information about the properties of a single source, including columns like tags, owner, freshness +configuration, database and schema. + + +- `unique_id` (string) - The unique id of the source. +- `source_name` (string) - The name of the source. +- `name` (string) - The name of the source table. +- `identifier` (string) - The identifier of the source table. +- `database_name` (string) - The source database name. +- `schema_name` (string) - The source schema name. +- `relation_name` (string) - Full relation name (database.schema.table). +- `loaded_at_field` (string) - Field used for freshness checks. +- `freshness_warn_after` (string) - Freshness warning threshold. 
+- `freshness_error_after` (string) - Freshness error threshold. +- `freshness_filter` (string) - Filter expression for freshness checks. +- `tags` (string) - Source tags (stored as JSON array string). +- `meta` (string) - Source metadata (stored as JSON string). +- `owner` (string) - Source owner (configured under 'meta' key). +- `description` (string) - Source description. +- `source_description` (string) - Description of the source itself. +- `package_name` (string) - Package name of the source. +- `path` (string) - Short path of the source file. +- `original_path` (string) - Full path of the source file. +- `generated_at` (string) - Update time of the table. + + +### dbt_exposures + +_Table_ + +Metadata about the exposures configured in the project and project packages. +Each row contains information about the properties of a single exposure, including columns like tags, owner, url and +depends on. + + +- `unique_id` (string) - The unique id of the exposure. +- `name` (string) - Exposure name. +- `type` (string) - Exposure type (e.g., dashboard, notebook, application). +- `maturity` (string) - Exposure maturity level. +- `url` (string) - URL of the exposure. +- `owner_email` (string) - Email of the exposure owner. +- `owner_name` (string) - Name of the exposure owner. +- `description` (string) - Exposure description. +- `tags` (string) - Exposure tags (stored as JSON array string). +- `meta` (string) - Exposure metadata (stored as JSON string). +- `package_name` (string) - Package name of the exposure. +- `path` (string) - Short path of the exposure file. +- `original_path` (string) - Full path of the exposure file. +- `depends_on_macros` (string) - The macros the exposure directly depends on (stored as JSON array string). +- `depends_on_nodes` (string) - The nodes the exposure directly depends on (stored as JSON array string). +- `generated_at` (string) - Update time of the table. 
+ + +### dbt_metrics + +_Table_ + +Metadata about the metrics configured in the project and project packages. +Each row contains information about the properties of a single metric, including columns like tags, owner, sql, and +depends on. + + +- `unique_id` (string) - The unique id of the metric. +- `name` (string) - Metric name. +- `label` (string) - Metric label. +- `model` (string) - The model this metric is based on. +- `type` (string) - Metric type (e.g., simple, derived). +- `sql` (string) - SQL expression for the metric. +- `timestamp` (string) - Timestamp column for the metric. +- `filters` (string) - Metric filters (stored as JSON string). +- `time_grains` (string) - Time grains for the metric (stored as JSON array string). +- `dimensions` (string) - Metric dimensions (stored as JSON array string). +- `description` (string) - Metric description. +- `tags` (string) - Metric tags (stored as JSON array string). +- `meta` (string) - Metric metadata (stored as JSON string). +- `package_name` (string) - Package name of the metric. +- `path` (string) - Short path of the metric file. +- `original_path` (string) - Full path of the metric file. +- `depends_on_macros` (string) - The macros the metric directly depends on (stored as JSON array string). +- `depends_on_nodes` (string) - The nodes the metric directly depends on (stored as JSON array string). +- `generated_at` (string) - Update time of the table. + + +### dbt_snapshots + +_Table_ + +Metadata about all the snapshots in the project and project packages. +Each row contains information about the properties of a single snapshot, including columns like tags, owner, depends_on, +and description. + + +- `unique_id` (string) - The unique id of the snapshot. +- `name` (string) - Snapshot name. +- `alias` (string) - Snapshot alias. +- `database_name` (string) - The snapshot database name. +- `schema_name` (string) - The snapshot schema name. +- `materialization` (string) - The snapshot materialization config. 
+- `tags` (string) - Snapshot tags (stored as JSON array string). +- `meta` (string) - Snapshot metadata (stored as JSON string). +- `owner` (string) - Snapshot owner (configured under 'meta' key). +- `description` (string) - Snapshot description. +- `package_name` (string) - Package name of the snapshot. +- `path` (string) - Short path of the snapshot file. +- `original_path` (string) - Full path of the snapshot file. +- `checksum` (string) - Snapshot file checksum. +- `depends_on_macros` (string) - The macros the snapshot directly depends on (stored as JSON array string). +- `depends_on_nodes` (string) - The nodes the snapshot directly depends on (stored as JSON array string). +- `generated_at` (string) - Update time of the table. + + +### dbt_seeds + +_Table_ + +Metadata about seed files in the dbt project and project packages. +Each row contains information about the properties of a single seed, including columns like tags, owner, database, schema, and description. + + +- `unique_id` (string) - The unique id of the seed. +- `name` (string) - Seed name. +- `alias` (string) - Seed alias. +- `database_name` (string) - The seed database name. +- `schema_name` (string) - The seed schema name. +- `tags` (string) - Seed tags (stored as JSON array string). +- `meta` (string) - Seed metadata (stored as JSON string). +- `owner` (string) - Seed owner (configured under 'meta' key). +- `description` (string) - Seed description. +- `package_name` (string) - Package name of the seed. +- `path` (string) - Short path of the seed file. +- `original_path` (string) - Full path of the seed file. +- `checksum` (string) - Seed file checksum. +- `group_name` (string) - Group name if the seed belongs to a group. +- `metadata_hash` (string) - Hash of the metadata for change detection. +- `generated_at` (string) - Update time of the table. + + +### dbt_columns + +_Table_ + +Stores detailed information about columns across the dbt project. 
+Each row contains information about a single column from a model, source, or snapshot. + + +- `unique_id` (string) - The unique id of the column (format: `column.{parent_unique_id}.{column_name}`). +- `parent_unique_id` (string) - The unique id of the parent table (model, source, or snapshot). +- `name` (string) - Column name. +- `data_type` (string) - Column data type. +- `database_name` (string) - The database name of the parent table. +- `schema_name` (string) - The schema name of the parent table. +- `table_name` (string) - The table name (alias) of the parent table. +- `resource_type` (string) - Type of the parent resource (model, source, snapshot). +- `description` (string) - Column description. +- `tags` (string) - Column tags (stored as JSON array string). +- `meta` (string) - Column metadata (stored as JSON string). +- `metadata_hash` (string) - Hash of the metadata for change detection. +- `generated_at` (string) - Update time of the table. + + +### dbt_groups + +_Table_ + +Metadata about the groups configured in the project and project packages. +Each row contains information about the properties of a single group, including columns like group name and owner. + + +- `unique_id` (string) - The unique id of the group. +- `name` (string) - Group name. +- `owner` (string) - Group owner. +- `package_name` (string) - Package name of the group. +- `path` (string) - Short path of the group file. +- `original_path` (string) - Full path of the group file. +- `generated_at` (string) - Update time of the table. + + +## Alerts views + +### alerts_dbt_models + +_View_ + +A view that is used by the Elementary CLI to generate models alerts, including all the fields the alert will include +such as owner, tags, error message, etc. +It joins data about models and snapshots run results, and filters alerts according to configuration. 
+
+### alerts_dbt_tests
+
+_View_
+
+A view that is used by the Elementary CLI to generate dbt test alerts, including all the fields the alert will contain,
+such as owner, tags, error message, etc.
+This view includes data about all dbt tests except elementary tests.
+It filters alerts according to configuration.
+
+### alerts_anomaly_detection
+
+_View_
+
+A view that is used by the Elementary CLI to generate alerts on data anomalies detected using the elementary anomaly
+detection tests.
+The view filters alerts according to configuration.
+
+### alerts_schema_changes
+
+_View_
+
+A view that is used by the Elementary CLI to generate alerts on schema changes detected using elementary tests.
+The view filters alerts according to configuration.
+
+## Anomaly detection
+
+### data_monitoring_metrics
+
+_Incremental model_
+
+Elementary anomaly detection tests monitor data quality metrics such as volume, freshness, and null rates.
+This incremental table is used to store the metrics over time.
+On each run, an anomaly detection test queries this table for historical metrics and compares them to the latest values.
+The table is updated with new metrics by the on-run-end hook named `handle_test_results` that is executed at the end of dbt
+test invocations.
+
+### metrics_anomaly_score
+
+_View_
+
+This is a view on `data_monitoring_metrics` that runs the same query the anomaly detection tests run to calculate
+anomaly scores.
+The purpose of this view is to provide visibility into the results of anomaly detection tests.
+
+### anomaly_threshold_sensitivity
+
+_View_
+
+This is a view on `metrics_anomaly_score` that calculates whether metric values from the latest runs would have been
+considered anomalies under different anomaly score thresholds.
+This can help you decide if there is a need to adjust the `anomaly_score_threshold`.
+
+### monitors_runs
+
+_View_
+
+This is a view on `data_monitoring_metrics` that is used to determine when a specific anomaly detection test was last
+executed.
+Each anomaly detection test queries this view to decide on a start time for collecting metrics. + +## Schema changes + +### schema_columns_snapshot + +_Incremental model_ + +Stores the schema details for tables that are monitored with elementary schema changes test. +In order to compare current schema to previous state, we must store the previous state. +The data is from a view that queries the data warehouse information schema. diff --git a/docs/data-tests/dbt/quickstart-package.mdx b/docs/data-tests/dbt/quickstart-package.mdx new file mode 100644 index 000000000..c6c973c6a --- /dev/null +++ b/docs/data-tests/dbt/quickstart-package.mdx @@ -0,0 +1,14 @@ +--- +title: "Quickstart: Install Elementary dbt package" +sidebarTitle: "Install dbt package" +--- + +import QuickstartPackageInstall from '/snippets/quickstart-package-install.mdx'; + + + + + +## What's next? + +Take a moment to ⭐️ [star our Github repo!](https://github.com/elementary-data/elementary) ⭐️ (It helps us a lot!) diff --git a/docs/data-tests/dbt/reduce-on-run-end-time.mdx b/docs/data-tests/dbt/reduce-on-run-end-time.mdx new file mode 100644 index 000000000..a65813bda --- /dev/null +++ b/docs/data-tests/dbt/reduce-on-run-end-time.mdx @@ -0,0 +1,13 @@ +--- +title: "Control On-Run-End Hooks" +sidebarTitle: "Control on-run-end time" +--- + +import ReduceOnRunEndTime from '/snippets/guides/reduce-on-run-end-time.mdx'; + + + +For more information about configuring Elementary for different environments: +- **Elementary Cloud users**: See the [Dev/Prod Configuration guide](/cloud/guides/dev-prod-configuration) +- **Elementary OSS users**: See the [OSS configuration documentation](/oss/general/faq) + diff --git a/docs/data-tests/dbt/release-notes/releases/0.18.1.mdx b/docs/data-tests/dbt/release-notes/releases/0.18.1.mdx new file mode 100644 index 000000000..cd117b94f --- /dev/null +++ b/docs/data-tests/dbt/release-notes/releases/0.18.1.mdx @@ -0,0 +1,13 @@ +--- +title: "dbt Package 0.18.1" +sidebarTitle: 
"0.18.1" +--- + +This update includes: + +- 📝 Updated README +- 🐛 Report the same test status as dbt +- 🔧 Added resource type option to generate_schema_baseline_test + +Check out the full details here: https://github.com/elementary-data/dbt-data-reliability/releases/tag/0.18.1 + diff --git a/docs/data-tests/dbt/release-notes/releases/0.18.2.mdx b/docs/data-tests/dbt/release-notes/releases/0.18.2.mdx new file mode 100644 index 000000000..b62a51317 --- /dev/null +++ b/docs/data-tests/dbt/release-notes/releases/0.18.2.mdx @@ -0,0 +1,13 @@ +--- +title: "dbt Package 0.18.2" +sidebarTitle: "0.18.2" +--- + +This update includes: + +- 🔧 Allow contributor PRs to run tests +- 🗄️ Temp tables macro improvements +- 🗄️ Support for Athena in integration tests + +Check out the full details here: https://github.com/elementary-data/dbt-data-reliability/releases/tag/0.18.2 + diff --git a/docs/data-tests/dbt/release-notes/releases/0.18.3.mdx b/docs/data-tests/dbt/release-notes/releases/0.18.3.mdx new file mode 100644 index 000000000..43a42c8f2 --- /dev/null +++ b/docs/data-tests/dbt/release-notes/releases/0.18.3.mdx @@ -0,0 +1,12 @@ +--- +title: "dbt Package 0.18.3" +sidebarTitle: "0.18.3" +--- + +This update includes: + +- 🗄️ ClickHouse integration - artifacts only +- 🐛 Fixed Databricks temp relations + +Check out the full details here: https://github.com/elementary-data/dbt-data-reliability/releases/tag/0.18.3 + diff --git a/docs/data-tests/dbt/release-notes/releases/0.19.0.mdx b/docs/data-tests/dbt/release-notes/releases/0.19.0.mdx new file mode 100644 index 000000000..51bb79cb6 --- /dev/null +++ b/docs/data-tests/dbt/release-notes/releases/0.19.0.mdx @@ -0,0 +1,12 @@ +--- +title: "dbt Package 0.19.0" +sidebarTitle: "0.19.0" +--- + +This update includes: + +- 📊 Added dbt group artifact and column support +- ⏰ Added seasonality parameter to test_event_freshness_anomalies + +Check out the full details here: https://github.com/elementary-data/dbt-data-reliability/releases/tag/0.19.0 + 
diff --git a/docs/data-tests/dbt/release-notes/releases/0.19.1.mdx b/docs/data-tests/dbt/release-notes/releases/0.19.1.mdx new file mode 100644 index 000000000..82ac6e835 --- /dev/null +++ b/docs/data-tests/dbt/release-notes/releases/0.19.1.mdx @@ -0,0 +1,12 @@ +--- +title: "dbt Package 0.19.1" +sidebarTitle: "0.19.1" +--- + +This update includes: + +- ☁️ BigQuery - save execution_project in adapter-specific fields +- 🐛 Fixed dbt metric collection + +Check out the full details here: https://github.com/elementary-data/dbt-data-reliability/releases/tag/0.19.1 + diff --git a/docs/data-tests/dbt/release-notes/releases/0.19.2.mdx b/docs/data-tests/dbt/release-notes/releases/0.19.2.mdx new file mode 100644 index 000000000..190861002 --- /dev/null +++ b/docs/data-tests/dbt/release-notes/releases/0.19.2.mdx @@ -0,0 +1,18 @@ +--- +title: "dbt Package 0.19.2" +sidebarTitle: "0.19.2" +--- + +This update includes: + +- 📊 Updated tests quality dimension +- 🔧 Added metadata comment to all queries run with elementary.run.query +- 🔐 Snowflake user creation - use public key +- 📝 Added runtime log to dbt package +- 🐛 Fixed schema name generation for branch names with forward slashes +- 🔒 Added configuration to disable sample collection for PII tables +- 🛡️ Implemented column-level PII protection for sample collection +- 🗄️ Dremio improvements + +Check out the full details here: https://github.com/elementary-data/dbt-data-reliability/releases/tag/0.19.2 + diff --git a/docs/data-tests/dbt/release-notes/releases/0.19.3.mdx b/docs/data-tests/dbt/release-notes/releases/0.19.3.mdx new file mode 100644 index 000000000..ffa2720d3 --- /dev/null +++ b/docs/data-tests/dbt/release-notes/releases/0.19.3.mdx @@ -0,0 +1,12 @@ +--- +title: "dbt Package 0.19.3" +sidebarTitle: "0.19.3" +--- + +This update includes: + +- 🗄️ Dremio types mapping improvements +- 🔧 dbt-fusion fix: meta under config in dbt_run_results + +Check out the full details here: 
https://github.com/elementary-data/dbt-data-reliability/releases/tag/0.19.3 + diff --git a/docs/data-tests/dbt/release-notes/releases/0.19.4.mdx b/docs/data-tests/dbt/release-notes/releases/0.19.4.mdx new file mode 100644 index 000000000..9e9779c85 --- /dev/null +++ b/docs/data-tests/dbt/release-notes/releases/0.19.4.mdx @@ -0,0 +1,11 @@ +--- +title: "dbt Package 0.19.4" +sidebarTitle: "0.19.4" +--- + +This update includes: + +- 📊 Dimension anomalies visualization + +Check out the full details here: https://github.com/elementary-data/dbt-data-reliability/releases/tag/0.19.4 + diff --git a/docs/data-tests/dbt/release-notes/releases/0.20.0.mdx b/docs/data-tests/dbt/release-notes/releases/0.20.0.mdx new file mode 100644 index 000000000..40530dfa3 --- /dev/null +++ b/docs/data-tests/dbt/release-notes/releases/0.20.0.mdx @@ -0,0 +1,17 @@ +--- +title: "dbt Package 0.20.0" +sidebarTitle: "0.20.0" +--- + +This update includes: + +- 🗄️ dbt Fusion support +- ⚡ Performance improvements with indices for faster loading times +- 🐛 Fixed datetime casting macro for ClickHouse +- 🐛 Fixed Athena DROP TABLE parsing error with backtick quoting +- 🗄️ Dremio improvements: timestamp formatting, metadata permissions, and special character escaping +- 🐛 Fixed Databricks temp relations and Fusion bugfixes +- 📊 Index on test_result_rows.detected_at column for better performance + +Check out the full details here: https://github.com/elementary-data/dbt-data-reliability/releases/tag/0.20.0 + diff --git a/docs/data-tests/dbt/release-notes/releases/0.20.1.mdx b/docs/data-tests/dbt/release-notes/releases/0.20.1.mdx new file mode 100644 index 000000000..1310adf56 --- /dev/null +++ b/docs/data-tests/dbt/release-notes/releases/0.20.1.mdx @@ -0,0 +1,11 @@ +--- +title: "dbt Package 0.20.1" +sidebarTitle: "0.20.1" +--- + +This update includes: + +- 🔧 Fixed dbt Fusion version check to require major version 2 + +Check out the full details here: 
https://github.com/elementary-data/dbt-data-reliability/releases/tag/0.20.1 + diff --git a/docs/data-tests/dbt/singular-tests.mdx b/docs/data-tests/dbt/singular-tests.mdx new file mode 100644 index 000000000..719ff62e1 --- /dev/null +++ b/docs/data-tests/dbt/singular-tests.mdx @@ -0,0 +1,92 @@ +--- +title: "Singular Tests" +sidebarTitle: "Singular Tests" +--- + +Singular tests in dbt are custom SQL tests that allow you to write specific queries to validate your data. Unlike generic tests (like `not_null` or `unique`) that can be applied to multiple models, singular tests are one-off tests written as SQL queries in `.sql` files within your `tests/` directory. + +## What are Singular Tests? + +Singular tests are dbt tests defined as standalone SQL files. They are executed when you run `dbt test`, and they pass when the query returns zero rows, or fail when the query returns any rows. This makes them perfect for testing complex business logic, relationships between multiple tables, or custom validation rules that don't fit into standard generic tests. + +### How Singular Tests Work + +A singular test is simply a SQL query that should return no rows if the test passes. For example: + +```sql +-- tests/assert_no_null_orders.sql +select * +from {{ ref('orders') }} +where status is null +``` + +This test will fail if there are any orders with a null status, and pass if all orders have a status value. + +## The Multiple Tables Challenge + +In singular custom tests, the query can sometimes involve multiple tables. When this happens, there's no unique link between a test and a specific asset. The test may fail, but the incident isn't shown because Elementary doesn't know which asset the test is connected to. 
+
+For example, consider this test that joins multiple tables:
+
+```sql
+select *
+from {{ ref('orders') }} as orders
+join {{ ref('payments') }} as payments on orders.order_id = payments.order_id
+where orders.status is null
+```
+
+When this test fails, Elementary can't determine whether the issue is with the `orders` table or the `payments` table, making it difficult to surface the incident in the right context.
+
+## Solution: Override Primary Test Model ID
+
+To handle this scenario, Elementary provides an option to explicitly define which table is actually being tested using the `override_primary_test_model_id` configuration. This ensures that test failures are properly linked to the correct asset.
+
+### Configuration
+
+You can configure the primary model like this:
+
+```sql
+{{
+  config(
+    severity='error',
+    override_primary_test_model_id='model.jaffle_shop_online.orders'
+  )
+}}
+
+select *
+from {{ ref('orders') }} as orders
+join {{ ref('payments') }} as payments on orders.order_id = payments.order_id
+where orders.status is null
+```
+
+In this example, even though both `orders` and `payments` are in the query, Elementary will link the test to the `orders` table since it's explicitly set as the primary model.
+
+### Ownership Behavior
+
+When you set `override_primary_test_model_id`, Elementary uses the owner of the specified primary model. In the example above, alerts are routed to the owner of the `orders` table, since it's explicitly set as the primary model and the test itself has no `meta` section.
+
+This ensures that:
+- Test incidents are properly attributed to the correct asset
+- Alerts are routed to the right owners
+- The test appears in the correct asset's lineage and catalog page
+
+## Creating Singular Tests in Elementary Cloud
+
+In Elementary Cloud, when creating tests from the UI, you can add singular tests directly through the visual interface. The UI guides you through the process of:
+
+1. Writing your SQL query
+2. Configuring the primary model (if your query involves multiple tables)
+3. Setting test metadata like name, description, severity, tags, and owners
+
+Singular test configuration in Elementary Cloud UI
+
+The UI automatically handles the conversion of your test into a proper dbt singular test file, including the `override_primary_test_model_id` configuration when needed.
+
+## Best Practices
+
+- **Use descriptive names**: Name your singular test files clearly to indicate what they're testing (e.g., `assert_orders_have_valid_payments.sql`)
+- **Set the primary model**: Always use `override_primary_test_model_id` when your query involves multiple tables to ensure proper incident attribution
+- **Add descriptions**: Include comments in your SQL explaining the business logic being tested
+- **Configure severity**: Use `severity='error'` for critical tests and `severity='warn'` for non-critical validations
+- **Link to assets**: When possible, configure the primary model to link the test to the most relevant asset for better visibility
+
diff --git a/docs/data-tests/dbt/upgrade-package.mdx b/docs/data-tests/dbt/upgrade-package.mdx
new file mode 100644
index 000000000..df536b379
--- /dev/null
+++ b/docs/data-tests/dbt/upgrade-package.mdx
@@ -0,0 +1,33 @@
+---
+title: "Upgrade Elementary dbt package"
+sidebarTitle: "Upgrade package"
+---
+
+When a new version is released, you will need to upgrade the Elementary dbt package.
+
+## Upgrade Elementary dbt package
+
+1. In your `packages.yml` file, change the version to the latest:
+
+```yml packages.yml
+packages:
+  - package: elementary-data/elementary
+    version: 0.23.0
+```
+
+2. Run the command:
+
+```shell
+dbt deps
+```
+
+3. When there's a change in the structure of the Elementary tables, the minor version is raised. 
If you're updating a minor version (for example 0.22.X -> 0.23.X), run this command to rebuild the Elementary tables: + +```shell +dbt run --select elementary +``` + + +**Note:** For CLI upgrades, refer to the [Elementary OSS upgrade guide](/oss/release-notes/upgrading-elementary#upgrade-elementary-cli). + + diff --git a/docs/data-tests/execution-sla.mdx b/docs/data-tests/execution-sla.mdx new file mode 100644 index 000000000..185db05bb --- /dev/null +++ b/docs/data-tests/execution-sla.mdx @@ -0,0 +1,128 @@ +--- +title: "execution_sla" +sidebarTitle: "Execution SLA" +--- + +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + +`elementary.execution_sla` + +Verifies that dbt models are executed successfully before a specified SLA deadline time. + +This test checks whether your pipeline _ran_ before a specified deadline on the days you care about. It queries `dbt_run_results` for successful runs of the model and validates that at least one run completed before the SLA deadline. + +Note that this test only verifies that the model executed, not that the data is actually fresh. If you need to verify that the underlying data was updated (e.g., an upstream source refreshed), use [`data_freshness_sla`](/data-tests/data-freshness-sla) instead. + +### Use Case + +"Did my pipeline complete before 7 AM Pacific on the days I care about?" + +### Test Logic + +1. If today is not a scheduled check day → **PASS** (skip) +2. Query `dbt_run_results` for successful runs of the model today +3. If any run completed before the SLA deadline → **PASS** +4. If SLA deadline hasn't passed yet → **PASS** (still time) +5. If model ran but after deadline → **FAIL** (MISSED_SLA) +6. If all runs failed → **FAIL** (ALL_FAILED) +7. If model didn't run today → **FAIL** (NOT_RUN) + +### Test configuration + +_Required configuration: `sla_time`, `timezone`_ + +{/* prettier-ignore */} +
+ 
+  data_tests:
+      - elementary.execution_sla:
+          arguments:
+              sla_time: string # Required - e.g., "07:00", "7am", "2:30pm", "14:30"
+              timezone: string # Required - IANA timezone name, e.g., "America/Los_Angeles", "Europe/Amsterdam"
+              day_of_week: string | array # Optional - Day(s) to check: "Monday" or ["Monday", "Wednesday"]
+              day_of_month: int | array # Optional - Day(s) of month to check: 1 or [1, 15]
+ 
+
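The deadline comparison at the heart of this test can be sketched in plain Python. This is an illustration only, using the stdlib `zoneinfo` module rather than the `pytz` library the package relies on, with an invented `met_sla` helper:

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

def met_sla(completed_at_utc: datetime, sla_time: time, tz: str) -> bool:
    """Did a run finish before the SLA deadline, evaluated in the SLA timezone?"""
    # Convert the run's completion time into the SLA timezone (DST-aware).
    local = completed_at_utc.astimezone(ZoneInfo(tz))
    # Build the deadline on the same local date as the run.
    deadline = local.replace(hour=sla_time.hour, minute=sla_time.minute,
                             second=0, microsecond=0)
    return local <= deadline

# 13:30 UTC on July 1st is 06:30 in Los Angeles (PDT), before an 07:00 SLA.
run = datetime(2024, 7, 1, 13, 30, tzinfo=ZoneInfo("UTC"))
print(met_sla(run, time(7, 0), "America/Los_Angeles"))  # True
```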
+ + + +```yml Models +models: + - name: < model name > + data_tests: + - elementary.execution_sla: + arguments: + sla_time: < deadline time > # Required - e.g., "07:00", "7am", "2:30pm" + timezone: < IANA timezone > # Required - e.g., "America/Los_Angeles" + day_of_week: < day or array > # Optional + day_of_month: < day or array > # Optional +``` + +```yml Daily check (default) +models: + - name: daily_revenue + data_tests: + - elementary.execution_sla: + arguments: + sla_time: "07:00" + timezone: "America/Los_Angeles" + config: + tags: ["elementary"] + severity: error +``` + +```yml Weekly - only Mondays and Wednesdays +models: + - name: weekly_report + data_tests: + - elementary.execution_sla: + arguments: + sla_time: "06:00" + timezone: "Europe/London" + day_of_week: ["Monday", "Wednesday"] + config: + tags: ["elementary"] +``` + +```yml Monthly - only 1st and 15th +models: + - name: monthly_close + data_tests: + - elementary.execution_sla: + arguments: + sla_time: "09:00" + timezone: "Asia/Tokyo" + day_of_month: [1, 15] + config: + tags: ["elementary"] +``` + + + +### Features + +- **Flexible time formats**: Supports `"07:00"`, `"7am"`, `"2:30pm"`, `"14:30"`, and other common formats +- **IANA timezone support**: Uses standard timezone names like `"America/Los_Angeles"`, `"Europe/Amsterdam"`, etc. 
+- **Automatic DST handling**: Uses `pytz` for timezone conversions with automatic daylight saving time handling +- **Database-agnostic**: All timezone logic happens at compile time +- **Schedule filters**: Optional `day_of_week` and `day_of_month` parameters to check only specific days + +### Parameters + +| Parameter | Required | Description | +| -------------- | -------- | ------------------------------------------------------ | +| `sla_time` | Yes | Deadline time (e.g., `"07:00"`, `"7am"`, `"2:30pm"`) | +| `timezone` | Yes | IANA timezone name (e.g., `"America/Los_Angeles"`) | +| `day_of_week` | No | Day(s) to check: `"Monday"` or `["Monday", "Wednesday"]` | +| `day_of_month` | No | Day(s) of month to check: `1` or `[1, 15]` | + +### Notes + +- This test only works with **models**, not sources +- The test automatically skips on non-scheduled days (when `day_of_week` or `day_of_month` filters are set) +- If both `day_of_week` and `day_of_month` are set, the test uses OR logic (checks if either matches) +- The test passes if the SLA deadline hasn't been reached yet, giving your pipeline time to complete + diff --git a/docs/data-tests/how-anomaly-detection-works.mdx b/docs/data-tests/how-anomaly-detection-works.mdx index f2c2663f1..79be9c7f1 100644 --- a/docs/data-tests/how-anomaly-detection-works.mdx +++ b/docs/data-tests/how-anomaly-detection-works.mdx @@ -3,6 +3,8 @@ title: "Elementary anomaly detection tests" sidebarTitle: "Elementary anomaly detection" --- + + Elementary data anomaly detection tests monitor a specific metric (like row count, null rate, average value, etc.) and compare recent values to historical values. This is done to detect [significant changes and deviations](/data-tests/data-anomaly-detection), that are probably data reliability issues. 
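The comparison of recent values against historical values can be illustrated with a simple standard-score check. This is a simplified sketch of the idea, not the package's actual implementation:

```python
from statistics import mean, stdev

def anomaly_score(history: list[float], latest: float) -> float:
    """Standard score of the latest value against the historical baseline."""
    return (latest - mean(history)) / stdev(history)

row_counts = [1000, 1020, 980, 1010, 995]  # historical daily row counts
score = anomaly_score(row_counts, 400)     # today's count dropped sharply
print(abs(score) > 3)  # beyond a typical score threshold -> flagged as an anomaly
```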
diff --git a/docs/data-tests/introduction.mdx b/docs/data-tests/introduction.mdx index 2dcdae050..fd7546849 100644 --- a/docs/data-tests/introduction.mdx +++ b/docs/data-tests/introduction.mdx @@ -3,24 +3,26 @@ title: "Elementary Data Tests" sidebarTitle: "Introduction" --- -The Elementary dbt package offers two test types: - -- **Pipeline tests:** Monitor the health of data pipelines, ensuring timely and smooth data ingestion, transformation, and loading. -- **Data quality tests:** Validate data accuracy, completeness, and correctness, detect anomalies and schema changes, and ensure the data meets predefined business rules. +import TestsCards from '/snippets/data-tests/tests-cards.mdx'; -Together, these tests ensure reliable pipelines and trusted data. - -In addition to the previously mentioned tests, the [Elementary Cloud Platform](https://docs.elementary-data.com/cloud/introduction) offers **automated pipeline tests.** While traditional tests query the dbt tables directly, automated pipeline tests analyze **query history metadata**. This method is both **faster and more cost-efficient**, as it eliminates the need to query large datasets, focusing solely on the metadata layer. -Elementary automatically creates monitors for every model and source in your dbt project once you set up your environment, no configuration is required. Learn more about [automated tests](https://docs.elementary-data.com/features/anomaly-detection/automated-monitors). -Elementary provides tests for detection of data quality issues. +Elementary provides anomaly tests for detection of data quality issues. Elementary data tests are configured and executed like native tests in your dbt project. Elementary tests can be used in addition to dbt tests, packages tests (such as dbt-expectations), and custom tests. All of these test results will be presented in the Elementary UI and alerts. 
-
+The Elementary dbt package offers two test types:
+
+- **Pipeline tests:** Monitor the health of data pipelines, ensuring timely and smooth data ingestion, transformation, and loading.
+- **Data quality tests:** Validate data accuracy, completeness, and correctness, detect anomalies and schema changes, and ensure the data meets predefined business rules.
+
+Together, these tests ensure reliable pipelines and trusted data.
+
+In addition to the dbt package tests mentioned above, the [Elementary Cloud Platform](https://docs.elementary-data.com/cloud/introduction) offers **automated pipeline tests.** While traditional tests query the dbt tables directly, automated pipeline tests analyze **query history metadata**. This method is both **faster and more cost-efficient**, as it eliminates the need to query large datasets, focusing solely on the metadata layer. Learn more about [automated tests](https://docs.elementary-data.com/features/anomaly-detection/automated-monitors).
+
+
 
 ## Anomaly detection tests
@@ -112,3 +114,10 @@ Tests to detect anomalies in data quality metrics such as volume, freshness, nul
 Write your own custom tests using Python scripts.
 
+
+## dbt test outcomes
+
+In dbt, there are three possible outcomes when running models or tests: errors, failures, and warnings.
+- An **error** means dbt could not run the SQL at all (e.g., a syntax mistake, missing table, or broken macro). This stops execution (on `dbt build`).
+- A **failure** happens when a test runs successfully but its condition isn’t met; if the test’s severity is set to `error`, it will fail the pipeline.
+- A **warning** is the same as a failure in terms of data quality, but with `severity: warn`, dbt exits successfully and does not break pipelines.
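For example, the same test condition can produce a failure or a warning depending on its configured severity. A sketch with placeholder model and column names:

```yml
models:
  - name: orders # placeholder model name
    columns:
      - name: order_id
        data_tests:
          - not_null:
              config:
                severity: error # a failing test breaks the pipeline
          - unique:
              config:
                severity: warn # a failing test is reported, dbt still exits successfully
```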
diff --git a/docs/data-tests/python-tests.mdx b/docs/data-tests/python-tests.mdx index dfb2e16ff..699cbcd34 100644 --- a/docs/data-tests/python-tests.mdx +++ b/docs/data-tests/python-tests.mdx @@ -40,9 +40,10 @@ A Python test is defined like any other dbt test. ```yaml models/schema.yml - name: orders - tests: + data_tests: - elementary.python: - code_macro: check_undelivered_orders + arguments: + code_macro: check_undelivered_orders ``` Then, we need to define a macro under `macros/.sql` that contains the Python code the test will execute. @@ -86,11 +87,12 @@ Let's compare two different tables, or views, within our warehouse. ```yaml models/schema.yml - name: orders - tests: + data_tests: - elementary.python: - code_macro: compare_tables - macro_args: - other_table: raw_orders + arguments: + code_macro: compare_tables + macro_args: + other_table: raw_orders ``` We're passing an additional argument to the test called `macro_args`. @@ -122,13 +124,14 @@ In this example we'll validate a JSON column according to a pre-defined schema. 
```yaml models/schema.yml - name: login_events - tests: + data_tests: - elementary.python: - code_macro: validate_json - macro_args: - schema: "{'type': 'object', 'properties': {'country': {'type': 'string'}}}" - column: geolocation_json - packages: ["jsonschema"] + arguments: + code_macro: validate_json + macro_args: + schema: "{'type': 'object', 'properties': {'country': {'type': 'string'}}}" + column: geolocation_json + packages: ["jsonschema"] ``` Here we'll be testing that the `country` that is provided in the `geolocation_json` column in the `login_events` diff --git a/docs/data-tests/schema-tests/exposure-tests.mdx b/docs/data-tests/schema-tests/exposure-tests.mdx index 54a7800b8..13e49d24c 100644 --- a/docs/data-tests/schema-tests/exposure-tests.mdx +++ b/docs/data-tests/schema-tests/exposure-tests.mdx @@ -130,12 +130,15 @@ For each module schema you wish to verify the exposure dependencies, add the ele config: tags: ["finance"] - tests: + data_tests: - elementary.volume_anomalies: - tags: ["table_anomalies"] - timestamp_column: "order_date" + config: + tags: ["table_anomalies"] + arguments: + timestamp_column: "order_date" - elementary.exposure_schema_validity: - tags: [elementary] + config: + tags: [elementary] ``` diff --git a/docs/data-tests/schema-tests/json-schema.mdx b/docs/data-tests/schema-tests/json-schema.mdx index 3b55131b2..3a0c85c84 100644 --- a/docs/data-tests/schema-tests/json-schema.mdx +++ b/docs/data-tests/schema-tests/json-schema.mdx @@ -34,21 +34,22 @@ Please add the following test to your model configuration: columns: - name: raw_event_data - tests: + data_tests: - elementary.json_schema: - type: object - properties: - event_id: - type: integer - event_name: - type: string - event_args: - type: array - items: + arguments: + type: object + properties: + event_id: + type: integer + event_name: type: string - required: - - event_id - - event_name + event_args: + type: array + items: + type: string + required: + - event_id + - event_name 
``` _Note: The `generate_json_schema_test` macro relies on a 3rd-party python library called `genson`. If you are using @@ -64,8 +65,10 @@ models: - name: < model name > columns: - name: < column name > - tests: - - elementary.json_schema: + data_tests: + - elementary.json_schema: + arguments: + ``` ```yml Models example @@ -75,21 +78,22 @@ models: - name: login_events columns: - name: raw_event_data - tests: + data_tests: - elementary.json_schema: - type: object - properties: - event_id: - type: integer - event_name: - type: string - event_args: - type: array - items: + arguments: + type: object + properties: + event_id: + type: integer + event_name: type: string - required: - - event_id - - event_name + event_args: + type: array + items: + type: string + required: + - event_id + - event_name ``` diff --git a/docs/data-tests/schema-tests/schema-changes-from-baseline.mdx b/docs/data-tests/schema-tests/schema-changes-from-baseline.mdx index 5281e5d41..5092ebd57 100644 --- a/docs/data-tests/schema-tests/schema-changes-from-baseline.mdx +++ b/docs/data-tests/schema-tests/schema-changes-from-baseline.mdx @@ -61,7 +61,7 @@ dbt run-operation elementary.generate_schema_baseline_test --args '{"fail_on_add #> - name: #> columns: #> ... -#> tests: +#> data_tests: #> - elementary.schema_changes_from_baseline: #> ... 
``` @@ -84,7 +84,7 @@ sources: data_type: < data type 1 > - name: < column 2 > data_type: < data type 2 > - tests: + data_tests: - elementary.schema_changes_from_baseline ``` @@ -102,9 +102,10 @@ sources: data_type: text - name: event_id data_type: integer - tests: + data_tests: - elementary.schema_changes_from_baseline - tags: ["elementary"] + config: + tags: ["elementary"] ``` ```yml Models @@ -117,7 +118,7 @@ models: data_type: < data type 1 > - name: < column 2 > data_type: < data type 1 > - tests: + data_tests: - elementary.schema_changes_from_baseline ``` @@ -131,9 +132,10 @@ models: data_type: text - name: event_id data_type: integer - tests: + data_tests: - elementary.schema_changes_from_baseline: - tags: ["elementary"] + config: + tags: ["elementary"] ``` diff --git a/docs/data-tests/schema-tests/schema-changes.mdx b/docs/data-tests/schema-tests/schema-changes.mdx index 8a0edec7b..4aff2e91e 100644 --- a/docs/data-tests/schema-tests/schema-changes.mdx +++ b/docs/data-tests/schema-tests/schema-changes.mdx @@ -19,7 +19,7 @@ version: 2 models: - name: < model name > - tests: + data_tests: - elementary.schema_changes ``` @@ -28,10 +28,10 @@ version: 2 models: - name: login_events - tests: + data_tests: - elementary.schema_changes: - tags: ["elementary"] config: + tags: ["elementary"] severity: warn ``` diff --git a/docs/data-tests/test-result-samples.mdx b/docs/data-tests/test-result-samples.mdx new file mode 100644 index 000000000..6dfae36c3 --- /dev/null +++ b/docs/data-tests/test-result-samples.mdx @@ -0,0 +1,254 @@ +--- +title: "Test Result Samples" +sidebarTitle: "Test Result Samples" +--- + +When a test fails, Elementary captures a sample of the failing rows and stores them in the `test_result_rows` table. These samples help you quickly understand and investigate data issues without manually running queries. + +By default, Elementary saves **5 sample rows per failed test**. 
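As a rough mental model, the number of sample rows saved for a given test resolves from the most specific setting down to this default. The sketch below is a simplified Python illustration of how the per-test and project-wide controls described on this page combine (illustrative names only; Elementary's actual logic lives in the dbt package):

```python
# Simplified sketch of how the sample-size controls on this page combine.
# Illustrative only; not Elementary's actual implementation.
DEFAULT_SAMPLE_ROWS = 5

def effective_sample_rows(test_meta: dict, project_vars: dict) -> int:
    """Return how many failing-row samples would be saved for one test."""
    # The per-test on/off switch wins over everything else.
    if test_meta.get("disable_test_samples", False):
        return 0
    # A per-test row count overrides the project-wide variable.
    if "test_sample_row_count" in test_meta:
        return test_meta["test_sample_row_count"]
    # Fall back to the project-wide variable, then the documented default of 5.
    return project_vars.get("test_sample_row_count", DEFAULT_SAMPLE_ROWS)
```

For example, a test with `test_sample_row_count: 20` in its meta saves 20 rows even when the project variable is set to 10, and `disable_test_samples: true` always wins.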
+ +This page describes all the available controls for managing test result samples -- both self-service configuration in your dbt project and options available through the Elementary team for Cloud users. + +## Configuring sample size + +### Global setting + +Set the number of sample rows saved per failed test across your entire project by adding the `test_sample_row_count` variable to your `dbt_project.yml`: + +```yaml +vars: + test_sample_row_count: 10 +``` + +Or pass it as a flag when running dbt: + +```shell +dbt test --vars '{"test_sample_row_count": 10}' +``` + +Set to `0` to disable sample collection entirely: + +```yaml +vars: + test_sample_row_count: 0 +``` + + + The larger the number of rows you save, the more data you will store in your data warehouse. This can affect the performance and cost of your Elementary schema, depending on your database. + + +### Per-test override + +You can override the global sample size for individual tests using the `test_sample_row_count` meta configuration: + +```yaml +models: + - name: orders + data_tests: + - unique: + config: + meta: + test_sample_row_count: 20 # Save more samples for this specific test + - not_null: + column_name: order_id + config: + meta: + test_sample_row_count: 0 # Disable samples for this test +``` + +The per-test setting takes precedence over the global variable. + +## Disabling samples for specific tests + +Use the `disable_test_samples` meta configuration to completely disable sample collection for a specific test: + +```yaml +models: + - name: user_profiles + data_tests: + - elementary.volume_anomalies: + config: + meta: + disable_test_samples: true +``` + +## PII protection + +Elementary provides built-in protection for sensitive data by automatically disabling test sample collection when a PII tag is detected. 
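The detection works roughly like this: tags from the model, its columns, and the test itself are matched case-insensitively against the configured PII tags. A simplified sketch (illustrative only, not Elementary's actual code):

```python
# Simplified sketch of PII-tag detection across model, column, and test tags.
# Illustrative only; matching in Elementary is case-insensitive.

def has_pii_tag(model_tags, column_tags, test_tags, pii_tags=("pii",)) -> bool:
    """True if any tag at any level matches a configured PII tag."""
    configured = {t.lower() for t in pii_tags}
    return any(t.lower() in configured for t in [*model_tags, *column_tags, *test_tags])
```

So a model tagged `PII`, a column tagged `pii`, or a test tagged `Pii` would all suppress sample collection once the feature is enabled.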
+ +### Enable PII protection + +Add these variables to your `dbt_project.yml`: + +```yaml +vars: + disable_samples_on_pii_tags: true # Enable PII protection (default: false) + pii_tags: ['pii', 'sensitive'] # Tags that identify PII data (default: ['pii']) +``` + +PII tag matching is **case-insensitive** -- `PII`, `pii`, and `Pii` are all equivalent. + +### Tag levels + +PII tags are evaluated at three levels: + +**Model level** -- disables samples for all tests on that model: + +```yaml +models: + - name: customer_data + config: + tags: ['pii'] +``` + +**Column level** -- disables samples for tests that reference that column: + +```yaml +models: + - name: customer_data + columns: + - name: email + tags: ['pii'] +``` + +**Test level** -- disables samples for that specific test: + +```yaml +models: + - name: customer_data + data_tests: + - elementary.volume_anomalies: + config: + tags: ['pii'] +``` + +You can also tag entire directories in `dbt_project.yml`: + +```yaml +models: + my_project: + sensitive_data: + +tags: ['pii'] +``` + +## Showing samples selectively with `show_sample_rows` + +`show_sample_rows` is the inverse of PII protection. When enabled, **all samples are hidden by default** and only shown for models, columns, or tests that are explicitly tagged. + +### Enable selective sample display + +```yaml +vars: + enable_samples_on_show_sample_rows_tags: true # Hide all samples by default (default: false) +``` + +By default, the tag `show_sample_rows` is used to opt in. You can customize this with the `show_sample_rows_tags` variable -- just like `pii_tags` lets you customize which tags trigger PII protection: + +```yaml +vars: + enable_samples_on_show_sample_rows_tags: true + show_sample_rows_tags: ['show_sample_rows', 'debug'] # default: ['show_sample_rows'] +``` + +Tag matching is **case-insensitive**. 
+ +### Tag levels + +Works at the same three levels as PII: + +**Model level** -- shows samples for all tests on that model: + +```yaml +models: + - name: orders + config: + tags: ['show_sample_rows'] +``` + +**Column level** -- shows samples for tests that target that column: + +```yaml +models: + - name: orders + columns: + - name: status + tags: ['show_sample_rows'] +``` + +**Test level** -- shows samples for that specific test: + +```yaml +models: + - name: orders + data_tests: + - elementary.volume_anomalies: + config: + tags: ['show_sample_rows'] +``` + +## Behavior matrix + +### `enable_samples_on_show_sample_rows_tags: true` + +| Level | Tags | Samples shown? | +| --- | --- | --- | +| model | none | No (hidden by default) | +| model | `show_sample_rows` | Yes | +| model | `pii` | No (hidden by default) | +| test | `show_sample_rows` | Yes | +| column | `show_sample_rows` | Yes | + +### `disable_samples_on_pii_tags: true` + +| Level | Tags | Samples shown? | +| --- | --- | --- | +| model | none | Yes | +| model | `pii` | No | +| test | `pii` | No | +| column | `pii` | No | +| column | `show_sample_rows` (no pii) | Yes (excluded from PII columns) | + +## Configuration precedence + +When multiple settings apply, Elementary follows this order (highest priority first): + +1. **`disable_test_samples` in test meta** -- per-test on/off switch +2. **`show_sample_rows` tag** -- when `enable_samples_on_show_sample_rows_tags: true` and the model, column, or test has a matching tag +3. **`enable_samples_on_show_sample_rows_tags`** -- hides all samples by default when enabled +4. **PII tag detection** -- when `disable_samples_on_pii_tags: true` and the model, column, or test has a matching tag +5. **`test_sample_row_count` global var** -- project-wide sample size +6. **Default** -- 5 rows + +## Elementary Cloud: additional controls + +For Elementary Cloud users, there are additional environment-level controls that can be enabled by the Elementary team. 
+ + + The controls below are managed by Elementary and apply to how test samples are handled after they are synced from your data warehouse. To request changes, contact the Elementary team via Slack or email. + + +### Disable test samples for an environment + +The Elementary team can disable test samples entirely for a specific environment. When enabled: +- Test samples will **not be synced** from your Elementary schema. +- Test samples will **not appear** in the UI or in alerts, even if they exist in your warehouse. + +This is useful for environments that contain highly sensitive data where no sample rows should ever leave the warehouse. + +### Skip database storage of sample rows + +The Elementary team can configure an environment so that the `test_result_rows` data is stored only in the data lake (S3) and **not loaded into the application database**. This reduces database size while keeping the raw data available for debugging if needed. + +## Summary of all controls + +| Control | Scope | Where to configure | Default | +| --- | --- | --- | --- | +| `test_sample_row_count` | Global | `dbt_project.yml` vars | `5` | +| `test_sample_row_count` | Per-test | Test meta | Inherits global | +| `disable_test_samples` | Per-test | Test meta | `false` | +| `disable_samples_on_pii_tags` | Global | `dbt_project.yml` vars | `false` | +| `pii_tags` | Global | `dbt_project.yml` vars | `['pii']` | +| `enable_samples_on_show_sample_rows_tags` | Global | `dbt_project.yml` vars | `false` | +| `show_sample_rows_tags` | Global | `dbt_project.yml` vars | `['show_sample_rows']` | +| Disable samples for environment | Per-environment | Contact Elementary team | Disabled | +| Skip DB storage of sample rows | Per-environment | Contact Elementary team | Disabled | diff --git a/docs/data-tests/volume-threshold.mdx b/docs/data-tests/volume-threshold.mdx new file mode 100644 index 000000000..83ccb136f --- /dev/null +++ b/docs/data-tests/volume-threshold.mdx @@ -0,0 +1,168 @@ +--- +title: 
"volume_threshold" +sidebarTitle: "Volume Threshold" +--- + +import AiGenerateTest from '/snippets/ai-generate-test.mdx'; + + + + +`elementary.volume_threshold` + +Monitors row count changes between time buckets using configurable percentage thresholds with multiple severity levels. + +Unlike `volume_anomalies` (which uses z-score based anomaly detection as a dbt test, or ML-based detection in Elementary Cloud), this test lets you define explicit percentage thresholds for warnings and errors, giving you precise control over when to be alerted. It uses Elementary's metric caching infrastructure to avoid recalculating row counts for buckets that have already been computed. + +### Use Case + +"Alert me if my table's row count drops or spikes by more than 10% compared to the previous period." + +### Test Logic + +1. Collect row count metrics per time bucket (using Elementary's incremental metric caching) +2. Compare the most recent completed bucket against the previous bucket +3. Calculate the percentage change between the two +4. If the previous bucket has fewer rows than `min_row_count` → **PASS** (insufficient baseline) +5. If the absolute change exceeds `error_threshold_percent` → **ERROR** +6. If the absolute change exceeds `warn_threshold_percent` → **WARN** +7. Otherwise → **PASS** + +### Test configuration + +_Required configuration: `timestamp_column`_ + +{/* prettier-ignore */} +
+ 
+models:
+  - name: < model name >
+    data_tests:
+      - elementary.volume_threshold:
+          arguments:
+              timestamp_column: column name # Required
+              warn_threshold_percent: int # Optional - default: 5
+              error_threshold_percent: int # Optional - default: 10
+              direction: [both | spike | drop] # Optional - default: both
+              time_bucket: # Optional
+                period: [hour | day | week | month]
+                count: int
+              where_expression: sql expression # Optional
+              days_back: int # Optional - default: 14
+              backfill_days: int # Optional - default: 2
+              min_row_count: int # Optional - default: 100
+ 
+
+ + + +```yml Models +models: + - name: < model name > + data_tests: + - elementary.volume_threshold: + arguments: + timestamp_column: < column name > # Required + warn_threshold_percent: < int > # Optional - default: 5 + error_threshold_percent: < int > # Optional - default: 10 + direction: < both | spike | drop > # Optional - default: both +``` + +```yml Default thresholds (5% warn, 10% error) +models: + - name: daily_orders + data_tests: + - elementary.volume_threshold: + arguments: + timestamp_column: created_at + config: + tags: ["elementary"] +``` + +```yml Custom thresholds - drop only +models: + - name: critical_transactions + data_tests: + - elementary.volume_threshold: + arguments: + timestamp_column: transaction_time + warn_threshold_percent: 3 + error_threshold_percent: 8 + direction: drop + config: + tags: ["elementary"] +``` + +```yml With time bucket and filter +models: + - name: hourly_events + data_tests: + - elementary.volume_threshold: + arguments: + timestamp_column: event_timestamp + warn_threshold_percent: 10 + error_threshold_percent: 25 + direction: both + time_bucket: + period: hour + count: 1 + where_expression: "event_type = 'purchase'" + config: + tags: ["elementary"] +``` + + + +### Features + +- **Dual severity levels**: Separate thresholds for warnings and errors, giving you graduated alerting +- **Directional monitoring**: Choose to monitor `both` directions, only `spike` (increases), or only `drop` (decreases) +- **Incremental metric caching**: Uses Elementary's `data_monitoring_metrics` table to avoid recalculating row counts for previously computed time buckets +- **Minimum baseline protection**: The `min_row_count` parameter prevents false alerts when the baseline is too small +- **Configurable time buckets**: Works with hourly, daily, weekly, or monthly buckets + +### Parameters + +| Parameter | Required | Default | Description | +| ------------------------- | -------- | ------- | 
---------------------------------------------------------------------------- | +| `timestamp_column` | Yes | - | Column to determine time periods | +| `warn_threshold_percent` | No | 5 | Percentage change that triggers a warning | +| `error_threshold_percent` | No | 10 | Percentage change that triggers an error | +| `direction` | No | `both` | Direction to monitor: `both`, `spike`, or `drop` | +| `time_bucket` | No | `{period: day, count: 1}` | Time bucket configuration | +| `where_expression` | No | - | SQL expression to filter the data | +| `days_back` | No | 14 | Days of metric history to retain | +| `backfill_days` | No | 2 | Days to recalculate on each run | +| `min_row_count` | No | 100 | Minimum rows in the previous bucket required to trigger the check | + +### Comparison with volume_anomalies + +| Feature | `volume_threshold` | `volume_anomalies` | +| --- | --- | --- | +| Detection method | Fixed percentage thresholds | Z-score (dbt test) / ML (Cloud) | +| Severity levels | Dual (warn + error) | Single (pass/fail) | +| Best for | Known acceptable ranges | Unknown/variable patterns | +| Configuration | Explicit thresholds | Sensitivity tuning | +| Baseline | Previous bucket | Training period average | + +### How severity levels work + +This test has built-in dual severity using dbt's `warn_if` / `error_if` config. You do **not** need to set `config.severity` yourself. The behavior is: + +- Change exceeds `warn_threshold_percent` but not `error_threshold_percent` → **dbt warning** +- Change exceeds `error_threshold_percent` → **dbt error** (test fails) +- Change is below `warn_threshold_percent` → **pass** + +For example, with `warn_threshold_percent: 3` and `error_threshold_percent: 8`: +- A 2% drop → pass +- A 5% drop → warning +- A 10% drop → error + + +Do not set `config.severity: error` on this test. That would override the built-in dual severity and turn all warnings into errors, defeating the purpose of having separate thresholds. 
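Concretely, the evaluation described above can be sketched as follows (a simplified Python model of the documented logic; the real test is implemented in SQL/Jinja inside the package, and parameter names here mirror the test configuration):

```python
# Simplified model of the documented volume_threshold evaluation logic.
# Illustrative only; not the package's actual implementation.

def evaluate_volume_threshold(
    prev_count: int,
    curr_count: int,
    warn_threshold_percent: float = 5,
    error_threshold_percent: float = 10,
    direction: str = "both",  # "both" | "spike" | "drop"
    min_row_count: int = 100,
) -> str:
    # Insufficient baseline: too few rows in the previous bucket to compare.
    if prev_count < min_row_count:
        return "pass"
    change_percent = (curr_count - prev_count) / prev_count * 100
    # Ignore changes outside the monitored direction.
    if direction == "spike" and change_percent <= 0:
        return "pass"
    if direction == "drop" and change_percent >= 0:
        return "pass"
    magnitude = abs(change_percent)
    if magnitude > error_threshold_percent:
        return "error"
    if magnitude > warn_threshold_percent:
        return "warn"
    return "pass"
```

With `warn_threshold_percent: 3` and `error_threshold_percent: 8`, a bucket dropping from 1,000 to 980 rows (2%) passes, to 950 rows (5%) warns, and to 900 rows (10%) errors, matching the example above.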
+ + +### Notes + +- The `warn_threshold_percent` must be less than or equal to `error_threshold_percent` +- The test uses Elementary's metric caching infrastructure. Row counts for previously computed time buckets are reused across runs +- If the previous bucket has fewer rows than `min_row_count`, the test passes (insufficient data for a meaningful comparison) +- The test only evaluates completed time buckets diff --git a/docs/data-tests/with-context-tests.mdx b/docs/data-tests/with-context-tests.mdx new file mode 100644 index 000000000..947e52c17 --- /dev/null +++ b/docs/data-tests/with-context-tests.mdx @@ -0,0 +1,242 @@ +--- +title: "Tests with context" +sidebarTitle: "Tests with context" +--- + +A set of generic tests that extend common dbt, dbt-utils, and dbt-expectations tests with a `context_columns` parameter. +When a test fails, the failing rows are returned together with the columns you care about — making it much easier to investigate the root cause directly from the test results. + +If `context_columns` is omitted, **all columns** are returned alongside failing rows. + + + If a column listed in `context_columns` does not exist on the model, a warning + is logged and that column is skipped. The test continues and will not error. + + +--- + +## not_null_with_context + +`elementary.not_null_with_context` + +Validates that there are no null values in a column. Extends dbt's built-in `not_null` test. + +### Parameters + +| Parameter | Required | Default | Description | +| ----------------- | -------- | ------- | ------------------------------------------------------------------------------- | +| `column_name` | Yes | — | The column to test for null values. | +| `context_columns` | No | `none` | List of additional columns to return with failing rows. Omit to return all columns. 
| + + + +```yml With context columns +models: + - name: orders + columns: + - name: order_id + data_tests: + - elementary.not_null_with_context: + context_columns: [customer_id, order_date, amount] +``` + +```yml Return all columns +models: + - name: orders + columns: + - name: order_id + data_tests: + - elementary.not_null_with_context +``` + + + +--- + +## accepted_range_with_context + +`elementary.accepted_range_with_context` + +Validates that column values fall within an accepted range. Extends `dbt_utils.accepted_range`. + +### Parameters + +| Parameter | Required | Default | Description | +| ----------------- | -------- | ------- | ------------------------------------------------------------------------------------ | +| `column_name` | Yes | — | The column to test. | +| `min_value` | No* | `none` | Minimum accepted value (inclusive by default). At least one bound must be provided. | +| `max_value` | No* | `none` | Maximum accepted value (inclusive by default). At least one bound must be provided. | +| `inclusive` | No | `true` | Whether the bounds are inclusive. | +| `context_columns` | No | `none` | List of additional columns to return with failing rows. Omit to return all columns. | + +\* At least one of `min_value` or `max_value` must be provided. + + + +```yml With context columns +models: + - name: orders + columns: + - name: amount + data_tests: + - elementary.accepted_range_with_context: + min_value: 0 + max_value: 10000 + context_columns: [order_id, customer_id, order_date] +``` + +```yml Min bound only +models: + - name: orders + columns: + - name: amount + data_tests: + - elementary.accepted_range_with_context: + min_value: 0 + inclusive: false +``` + + + +--- + +## expect_column_values_to_not_be_null_with_context + +`elementary.expect_column_values_to_not_be_null_with_context` + +Expects column values to not be null. Extends `dbt_expectations.expect_column_values_to_not_be_null`. 
+ +### Parameters + +| Parameter | Required | Default | Description | +| ----------------- | -------- | ------- | ------------------------------------------------------------------------------- | +| `column_name` | Yes | — | The column to test for null values. | +| `row_condition` | No | `none` | Optional SQL filter applied before testing (e.g. `"status = 'active'"`). | +| `context_columns` | No | `none` | List of additional columns to return with failing rows. Omit to return all columns. | + + + +```yml With row condition and context columns +models: + - name: subscriptions + columns: + - name: end_date + data_tests: + - elementary.expect_column_values_to_not_be_null_with_context: + row_condition: "status = 'active'" + context_columns: [subscription_id, customer_id, start_date] +``` + + + +--- + +## expect_column_values_to_be_unique_with_context + +`elementary.expect_column_values_to_be_unique_with_context` + +Expects column values to be unique. Returns all duplicate rows (not just a count), so you can see the full context of each duplicate. Extends `dbt_expectations.expect_column_values_to_be_unique`. + +### Parameters + +| Parameter | Required | Default | Description | +| ----------------- | -------- | ------- | ------------------------------------------------------------------------------- | +| `column_name` | Yes | — | The column to test for uniqueness. | +| `row_condition` | No | `none` | Optional SQL filter applied before testing. | +| `context_columns` | No | `none` | List of additional columns to return with failing rows. Omit to return all columns. 
| + + + +```yml With context columns +models: + - name: customers + columns: + - name: email + data_tests: + - elementary.expect_column_values_to_be_unique_with_context: + context_columns: [customer_id, created_at, name] +``` + + + +--- + +## expect_column_values_to_match_regex_with_context + +`elementary.expect_column_values_to_match_regex_with_context` + +Expects column values to match a given regular expression. Extends `dbt_expectations.expect_column_values_to_match_regex`. + + + Requires `dbt_expectations` to be installed in your project. + + +### Parameters + +| Parameter | Required | Default | Description | +| ----------------- | -------- | ------- | ------------------------------------------------------------------------------- | +| `column_name` | Yes | — | The column to test. | +| `regex` | Yes | — | The regular expression pattern to match. | +| `row_condition` | No | `none` | Optional SQL filter applied before testing. | +| `is_raw` | No | `false` | Whether the regex is a raw string. | +| `flags` | No | `""` | Optional regex flags (adapter-dependent). | +| `context_columns` | No | `none` | List of additional columns to return with failing rows. Omit to return all columns. | + + + +```yml Email format validation with context +models: + - name: customers + columns: + - name: email + data_tests: + - elementary.expect_column_values_to_match_regex_with_context: + regex: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$" + context_columns: [customer_id, name, created_at] +``` + + + +--- + +## relationships_with_context + +`elementary.relationships_with_context` + +Validates referential integrity between a child and parent table. Extends dbt's built-in `relationships` test. + +### Parameters + +| Parameter | Required | Default | Description | +| ----------------- | -------- | ------- | ------------------------------------------------------------------------------- | +| `column_name` | Yes | — | The foreign key column in the child model. 
| +| `to` | Yes | — | The parent model (use `ref()` or `source()`). | +| `field` | Yes | — | The column in the parent model to join to. | +| `context_columns` | No | `none` | List of additional columns from the child model to return with failing rows. Omit to return all columns. | + + + +```yml With context columns +models: + - name: orders + columns: + - name: customer_id + data_tests: + - elementary.relationships_with_context: + to: ref('customers') + field: id + context_columns: [order_id, order_date, amount] +``` + +```yml Return all columns +models: + - name: orders + columns: + - name: customer_id + data_tests: + - elementary.relationships_with_context: + to: ref('customers') + field: id +``` + + diff --git a/docs/docker-compose.yaml b/docs/docker-compose.yaml index 218141ff7..5745e8ce8 100644 --- a/docs/docker-compose.yaml +++ b/docs/docker-compose.yaml @@ -1,7 +1,9 @@ +version: "3.8" + services: docs: build: . volumes: - .:/app ports: - - "3000:3000" + - 3000:3000 diff --git a/docs/docs.json b/docs/docs.json new file mode 100644 index 000000000..863341435 --- /dev/null +++ b/docs/docs.json @@ -0,0 +1,704 @@ +{ + "$schema": "https://mintlify.com/docs.json", + "theme": "mint", + "name": "Elementary", + "colors": { + "primary": "#FF20B8", + "light": "#FF20B8", + "dark": "#FF20B8" + }, + "favicon": "https://res.cloudinary.com/do5hrgokq/image/upload/v1764678729/faviconnew_ckysnt.png", + "navigation": { + "tabs": [ + { + "tab": "Home", + "pages": ["home"] + }, + { + "tab": "Elementary Cloud", + "groups": [ + { + "group": "Getting Started", + "pages": [ + "cloud/introduction", + "cloud/quickstart", + "cloud/features", + "cloud/features/integrations", + "cloud/general/security-and-privacy", + "cloud/ai-agents/overview", + "cloud/mcp/intro" + ] + }, + { + "group": "Ella: AI Agents", + "pages": [ + "cloud/ai-agents/governance-agent", + "cloud/ai-agents/triage-resolution-agent", + "cloud/ai-agents/test-recommendation-agent", + "cloud/ai-agents/catalog-agent", + 
"cloud/ai-agents/performance-cost-agent" + ] + }, + { + "group": "Anomaly Detection Monitors", + "pages": [ + "cloud/features/anomaly-detection/monitors-overview", + { + "group": "Automated monitors", + "pages": [ + "cloud/features/anomaly-detection/automated-monitors", + "cloud/features/anomaly-detection/automated-freshness", + "cloud/features/anomaly-detection/automated-volume" + ] + }, + { + "group": "Configuration and Feedback", + "pages": [ + "cloud/features/anomaly-detection/monitors-configuration", + "cloud/features/anomaly-detection/monitors-feedback" + ] + }, + "cloud/features/anomaly-detection/metrics" + ] + }, + { + "group": "Data Testing", + "pages": [ + { + "group": "Data Tests", + "pages": [ + "cloud/features/data-tests/data-tests-overview", + "cloud/features/data-tests/dbt-tests", + "cloud/features/data-tests/custom-sql-tests", + "cloud/features/data-tests/schema-validation-test" + ] + }, + "cloud/features/data-tests/test-coverage-screen" + ] + }, + { + "group": "Data Lineage", + "pages": [ + "cloud/features/data-lineage/lineage", + "cloud/features/data-lineage/column-level-lineage", + "cloud/features/data-lineage/exposures-lineage", + "cloud/features/anomaly-detection/monitor-dwh-assets" + ] + }, + { + "group": "Alerts and Incidents", + "pages": [ + "cloud/features/alerts-and-incidents/alerts-and-incidents-overview", + { + "group": "Setup & configure alerts", + "pages": [ + "cloud/features/alerts-and-incidents/alert-configuration", + "cloud/features/alerts-and-incidents/alert-rules", + "cloud/guides/alerts-configuration", + "cloud/features/alerts-and-incidents/alert-destinations", + "cloud/features/alerts-and-incidents/owners-and-subscribers" + ] + }, + "cloud/features/alerts-and-incidents/incidents", + "cloud/features/alerts-and-incidents/incident-management", + "cloud/features/alerts-and-incidents/incident-digest" + ] + }, + { + "group": "Performance & Cost", + "pages": [ + "cloud/features/performance-monitoring/performance-monitoring", + 
"cloud/features/performance-monitoring/performance-alerts" + ] + }, + { + "group": "Data Governance", + "pages": [ + "cloud/best-practices/governance-for-observability", + "cloud/features/data-governance/critical_assets", + "cloud/features/data-governance/manage-metadata", + "cloud/features/data-governance/ai-descriptions" + ] + }, + { + "group": "Collaboration & Communication", + "pages": [ + "cloud/features/collaboration-and-communication/data-observability-dashboard", + { + "group": "Data Health Scores", + "pages": [ + "cloud/features/collaboration-and-communication/data-quality-dimensions", + "cloud/features/collaboration-and-communication/data-health" + ] + }, + "cloud/features/collaboration-and-communication/catalog", + "cloud/features/collaboration-and-communication/saved-views" + ] + }, + { + "group": "MCP Server", + "pages": [ + "cloud/mcp/overview", + "cloud/mcp/setup-guide", + "cloud/mcp/mcp-tools", + "cloud/mcp/recommended-rules" + ] + }, + { + "group": "Additional features", + "pages": [ + { + "group": "Config-as-Code", + "pages": [ + "cloud/features/config-as-code", + "cloud/features/gpg-signed-commits" + ] + }, + "cloud/features/roles-and-permissions", + "cloud/features/multi-env", + "cloud/features/ci", + { + "group": "Audit Logs", + "pages": [ + "cloud/features/collaboration-and-communication/audit_logs/overview", + "cloud/features/collaboration-and-communication/audit_logs/user-activity-logs", + "cloud/features/collaboration-and-communication/audit_logs/system-logs" + ] + } + ] + }, + { + "group": "Guides", + "pages": [ + { + "group": "Onboarding guides", + "pages": [ + "cloud/guides/set-up-elementary", + "cloud/guides/start-using-elementary", + "cloud/onboarding/quickstart-dbt-package", + "cloud/onboarding/connect-data-warehouse", + "cloud/manage-team" + ] + }, + "cloud/guides/sync-scheduling", + "cloud/guides/dev-prod-configuration", + "cloud/guides/reduce-on-run-end-time", + "cloud/guides/collect-job-data", + 
"cloud/guides/collect-source-freshness", + "cloud/guides/troubleshoot" + ] + }, + { + "group": "Integrations", + "pages": [ + "cloud/integrations/elementary-integrations", + { + "group": "Data warehouses", + "pages": [ + "cloud/integrations/dwh/snowflake", + "cloud/integrations/dwh/bigquery", + "cloud/integrations/dwh/redshift", + "cloud/integrations/dwh/databricks", + "cloud/integrations/dwh/postgres", + "cloud/integrations/dwh/athena", + "cloud/integrations/dwh/dremio", + "cloud/integrations/dwh/clickhouse", + "cloud/integrations/dwh/trino", + "cloud/integrations/dwh/duckdb", + "cloud/integrations/dwh/spark", + "cloud/integrations/dwh/fabric", + "cloud/integrations/dwh/sqlserver", + "cloud/integrations/dwh/vertica" + ] + }, + { + "group": "Transformation & Orchestration", + "pages": [ + "cloud/integrations/transformation-and-orchestration/dbt-core", + "cloud/integrations/transformation-and-orchestration/dbt-cloud", + "cloud/integrations/transformation-and-orchestration/dbt-fusion", + "cloud/integrations/transformation-and-orchestration/airflow", + "cloud/integrations/transformation-and-orchestration/orchestration-tools" + ] + }, + { + "group": "Data visualization", + "pages": [ + "cloud/integrations/bi/connect-bi-tool", + "cloud/integrations/bi/looker", + "cloud/integrations/bi/tableau", + "cloud/integrations/bi/power-bi", + "cloud/integrations/bi/sigma", + "cloud/integrations/bi/metabase", + "cloud/integrations/bi/thoughtspot", + "cloud/integrations/bi/mode", + "cloud/integrations/bi/hex", + "cloud/integrations/bi/lightdash", + "cloud/integrations/bi/explo" + ] + }, + { + "group": "Reverse ETL", + "pages": [ + "cloud/integrations/reverse-etl/census", + "cloud/integrations/reverse-etl/hightouch" + ] + }, + { + "group": "Code repositories", + "pages": [ + "cloud/integrations/code-repo/connect-code-repo", + "cloud/integrations/code-repo/github", + "cloud/integrations/code-repo/gitlab", + "cloud/integrations/code-repo/bitbucket", + 
"cloud/integrations/code-repo/azure-devops" + ] + }, + { + "group": "Alerts & Incidents", + "pages": [ + "cloud/integrations/alerts/slack", + "cloud/integrations/alerts/ms-teams", + "cloud/integrations/alerts/pagerduty", + "cloud/integrations/alerts/opsgenie", + "cloud/integrations/alerts/jira", + "cloud/integrations/alerts/linear", + "cloud/integrations/alerts/servicenow", + "cloud/integrations/alerts/webhooks", + "cloud/integrations/alerts/email" + ] + }, + { + "group": "Log Streaming", + "pages": [ + "cloud/integrations/log-streaming/datadog", + "cloud/integrations/log-streaming/splunk", + "cloud/integrations/log-streaming/gcs" + ] + }, + { + "group": "Governance", + "pages": [ + "cloud/integrations/governance/atlan" + ] + }, + { + "group": "Iceberg catalog", + "pages": [ + "cloud/integrations/metadata-layer/glue" + ] + }, + { + "group": "Security and Connectivity", + "pages": [ + "cloud/integrations/security-and-connectivity/aws-privatelink-integration", + "cloud/integrations/security-and-connectivity/okta", + "cloud/integrations/security-and-connectivity/ms-entra" + ] + } + ] + }, + { + "group": "Resources", + "pages": [ + "cloud/cloud-vs-oss", + { + "group": "Best Practices Guide", + "icon": "check", + "pages": [ + "cloud/best-practices/introduction", + "cloud/best-practices/governance-for-observability", + "cloud/best-practices/detection-and-coverage", + "cloud/best-practices/triage-and-response" + ] + }, + "cloud/resources/pricing", + "cloud/resources/community" + ] + } + ] + }, + { + "tab": "dbt Package and Tests", + "groups": [ + { + "group": "Elementary dbt package", + "pages": [ + "data-tests/dbt/dbt-package", + "data-tests/dbt/quickstart-package", + "data-tests/dbt/upgrade-package", + "data-tests/dbt/dbt-artifacts", + { + "group": "on-run-end hooks", + "pages": [ + "data-tests/dbt/on-run-end_hooks", + "data-tests/dbt/reduce-on-run-end-time" + ] + }, + "data-tests/dbt/package-models", + "data-tests/test-result-samples" + ] + }, + { + "group": 
"Elementary Data tests", + "pages": [ + "data-tests/introduction", + "data-tests/anomaly-detection-tests-oss-vs-cloud", + "data-tests/elementary-tests-configuration" + ] + }, + { + "group": "Anomaly Detection Tests", + "pages": [ + { + "group": "How it works?", + "pages": [ + "data-tests/how-anomaly-detection-works", + "data-tests/data-anomaly-detection" + ] + }, + { + "group": "Anomaly tests params", + "pages": [ + "data-tests/anomaly-detection-configuration/anomaly-params", + "data-tests/anomaly-detection-configuration/timestamp-column", + "data-tests/anomaly-detection-configuration/where-expression", + "data-tests/anomaly-detection-configuration/anomaly-sensitivity", + "data-tests/anomaly-detection-configuration/anomaly-direction", + "data-tests/anomaly-detection-configuration/training-period", + "data-tests/anomaly-detection-configuration/detection-period", + "data-tests/anomaly-detection-configuration/exclude_detection_period_from_training", + "data-tests/anomaly-detection-configuration/time-bucket", + "data-tests/anomaly-detection-configuration/seasonality", + "data-tests/anomaly-detection-configuration/column-anomalies", + "data-tests/anomaly-detection-configuration/exclude_prefix", + "data-tests/anomaly-detection-configuration/exclude_regexp", + "data-tests/anomaly-detection-configuration/dimensions", + "data-tests/anomaly-detection-configuration/event_timestamp_column", + "data-tests/anomaly-detection-configuration/update_timestamp_column", + "data-tests/anomaly-detection-configuration/ignore_small_changes", + "data-tests/anomaly-detection-configuration/fail_on_zero", + "data-tests/anomaly-detection-configuration/detection-delay", + "data-tests/anomaly-detection-configuration/anomaly-exclude-metrics", + "data-tests/anomaly-detection-configuration/exclude-final-results" + ] + }, + "data-tests/anomaly-detection-tests/volume-anomalies", + "data-tests/anomaly-detection-tests/freshness-anomalies", + 
"data-tests/anomaly-detection-tests/event-freshness-anomalies", + "data-tests/anomaly-detection-tests/dimension-anomalies", + "data-tests/anomaly-detection-tests/all-columns-anomalies", + "data-tests/anomaly-detection-tests/column-anomalies", + "data-tests/anomaly-detection-tests/Anomaly-troubleshooting-guide" + ] + }, + { + "group": "Schema Tests", + "pages": [ + "data-tests/schema-tests/schema-changes", + "data-tests/schema-tests/schema-changes-from-baseline", + "data-tests/schema-tests/json-schema", + "data-tests/schema-tests/exposure-tests" + ] + }, + { + "group": "AI Data Tests (Beta)", + "pages": [ + "data-tests/ai-data-tests/ai_data_validations", + "data-tests/ai-data-tests/unstructured_data_validations", + { + "group": "Supported Platforms", + "pages": [ + "data-tests/ai-data-tests/supported-platforms/snowflake", + "data-tests/ai-data-tests/supported-platforms/databricks", + "data-tests/ai-data-tests/supported-platforms/bigquery", + "data-tests/ai-data-tests/supported-platforms/redshift", + "data-tests/ai-data-tests/supported-platforms/data-lakes" + ] + } + ] + }, + { + "group": "Other Tests", + "pages": [ + "data-tests/python-tests", + "data-tests/execution-sla", + "data-tests/data-freshness-sla", + "data-tests/volume-threshold", + "data-tests/with-context-tests" + ] + }, + { + "group": "dbt Package Release Notes", + "collapsible": true, + "collapsed": true, + "pages": [ + "data-tests/dbt/release-notes/releases/0.20.1", + "data-tests/dbt/release-notes/releases/0.20.0", + "data-tests/dbt/release-notes/releases/0.19.4", + "data-tests/dbt/release-notes/releases/0.19.3", + "data-tests/dbt/release-notes/releases/0.19.2", + "data-tests/dbt/release-notes/releases/0.19.1", + "data-tests/dbt/release-notes/releases/0.19.0", + "data-tests/dbt/release-notes/releases/0.18.3", + "data-tests/dbt/release-notes/releases/0.18.2", + "data-tests/dbt/release-notes/releases/0.18.1" + ] + } + ] + }, + { + "tab": "Python SDK", + "groups": [ + { + "group": "Python SDK", + "pages": 
[ + "python-sdk/introduction", + "python-sdk/installation", + "python-sdk/quickstart", + { + "group": "API Reference", + "pages": [ + "python-sdk/api-reference/overview", + "python-sdk/api-reference/test-decorators", + "python-sdk/api-reference/table-assets" + ] + } + ] + } + ] + }, + { + "tab": "Elementary OSS", + "groups": [ + { + "group": "Elementary OSS", + "pages": [ + "oss/oss-introduction" + ] + }, + { + "group": "Quickstart", + "icon": "circle-play", + "pages": [ + "oss/quickstart/quickstart-cli-package", + "oss/quickstart/quickstart-cli", + "oss/quickstart/quickstart-tests", + "oss/quickstart/quickstart-report", + "oss/quickstart/stay-updated", + "oss/quickstart/quickstart-alerts", + "oss/quickstart/quickstart-prod", + "oss/quickstart/quickstart-support" + ] + }, + { + "group": "Guides", + "pages": [ + "oss/guides/generate-report-ui", + { + "group": "Share observability report", + "pages": [ + "oss/guides/share-report-ui", + "oss/guides/share-observability-report/share-via-slack", + "oss/guides/share-observability-report/host-on-s3", + "oss/guides/share-observability-report/host-on-gcs", + "oss/guides/share-observability-report/host-on-azure", + "oss/guides/share-observability-report/send-report-summary" + ] + }, + { + "group": "Send alerts", + "pages": [ + "oss/guides/alerts/elementary-alerts", + "oss/guides/alerts/send-slack-alerts", + "oss/guides/alerts/send-teams-alerts", + "oss/guides/alerts/alerts-configuration" + ] + }, + "oss/guides/collect-job-data", + "oss/guides/collect-dbt-source-freshness", + "oss/guides/reduce-on-run-end-time", + "oss/guides/performance-alerts" + ] + }, + { + "group": "Configuration & usage", + "pages": [ + "oss/cli-install", + "oss/cli-commands" + ] + }, + { + "group": "Deployment", + "pages": [ + "oss/deployment-and-configuration/elementary-in-production", + { + "group": "Deployment options", + "pages": [ + "oss/deployment-and-configuration/docker", + "oss/deployment-and-configuration/github-actions", + 
"oss/deployment-and-configuration/gitlab-ci" + ] + } + ] + }, + { + "group": "Integrations", + "pages": [ + "oss/integrations/dbt", + "oss/integrations/dbt-fusion", + "oss/deployment-and-configuration/slack", + "oss/deployment-and-configuration/teams" + ] + }, + { + "group": "Community & Support", + "pages": [ + "oss/general/troubleshooting", + "oss/general/faq", + "oss/general/contributions", + "oss/general/community-and-support" + ] + }, + { + "group": "Release Notes", + "pages": [ + "oss/release-notes/upgrading-elementary", + { + "group": "Releases", + "pages": [ + "oss/release-notes/releases/0.20.0", + "oss/release-notes/releases/0.19.5", + "oss/release-notes/releases/0.19.4", + "oss/release-notes/releases/0.19.3", + "oss/release-notes/releases/0.19.2", + "oss/release-notes/releases/0.19.1", + "oss/release-notes/releases/0.19.0", + "oss/release-notes/releases/0.18.3", + "oss/release-notes/releases/0.18.2", + "oss/release-notes/releases/0.18.1", + "oss/release-notes/releases/0.18.0", + "oss/release-notes/releases/0.17.0", + "oss/release-notes/releases/0.16.2", + "oss/release-notes/releases/0.16.1", + "oss/release-notes/releases/0.16.0", + "oss/release-notes/releases/0.11.2", + "oss/release-notes/releases/0.10.0", + "oss/release-notes/releases/0.9.1", + "oss/release-notes/releases/0.8.2", + "oss/release-notes/releases/0.8.0", + "oss/release-notes/releases/0.7.10", + "oss/release-notes/releases/0.7.7", + "oss/release-notes/releases/0.7.6", + "oss/release-notes/releases/0.7.5", + "oss/release-notes/releases/0.7.2", + "oss/release-notes/releases/0.6.10", + "oss/release-notes/releases/0.6.7", + "oss/release-notes/releases/0.6.5", + "oss/release-notes/releases/0.6.3", + "oss/release-notes/releases/0.6.1", + "oss/release-notes/releases/0.5.4" + ] + } + ] + } + ] + } + ], + "global": { + "anchors": [ + { + "anchor": "Book a Demo", + "href": "https://meetings-eu1.hubspot.com/joost-boonzajer-flaes/intro-call-docs", + "icon": "calendar-check" + }, + { + "anchor": "Join 
Slack", + "href": "https://elementary-data.com/community", + "icon": "slack" + } + ] + } + }, + "logo": { + "light": "https://res.cloudinary.com/do5hrgokq/image/upload/v1764079214/Elementary_2025_Full_Color_Logo_mtdixp.png", + "dark": "https://res.cloudinary.com/do5hrgokq/image/upload/v1764079353/Elementary_2025_White_Logo_Pink_Mark_l85bke.png" + }, + "appearance": { + "default": "dark" + }, + "background": { + "color": { + "light": "#FFFFFF", + "dark": "#0F0513" + } + }, + "navbar": { + "links": [], + "primary": { + "type": "button", + "label": "Join the Community", + "href": "https://join.slack.com/t/elementary-community/shared_invite/zt-3s3uv8znb-7eBuG~ApwOa637dpVFo9Yg" + } + }, + "banner": { + "content": "[Monitor Python pipelines alongside dbt. Explore the Python SDK.](https://www.elementary-data.com/post/data-quality-in-python-pipelines)", + "dismissible": false + }, + "search": { + "prompt": "Search..." + }, + "footer": { + "socials": { + "website": "https://www.elementary-data.com", + "slack": "https://elementary-data.com/community" + } + }, + "integrations": { + "gtm": { + "tagId": "GTM-WCKCG3D4" + }, + "posthog": { + "apiKey": "phc_56XBEzZmh02mGkadqLiYW51eECyYKWPyecVwkGdGUfg" + } + }, + "js": ["kapa-widget.js"], + "redirects": [ + { + "source": "/cloud/features/collaboration-and-communication/user-activity-logs", + "destination": "/cloud/features/collaboration-and-communication/audit_logs/user-activity-logs" + }, + { + "source": "/features/lineage", + "destination": "/cloud/features/data-lineage/lineage" + }, + { + "source": "/features/exposures-lineage", + "destination": "/cloud/features/data-lineage/exposures-lineage" + }, + { + "source": "/features/column-level-lineage", + "destination": "/cloud/features/data-lineage/column-level-lineage" + }, + { + "source": "/features/automated-monitors", + "destination": "/cloud/features/anomaly-detection/automated-monitors" + }, + { + "source": "/features/data-tests", + "destination": 
"/cloud/features/data-tests/dbt-tests" + }, + { + "source": "/features/elementary-alerts", + "destination": "/cloud/features/alerts-and-incidents/alerts-and-incidents-overview" + }, + { + "source": "/features/catalog", + "destination": "/cloud/features/collaboration-and-communication/catalog" + }, + { + "source": "/features/data-observability-dashboard", + "destination": "/cloud/features/collaboration-and-communication/data-observability-dashboard" + } + ] +} diff --git a/docs/elementary_orange_favicon.png b/docs/elementary_orange_favicon.png new file mode 100644 index 000000000..ee6fadd61 Binary files /dev/null and b/docs/elementary_orange_favicon.png differ diff --git a/docs/home.mdx b/docs/home.mdx new file mode 100644 index 000000000..e34ad53f1 --- /dev/null +++ b/docs/home.mdx @@ -0,0 +1,1106 @@ +--- +title: "Home" +mode: "custom" +icon: "house" +--- + +
+{/* Hero */}
+The Data & AI Control Plane
+
+Elementary is built for and trusted by 1000+ engineering and analytics teams.
+
+What is Elementary?
+
+{/* Hero image */}
+
+{/* Security strip */}
+SOC 2 & HIPAA Compliant: enterprise-grade security standards.
+
+Metadata-only Architecture: we never access or process your raw data.
+
+Security & Privacy
+
+{/* Feature highlights */}
+Discover what you can do with Elementary
+
+Detect issues before they reach consumers
+
+Take action with triage and response
+
+Learn more about Elementary
+
+{/* MCP Server */}
+Elementary context in your tools
+
+The MCP Server brings models, lineage, and incidents into tools like Cursor and Claude, so you can act without leaving your workflow.
+
+Allow business users to monitor and maintain their data
+
+Easily apply governance
+
+{/* Integrations */}
+Integrations
+
+Connect Elementary to every part of your data stack, from your warehouse to BI, incident management and messaging tools.
+
+Check out the integrations
+
+OSS integrations
+
+{integrations
+  .filter((integration) => integration.oss === true)
+  .map((integration, index) => (
+  ))}
+
+Cloud integrations
+
+{integrations
+  .filter((integration) => integration.cloud === true)
+  .sort((a, b) => {
+    // "Suggest an integration" always last
+    if (a.label === "Suggest an integration") return 1;
+    if (b.label === "Suggest an integration") return -1;
+
+    // Available now before coming soon
+    if (a.comingSoon !== b.comingSoon) {
+      return a.comingSoon ? 1 : -1;
+    }
+
+    // Alphabetical order
+    return a.label.localeCompare(b.label);
+  })
+  .map((integration, index) => (
+  ))}
+
+{/* Get started */}
+Get started now
+
+Whether you need a lightweight start or an enterprise-ready platform, Elementary has a solution for you.
+
+{/* Featured guides */}
+Featured guides
+
+Learn from the Elementary team and thousands of users from our community how to set up effective, scalable data observability.
+
+{/* Community */}
+Join the community
+
+Join the Elementary Community for AI and team support, explore best practices, and stay up to date on OSS and Cloud. Connect with other data professionals and help shape what's next.
+
+Join now
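The cloud-integrations grid above sorts its cards with a three-level comparator: available integrations first (alphabetically), then "coming soon" entries, with "Suggest an integration" pinned last. The comparator below is copied from the grid's `.sort()` call; the sample entries are trimmed, for illustration, to the two fields it reads:

```javascript
// Sample entries: only `label` and `comingSoon` influence the ordering.
const sample = [
  { label: "Suggest an integration", comingSoon: false },
  { label: "Metabase", comingSoon: true },
  { label: "Snowflake", comingSoon: false },
  { label: "Airflow", comingSoon: false },
];

sample.sort((a, b) => {
  // "Suggest an integration" always last
  if (a.label === "Suggest an integration") return 1;
  if (b.label === "Suggest an integration") return -1;

  // Available now before coming soon
  if (a.comingSoon !== b.comingSoon) {
    return a.comingSoon ? 1 : -1;
  }

  // Alphabetical order
  return a.label.localeCompare(b.label);
});

const order = sample.map((i) => i.label).join(", ");
console.log(order); // → "Airflow, Snowflake, Metabase, Suggest an integration"
```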
+ + + +export const cloudinaryBase = "https://res.cloudinary.com/diuctyblm/image/upload/v1749636234"; +export const cloudinaryIntegrationsBase = `${cloudinaryBase}/Integrations`; +export const integrations = [{ + image: `${cloudinaryIntegrationsBase}/Webhooks_h6dxdn.png`, + label: "Webhooks", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/alerts/webhooks", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/ThoughtSpot_bugqhl.png`, + label: "ThoughtSpot", + cloud: true, + oss: false, + invertOnDark: true, + link: "/cloud/integrations/bi/thoughtspot", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Snowflake_hhsqbs.png`, + label: "Snowflake", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/dwh/snowflake", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Tableau_tezzpt.png`, + label: "Tableau", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/bi/tableau", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Slack_qrfsfy.png`, + label: "Slack", + cloud: true, + oss: true, + invertOnDark: false, + link: "/cloud/integrations/alerts/slack", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Sigma_oku5lp.png`, + label: "Sigma", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/bi/sigma", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Redshift_tetdk7.png`, + label: "Redshift", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/dwh/redshift", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/PowerBI_sxxgbr.png`, + label: "PowerBI", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/bi/power-bi", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Postgres_eqy6fb.png`, + label: "Postgres", + cloud: true, + oss: false, + invertOnDark: false, + link: 
"/cloud/integrations/dwh/postgres", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Pagerduty_wv0mmz.png`, + label: "Pagerduty", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/alerts/pagerduty", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Opsgenie_yjoubp.png`, + label: "Opsgenie", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/alerts/opsgenie", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Mode_xbydj4.png`, + label: "Mode", + cloud: true, + oss: false, + invertOnDark: false, + link: "https://tally.so/r/3N6DlW?integration=Mode", + comingSoon: true, +}, { + image: `${cloudinaryIntegrationsBase}/Microsoft_Teams_ibh1cm.png`, + label: "Microsoft Teams", + cloud: true, + oss: true, + invertOnDark: false, + link: "/cloud/integrations/alerts/ms-teams", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Metabase_zfrsgu.png`, + label: "Metabase", + cloud: true, + oss: false, + invertOnDark: false, + link: "https://tally.so/r/3N6DlW?integration=Metabase", + comingSoon: true, +}, { + image: `${cloudinaryIntegrationsBase}/Looker_rih3v5.png`, + label: "Looker", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/bi/looker", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Linear_qsjld8.png`, + label: "Linear", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/alerts/linear", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Jira_m7le7c.png`, + label: "Jira", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/alerts/jira", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Hightouch_oaqpiy.png`, + label: "Hightouch", + cloud: true, + oss: false, + invertOnDark: true, + link: "https://tally.so/r/3N6DlW?integration=Hightouch", + comingSoon: true, +}, { + image: 
"https://res.cloudinary.com/do5hrgokq/image/upload/v1764678628/1718232346589_fw6ckf.jpg", + label: "Hex", + cloud: true, + oss: false, + invertOnDark: false, + link: "https://tally.so/r/3N6DlW?integration=Hex", + comingSoon: true, +}, { + image: `${cloudinaryIntegrationsBase}/Gitlab_lwgplm.png`, + label: "Gitlab", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/code-repo/gitlab", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Github_jrhh77.png`, + label: "Github", + cloud: true, + oss: false, + invertOnDark: true, + link: "/cloud/integrations/code-repo/github", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Explo_mngybv.png`, + label: "Explo", + cloud: true, + oss: false, + invertOnDark: false, + link: "https://tally.so/r/3N6DlW?integration=Explo", + comingSoon: true, +}, { + image: `${cloudinaryIntegrationsBase}/dbt_zqc8rn.png`, + label: "dbt", + cloud: true, + oss: true, + invertOnDark: false, + link: "/cloud/integrations/transformation-and-orchestration/dbt-core", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Databricks_bes8vc.png`, + label: "Databricks", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/dwh/databricks", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/ClickHouse_czq5cy.png`, + label: "ClickHouse", + cloud: true, + oss: false, + invertOnDark: true, + link: "/cloud/integrations/dwh/clickhouse", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Athena_geo100.png`, + label: "Athena", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/dwh/athena", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Census_bksxka.png`, + label: "Census", + cloud: true, + oss: false, + invertOnDark: false, + link: "https://tally.so/r/3N6DlW?integration=Census", + comingSoon: true, +}, { + image: `${cloudinaryIntegrationsBase}/Bitbucket_gdaafw.png`, + label: 
"Bitbucket", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/code-repo/bitbucket", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Bigquery_wdpiph.png`, + label: "Bigquery", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/dwh/bigquery", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Airflow_ld1jiz.png`, + label: "Airflow", + cloud: true, + oss: false, + invertOnDark: false, + link: "/cloud/integrations/transformation-and-orchestration/airflow", + comingSoon: false, +}, { + image: `${cloudinaryIntegrationsBase}/Plus_r7mzun.png`, + label: "Suggest an integration", + cloud: true, + oss: false, + invertOnDark: false, + link: "https://t2taztilhde.typeform.com/to/mdJoO9lA", + comingSoon: false, +}, +]; +export const footerLinks = [ + { + header: "Product", + items: [ + { + label: "Detection & Coverage", + href: "https://www.elementary-data.com/detection-and-coverage", + }, + { + label: "Triage and response", + href: "https://www.elementary-data.com/triage-and-resolution", + }, + { label: "Governance", href: "https://www.elementary-data.com/governance-and-discovery" }, + { label: "Enabling business users", href: "https://www.elementary-data.com/business-users" }, + { label: "Integrations", href: "https://www.elementary-data.com/integrations" }, + { label: "Cloud vs OSS", href: "/overview/cloud-vs-oss" }, + ], + }, + { + header: "Resources", + items: [ + { label: "Blog", href: "https://www.elementary-data.com/blog" }, + { label: "Webinars", href: "https://www.elementary-data.com/webinars" }, + { label: "dbt Tests Hub", href: "https://www.elementary-data.com/dbt-test-hub" }, + { + label: "Cloud demo videos", + href: "https://www.elementary-data.com/elementary-cloud-demo-videos", + }, + { label: "Customer Stories", href: "https://www.elementary-data.com/customer-stories" }, + { label: "Documentation", href: "/introduction" }, + { label: "The Elementary Community", href: 
"https://www.elementary-data.com/community" }, + ], + }, + { + header: "Company", + items: [ + { label: "About Us", href: "https://www.elementary-data.com/company" }, + { label: "Careers", href: "https://www.elementary-data.com/careers" }, + { label: "Contact Us", href: "mailto:founders@elementary-data.com" }, + { label: "Terms of Services", href: "https://www.elementary-data.com/terms-of-service" }, + { label: "Privacy Policy", href: "https://www.elementary-data.com/privacy" }, + ], + }, +]; + +export const FeatureCard = ({ + heading = "Your Title", + description = "Your description here", + icon = "flag", + link = "#", + showCloudFeature = true, + comingSoon = false, +}) => { + const CardContent = () => ( +
+
+
+
+ +
+

{heading}

+
+ {showCloudFeature && ( +
+ + + + + + + + + + + + + Cloud Feature +
+ )} +
+
+

{description}

+
+ {comingSoon && ( +
+
+ Coming Soon +
+
+ )} +
+ ); + +if (comingSoon) { +return ( + +
+ +
+); } + +return ( + + + + +); }; + +export const FeatureHeader = ({ children, label = "Read the guide", link }) => ( +
+
+

{children}

+
+ {link && ( + + {label} + + + + + )} +
+); + +export const Integration = ({ image, label, invertOnDark = false, link, comingSoon = false }) => { + const content = ( +
+ {label} +
{label}{comingSoon && (Coming Soon)}
+
+ ); + return link ? ( + {content} + ) : content; +}; + +export const InfoCard = ({ + label = "Recommended", + labelColor = "pink", + title = "What is Elementary?", + description, + link = "#", + layout = "vertical", +}) => { + const labelClass = + labelColor === "gray" + ? "home__info-card__label home__info-card__label--gray" + : "home__info-card__label home__info-card__label--pink"; + const containerClass = + layout === "horizontal" + ? "home__info-card__container home__info-card__container--horizontal" + : "home__info-card__container"; + const headerClass = + layout === "horizontal" + ? "home__info-card__header home__info-card__header--horizontal" + : "home__info-card__header"; + return ( + +
+
+ {layout === "horizontal" ? ( +
+ + {label} + + {title} +
+ ) : ( + {label} + )} + + + + + +
+ {layout !== "horizontal" &&
{title}
} +
{description}
+
+
+ ); +}; + +export const HomeFooter = () => ( + +); \ No newline at end of file diff --git a/docs/kapa-widget.js b/docs/kapa-widget.js new file mode 100644 index 000000000..e0b8cc126 --- /dev/null +++ b/docs/kapa-widget.js @@ -0,0 +1,325 @@ +(function () { + 'use strict'; + + var STORAGE_KEY = 'elementary_kapa_email'; + var HUBSPOT_PORTAL_ID = '142608385'; + var HUBSPOT_FORM_ID = '4734860b-68fb-4f7f-aada-afb14e61afe7'; + var HUBSPOT_SUBMIT_URL = 'https://api.hsforms.com/submissions/v3/integration/submit/' + HUBSPOT_PORTAL_ID + '/' + HUBSPOT_FORM_ID; + var PRIMARY = '#FF20B8'; + var PRIMARY_HOVER = '#E01A9F'; + /** Consumer domains — HubSpot "block free emails" often does not apply to the Forms API; mirror policy in-app. */ + var BLOCKED_CONSUMER_EMAIL_DOMAINS = { + '163.com': true, + '126.com': true, + 'aol.com': true, + 'duck.com': true, + 'fastmail.com': true, + 'gmail.com': true, + 'googlemail.com': true, + 'gmx.com': true, + 'gmx.de': true, + 'gmx.net': true, + 'hey.com': true, + 'hotmail.com': true, + 'hotmail.co.uk': true, + 'icloud.com': true, + 'live.com': true, + 'mac.com': true, + 'mail.com': true, + 'me.com': true, + 'msn.com': true, + 'outlook.com': true, + 'pm.me': true, + 'proton.me': true, + 'protonmail.com': true, + 'qq.com': true, + 'skiff.com': true, + 'tuta.io': true, + 'tutanota.com': true, + 'tutanota.de': true, + 'yahoo.com': true, + 'yahoo.co.uk': true, + 'yahoo.de': true, + 'yahoo.fr': true, + 'yandex.com': true, + 'yandex.ru': true, + }; + var FREE_EMAIL_NOT_ACCEPTED_MSG = 'Please use your work email.'; + + function getStoredEmail() { + try { + return localStorage.getItem(STORAGE_KEY); + } catch (e) { + return null; + } + } + + function storeEmail(email) { + try { + localStorage.setItem(STORAGE_KEY, email); + } catch (e) {} + } + + function openKapa(email) { + window.kapaSettings = { + user: { + email: email, + uniqueClientId: email, + }, + }; + if (window.Kapa && typeof window.Kapa.open === 'function') { + window.Kapa.open(); + } else if 
(typeof window.Kapa === 'function') { + window.Kapa('open'); + } + } + + function emailDomain(email) { + var i = email.lastIndexOf('@'); + if (i < 0) return ''; + return email + .slice(i + 1) + .toLowerCase() + .trim(); + } + + function isBlockedConsumerEmailDomain(email) { + return !!BLOCKED_CONSUMER_EMAIL_DOMAINS[emailDomain(email)]; + } + + function hubspotErrorMessage(body) { + var fallback = + 'We could not accept this email. Please try again, or use a work email if your company requires it.'; + if (!body || typeof body !== 'object') return fallback; + if (Array.isArray(body.errors) && body.errors.length) { + var first = body.errors[0]; + if (first && first.message) return first.message; + } + if (typeof body.message === 'string' && body.message) return body.message; + if (typeof body.inlineMessage === 'string' && body.inlineMessage) return body.inlineMessage; + return fallback; + } + + function submitToHubSpot(email) { + return fetch(HUBSPOT_SUBMIT_URL, { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ + fields: [{ name: 'email', value: email }], + context: { + pageUri: window.location.href, + pageName: 'Elementary Docs - Ask Elementary AI', + }, + }), + }) + .then(function (res) { + return res.text().then(function (text) { + var body = null; + try { + body = text ? JSON.parse(text) : null; + } catch (parseErr) { + body = null; + } + if (res.ok) { + if (body && Array.isArray(body.errors) && body.errors.length) { + return { ok: false, message: hubspotErrorMessage(body) }; + } + return { ok: true }; + } + return { ok: false, message: hubspotErrorMessage(body) }; + }); + }) + .catch(function () { + return { ok: false, message: 'Something went wrong. Check your connection and try again.' 
}; + }); + } + + function injectButtonAndPopover() { + if (document.getElementById('elementary-kapa-support-root')) return; + + var root = document.createElement('div'); + root.id = 'elementary-kapa-support-root'; + root.setAttribute('style', 'position:fixed;bottom:0;right:0;z-index:2147483646;pointer-events:none;'); + root.innerHTML = ''; + document.body.appendChild(root); + + var pointerStyle = 'pointer-events:auto;'; + var button = document.createElement('button'); + button.type = 'button'; + button.setAttribute('aria-label', 'Ask AI'); + button.innerHTML = + ' Ask AI'; + button.style.cssText = + pointerStyle + + 'display:inline-flex;align-items:center;position:fixed;bottom:24px;right:24px;padding:12px 20px;background-color:' + + PRIMARY + + ';color:#fff;border:none;border-radius:50px;font-size:14px;font-weight:600;cursor:pointer;box-shadow:0 4px 14px rgba(255,32,184,0.4);font-family:system-ui,-apple-system,sans-serif;'; + button.onmouseenter = function () { + button.style.backgroundColor = PRIMARY_HOVER; + }; + button.onmouseleave = function () { + button.style.backgroundColor = PRIMARY; + }; + + var popover = document.createElement('div'); + popover.id = 'elementary-kapa-popover'; + popover.setAttribute( + 'style', + pointerStyle + + 'display:none;position:fixed;bottom:80px;right:24px;width:100%;max-width:320px;background:#fff;border-radius:12px;padding:20px;box-shadow:0 10px 40px rgba(0,0,0,0.15);font-family:system-ui,-apple-system,sans-serif;z-index:2147483647;' + ); + + var message = document.createElement('div'); + message.style.cssText = 'font-size:14px;color:#374151;margin:0 0 16px;line-height:1.5;'; + message.innerHTML = + 'Ask any question about Elementary.

Leave your email in case a follow up is needed:'; + popover.appendChild(message); + + var form = document.createElement('form'); + form.style.margin = '0'; + + var input = document.createElement('input'); + input.type = 'email'; + input.placeholder = 'you@company.com'; + input.style.cssText = + 'width:100%;box-sizing:border-box;padding:12px 14px;font-size:14px;color:#111;background:#fff;background-color:#fff;border-radius:8px;border:1px solid #e5e7eb;margin-bottom:12px;outline:none;'; + form.appendChild(input); + + var errEl = document.createElement('p'); + errEl.id = 'elementary-kapa-error'; + errEl.style.cssText = 'font-size:12px;color:#dc2626;margin:8px 0 0;display:none;'; + form.appendChild(errEl); + + var submitBtn = document.createElement('button'); + submitBtn.type = 'submit'; + submitBtn.textContent = 'Start chat'; + submitBtn.style.cssText = + 'width:100%;padding:12px 16px;font-size:14px;font-weight:600;color:#fff;background:' + + PRIMARY + + ';border:none;border-radius:8px;cursor:pointer;'; + form.appendChild(submitBtn); + + popover.appendChild(form); + + function hidePopover() { + popover.style.display = 'none'; + } + + function showPopover() { + popover.style.display = 'block'; + input.value = ''; + errEl.style.display = 'none'; + errEl.textContent = ''; + setTimeout(function () { + input.focus(); + }, 50); + } + + form.onsubmit = function (e) { + e.preventDefault(); + var emailVal = (input.value || '').trim(); + if (!emailVal) { + errEl.textContent = 'Please enter your email.'; + errEl.style.display = 'block'; + return; + } + var re = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; + if (!re.test(emailVal)) { + errEl.textContent = 'Please enter a valid email address.'; + errEl.style.display = 'block'; + return; + } + if (isBlockedConsumerEmailDomain(emailVal)) { + errEl.textContent = FREE_EMAIL_NOT_ACCEPTED_MSG; + errEl.style.display = 'block'; + return; + } + errEl.style.display = 'none'; + errEl.textContent = ''; + submitBtn.disabled = true; + submitBtn.textContent = 
'Submitting…'; + submitToHubSpot(emailVal).then(function (result) { + submitBtn.disabled = false; + submitBtn.textContent = 'Start chat'; + if (!result.ok) { + errEl.textContent = result.message || 'Please try a different email.'; + errEl.style.display = 'block'; + return; + } + storeEmail(emailVal); + openKapa(emailVal); + hidePopover(); + }); + }; + + button.onclick = function () { + var stored = getStoredEmail(); + if (stored && stored.trim()) { + var s = stored.trim(); + var reStored = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; + if (!reStored.test(s)) { + try { + localStorage.removeItem(STORAGE_KEY); + } catch (removeErr) {} + showPopover(); + return; + } + if (isBlockedConsumerEmailDomain(s)) { + try { + localStorage.removeItem(STORAGE_KEY); + } catch (removeErr2) {} + showPopover(); + return; + } + openKapa(s); + return; + } + if (popover.style.display === 'block') { + hidePopover(); + } else { + showPopover(); + } + }; + + document.addEventListener('mousedown', function (e) { + if ( + popover.style.display !== 'block' || + popover.contains(e.target) || + button.contains(e.target) + ) { + return; + } + hidePopover(); + }); + + root.appendChild(popover); + root.appendChild(button); + } + + function loadKapaScript() { + var script = document.createElement('script'); + script.src = 'https://widget.kapa.ai/kapa-widget.bundle.js'; + script.async = true; + script.setAttribute('data-website-id', 'e558d15b-d976-4a89-b2f0-e33ee6dab58b'); + script.setAttribute('data-project-name', 'Ask Elementary AI'); + script.setAttribute('data-modal-title', 'Ask Elementary AI'); + script.setAttribute('data-modal-title-ask-ai', 'Ask Elementary AI'); + script.setAttribute('data-modal-title-search', 'Ask Elementary AI'); + script.setAttribute('data-modal-ask-ai-input-placeholder', 'Ask any question about Elementary...'); + script.setAttribute('data-project-color', '#FF20B8'); + script.setAttribute('data-project-logo', 
'https://res.cloudinary.com/do5hrgokq/image/upload/v1771424391/Elementary_2025_Pink_Mark_Black_Frame_rbexli.png'); + script.setAttribute('data-button-hide', 'true'); + document.head.appendChild(script); + } + + function init() { + loadKapaScript(); + if (document.readyState === 'loading') { + document.addEventListener('DOMContentLoaded', injectButtonAndPopover); + } else { + injectButtonAndPopover(); + } + } + + init(); +})(); diff --git a/docs/key-features.mdx b/docs/key-features.mdx index 75068c125..a16ec539b 100644 --- a/docs/key-features.mdx +++ b/docs/key-features.mdx @@ -9,7 +9,7 @@ icon: "stars" title="Data anomaly detection" icon="monitor-waveform" iconType="solid" - href="/features/data-tests" + href="cloud/features/data-tests" > Monitor data quality metrics, freshness, volume and schema changes.
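The Kapa widget script above gates chat access behind an email check: a shape regex (`/^[^\s@]+@[^\s@]+\.[^\s@]+$/`) plus a consumer-domain blocklist (`isBlockedConsumerEmailDomain`). A standalone sketch of that validation flow, in Python for illustration — the blocklist contents and error strings here are assumptions, not taken from the script:

```python
import re

# Same shape check as the widget's /^[^\s@]+@[^\s@]+\.[^\s@]+$/
EMAIL_RE = re.compile(r"^[^\s@]+@[^\s@]+\.[^\s@]+$")

# Hypothetical blocklist; the real one lives in isBlockedConsumerEmailDomain.
BLOCKED_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}

def validate_email(raw):
    """Return None if valid, else an error message (mirrors the form handler)."""
    email = (raw or "").strip()
    if not email:
        return "Please enter your email."
    if not EMAIL_RE.match(email):
        return "Please enter a valid email address."
    domain = email.rsplit("@", 1)[-1].lower()
    if domain in BLOCKED_DOMAINS:
        return "Free email providers are not accepted."
    return None
```

Only after all three checks pass does the widget store the email and open the chat.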
@@ -17,7 +17,7 @@ icon: "stars" title="dbt artifacts and run results" icon="table-tree" iconType="regular" - href="/dbt/dbt-artifacts" + href="/data-tests/dbt/dbt-artifacts" > Upload metadata, run and test results to tables as part of your jobs.
@@ -25,7 +25,7 @@ icon: "stars" title="Alerts" icon="bell-exclamation" iconType="regular" - href="/features/elementary-alerts" + href="cloud/features/elementary-alerts" > Send informative alerts to different channels and users.
@@ -33,7 +33,7 @@ icon: "stars" title="Data observability dashboard" icon="browsers" iconType="solid" - href="/features/data-observability-dashboard" + href="cloud/features/collaboration-and-communication/data-observability-dashboard" > Inspect your data health overview, test results, models performance and data lineage. @@ -42,14 +42,14 @@ icon: "stars" title="End-to-end data lineage" icon="arrow-progress" iconType="solid" - href="/features/lineage" + href="cloud/features/data-lineage/lineage" > Inspect dependencies including Column Level Lineage and integration with BI tools. @@ -57,7 +57,7 @@ icon: "stars" @@ -67,7 +67,7 @@ icon: "stars" title="Data Catalog" icon="folder-tree" iconType="solid" - href="/features/catalog" + href="cloud/features/collaboration-and-communication/catalog" > Explore and discover data sets, manage your documentation in code. diff --git a/docs/oss/cli-commands.mdx b/docs/oss/cli-commands.mdx index 748315251..1d969e436 100644 --- a/docs/oss/cli-commands.mdx +++ b/docs/oss/cli-commands.mdx @@ -16,15 +16,13 @@ Read from the test results table and [send new alerts](/oss/guides/alerts/alerts edr monitor ``` -Read from the test results table and generate the [Elementary UI](/features/data-observability-dashboard): +Read from the test results table and generate the Elementary report: ```shell edr report ``` -Read from the test results table and generate the [Elementary UI](/features/data-observability-dashboard) and send -it to external -platforms such as Slack, S3, GCS: +Read from the test results table and generate the Elementary report and send it to external platforms such as Slack, S3, GCS: ```shell edr send-report diff --git a/docs/oss/cli-install.mdx b/docs/oss/cli-install.mdx index 6091b0cb1..98e0e770a 100644 --- a/docs/oss/cli-install.mdx +++ b/docs/oss/cli-install.mdx @@ -2,9 +2,16 @@ title: "CLI install & configure" --- +import InstallCli from '/snippets/install-cli.mdx'; +import AddConnectionProfile from 
'/snippets/add-connection-profile.mdx'; +import QuestionConnectionProfile from '/snippets/faq/question-connection-profile.mdx'; +import QuestionProfilePermissions from '/snippets/faq/question-profile-permissions.mdx'; + + + We recommend you install Elementary CLI using one of the following methods: - + ## Install from source @@ -33,7 +40,7 @@ Elementary CLI requires a connection profile to connect to DWH. Additional confi (These default paths and names may be changed using the [CLI options](/oss/cli-commands#cli-advanced-options)). - + @@ -44,8 +51,8 @@ If you are a dbt user, you already have a `profiles.yml` file that you can use. - - + + diff --git a/docs/oss/deployment-and-configuration/docker.mdx b/docs/oss/deployment-and-configuration/docker.mdx index f98d90702..778d88c7b 100644 --- a/docs/oss/deployment-and-configuration/docker.mdx +++ b/docs/oss/deployment-and-configuration/docker.mdx @@ -20,8 +20,8 @@ Install an image using the `docker pull` command: # For the latest version. docker pull ghcr.io/elementary-data/elementary:latest -# For a specific version (for instance, 0.6.3). -docker pull ghcr.io/elementary-data/elementary:v0.6.3 +# For a specific version (for instance, 0.20.0). +docker pull ghcr.io/elementary-data/elementary:v0.20.0 ``` ### Running Elementary Docker image in a container diff --git a/docs/oss/deployment-and-configuration/elementary-in-production.mdx b/docs/oss/deployment-and-configuration/elementary-in-production.mdx index 589804178..afaee4e80 100644 --- a/docs/oss/deployment-and-configuration/elementary-in-production.mdx +++ b/docs/oss/deployment-and-configuration/elementary-in-production.mdx @@ -2,88 +2,11 @@ title: "Elementary in production" --- - +import QuickstartElementaryProd from '/snippets/quickstart/quickstart-elementary-prod.mdx'; -Running Elementary OSS in production means to include the dbt package in your production dbt project, -and setting up an automated manner to run the Elementary CLI. 
-You can choose any system that allows you to orchestrate a CLI execution, as long as it can meet the following requirements: -- Full installation of Elementary Python package and dependencies. -- Network access to the data warehouse. -- Access to a `profiles.yml` file with `elementary` profile name. -- Read and write permissions to the Elementary schema. +**Running Elementary OSS in production means to include the dbt package in your production dbt project, +and setting up an automated manner to run the Elementary CLI.** -## Elementary `--env` flag - -When you run Elementary in production use the `--env prod` flag (By default, it is set to `dev`). -This will show the environment as `prod` in the report and alerts. - -## Elementary production deployment and jobs - -Your deployment of Elementary has two parts: - -**Part 1 - Elementary package in your production dbt project** - -In your dbt jobs, after you deploy the Elementary dbt package: - -1. **Run and test results** - Collected by default as part of your runs. -2. **Elementary tests** - Make sure your dbt test runs include elementary tests. -3. **Update dbt artifacts** - Elementary uses the dbt artifacts models to enrich the report, alerts and lineage. To update the artifacts data, run the elementary dbt artifacts models. - -**Part 2 - Elementary CLI** - -On an orchestration system of your choice, run the CLI to: - -1. **Send Slack alerts** - Use the `edr monitor --env prod` command and Slack integration. -2. **Generate a report** - Use the `edr report --env prod` or `edr send-report --env prod` that has built in support for sending the report to Slack / GCS / S3. - -## What permissions Elementary requires? - -The CLI needs to have permissions to access the `profiles.yml` file with the relevant profile, -and network access to the data warehouse. 
- -Also, in the `elementary` profile, the credentials should have permissions to: - -- Read all project schemas -- Write to the elementary schema - -On your dbt project, make sure that Elementary dbt package can: - -- Read all project schemas -- Write to the elementary schema -- Create a schema (alternatively, you can create the elementary schema in advance) - - - -## When to run Elementary? - -For sending alerts or generating a report, there are two options: - -1. **Run Elementary CLI after each relevant dbt job** (`dbt test` / `dbt build` / `dbt run`). -2. **Run Elementary CLI periodically** in a frequency that fits your dbt job executions. - -## Ways to run Elementary in production - -If your organization is using dbt-core, it would probably be a good choice to orchestrate Elementary using the same system that orchestrates dbt. - -Any automation server / orchestration tool supports running a CLI tool like Elementary. - -Some options include: - -1. [Elementary cloud](/cloud/introduction) -2. GitHub Actions - Checkout the [Elementary GitHub Action](https://github.com/elementary-data/run-elementary-action) we created. -3. [Docker](/oss/deployment-and-configuration/docker) -4. [GitLab CI](/oss/deployment-and-configuration/gitlab-ci) -5. Airflow -6. Prefect -7. Meltano - Checkout the [custom Elementary integration](https://hub.meltano.com/utilities/elementary) developed by [Stéphane Burwash](https://github.com/SBurwash) - [Potloc](https://www.potloc.com/). -8. Using cron - -## Need help? - -If you want to consult on production deployment, we would love to help! -Please reach out to us on [Slack](https://elementary-data.com/community), we could talk there or schedule a deployment video call. 
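The production guidance above centers on orchestrating CLI invocations such as `edr monitor --env prod` and `edr send-report --env prod`. When wiring this into a scheduler, it can help to assemble the command programmatically; a minimal sketch (the helper name is ours, not part of Elementary):

```python
def edr_command(task="monitor", env="prod", extra=()):
    """Assemble an Elementary CLI invocation for an orchestrator to run,
    e.g. via subprocess.run or an Airflow BashOperator."""
    return ["edr", task, "--env", env, *extra]

# e.g. edr_command("send-report") -> ["edr", "send-report", "--env", "prod"]
```

Keeping `--env prod` in one place ensures the report and alerts consistently show the production environment.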
+ diff --git a/docs/oss/deployment-and-configuration/slack.mdx b/docs/oss/deployment-and-configuration/slack.mdx index 18079bc65..5f859a698 100644 --- a/docs/oss/deployment-and-configuration/slack.mdx +++ b/docs/oss/deployment-and-configuration/slack.mdx @@ -3,6 +3,10 @@ title: "Slack setup for Elementary CLI" sidebarTitle: "Slack" --- +import SetupSlackIntegration from '/snippets/setup-slack-integration.mdx'; + + + Elementary Slack integration includes sending [Slack alerts](/oss/guides/alerts/send-slack-alerts) on failures in dbt tests and models, and the option to distribute the [data observability report](/oss/guides/generate-report-ui) as a message attachment. ## Token vs Webhook @@ -23,7 +27,7 @@ Below is features support comparison table, to help you select the integration m ## Slack integration setup - + --- diff --git a/docs/oss/deployment-and-configuration/teams.mdx b/docs/oss/deployment-and-configuration/teams.mdx index 85eff3cb9..de32fcb66 100644 --- a/docs/oss/deployment-and-configuration/teams.mdx +++ b/docs/oss/deployment-and-configuration/teams.mdx @@ -3,8 +3,14 @@ title: "Teams setup for Elementary CLI" sidebarTitle: "Teams" --- +import SetupTeamsIntegration from '/snippets/setup-teams-integration.mdx'; + + + Elementary Teams integration includes sending [Teams alerts](/oss/guides/alerts/send-teams-alerts) on failures in dbt tests and models. The alerts are sent using Microsoft Teams Adaptive Cards format, which provides rich formatting and interactive capabilities. +MS Teams supports Elementary Alerts, but unlike Slack, it does not support the Elementary report or multiple channels. + ## Integration options There are two ways to create a webhook for Microsoft Teams: @@ -12,15 +18,10 @@ There are two ways to create a webhook for Microsoft Teams: 1. **Microsoft Teams Connectors (Legacy)**: The traditional way of creating webhooks, but this method is being deprecated by Microsoft. 2. 
**Power Automate Workflows (Recommended)**: The newer, more flexible way to create webhooks. Note that when using this method, Elementary CLI cannot directly verify if messages were delivered - you'll need to monitor your workflow runs in Power Automate. -Below is a features support comparison table (with Slack), to help you select the integration method. -| Integration | Elementary alerts | Elementary report | Multiple channels | -| ------------------------ | ----------------- | ----------------- | ----------------- | -| Teams Connector (Legacy) | ✅ | ❌ | ❌ | -| Power Automate Workflows | ✅ | ❌ | ❌ | -**Note:** Microsoft 365 Connectors (previously called Office 365 Connectors) are nearing deprecation. We recommend using Power Automate Workflows for new integrations. +Microsoft 365 Connectors (previously called Office 365 Connectors) are nearing deprecation. We recommend using Power Automate Workflows for new integrations. ## Teams integration setup - + diff --git a/docs/oss/general/community-and-support.mdx b/docs/oss/general/community-and-support.mdx index ef76ef22d..dce2afcb6 100644 --- a/docs/oss/general/community-and-support.mdx +++ b/docs/oss/general/community-and-support.mdx @@ -2,4 +2,8 @@ title: "Community and Support" --- - +import SupportContact from '/snippets/support-contact.mdx'; + + + + diff --git a/docs/oss/general/faq.mdx b/docs/oss/general/faq.mdx index d4744258b..9ef95b5f7 100644 --- a/docs/oss/general/faq.mdx +++ b/docs/oss/general/faq.mdx @@ -3,18 +3,33 @@ title: "FAQ" description: "This section is aimed at collecting common questions users have to provide documented answers." 
--- - - - - - - - - - - - - +import QuestionDisableHooks from '/snippets/faq/question-disable-hooks.mdx'; +import QuestionFilterElementaryTests from '/snippets/faq/question-filter-elementary-tests.mdx'; +import QuestionDisableElementaryModels from '/snippets/faq/question-disable-elementary-models.mdx'; +import QuestionChangeElementarySchema from '/snippets/faq/question-change-elementary-schema.mdx'; +import QuestionSchema from '/snippets/faq/question-schema.mdx'; +import QuestionWhichTests from '/snippets/faq/question-which-tests.mdx'; +import QuestionSingularTestsConfig from '/snippets/faq/question-singular-tests-config.mdx'; +import QuestionFullRefresh from '/snippets/faq/question-full-refresh.mdx'; +import QuestionElementaryPermissions from '/snippets/faq/question-elementary-permissions.mdx'; +import QuestionDbtCloud from '/snippets/faq/question-dbt-cloud.mdx'; +import QuestionTestResultsSample from '/snippets/faq/question-test-results-sample.mdx'; +import QuestionCost from '/snippets/faq/question-cost.mdx'; + + + + + + + + + + + + + + + diff --git a/docs/oss/general/troubleshooting.mdx b/docs/oss/general/troubleshooting.mdx index a0b12c658..735fdec6f 100644 --- a/docs/oss/general/troubleshooting.mdx +++ b/docs/oss/general/troubleshooting.mdx @@ -3,6 +3,10 @@ title: "Troubleshooting" description: "This section is aimed at collecting common issues users have to provide quick debug solutions." --- +import Dbt18MaterializationsCommon from '/snippets/dbt-18-materializations-common.mdx'; + + + If you get an empty report, there are several steps to understand what went wrong and try and fix it. @@ -32,12 +36,22 @@ If you get an empty report, there are several steps to understand what went wron - Run the CLI with the flag for force updating the packages: `edr report -u true` -**3. Validate that the CLI has a working connection profile** +**3. 
Validate that the CLI has a working connection profile in the [right path and format](https://docs.elementary-data.com/oss/cli-install#how-to-create-profiles-yml)** -- **Check that the connection profile exists in the right path and format** -- **Check that the connection profile points to the elementary package schema** +- Default path: `~/.dbt/profiles.yml`. If saved elsewhere, make sure to run `dbt run --profiles-dir ` and `dbt test --profiles-dir ` +- Profile name: `elementary` +- Make sure that the elementary profile is a top-level profile, with the same indentation of the profiles you have already set up +- Schema name: The schema of the elementary models. The default name is  `_elementary` -**4. Still not working? Collect the following logs and reach our to the elementary team at [#support](https://elementary-data.com/community) on Slack** +**4. Validate the schema configuration for elementary models in your dbt_project.yml** +```yaml +models: + jaffle_shop: + +materialized: table + elementary: + +schema: 'elementary' +``` +**5. Still not working? Collect the following logs and reach out to the elementary team at [#community-support](https://elementary-data.com/community) on Slack** - **edr.log** - Created on the execution folder of the CLI. - **dbt.log** - Created under the package location at @@ -154,14 +168,29 @@ python3 -m pip install elementary-data[] - + + + If you are encountering the warning above though, it means that you have previously added the flag + `require_explicit_package_overrides_for_builtin_materializations=False` to `dbt-project.yml`. + This is no longer required!. 
+ + Instead, please add a file named `elementary_materialization.sql` to your macros folder, with the following contents - + + If you use Snowflake: + ```sql + {% materialization test, adapter='snowflake' %} + {{ return(elementary.materialization_test_snowflake()) }} + {% endmaterialization %} + ``` - The warning above may appear in one of the following two cases: - * If you are using the most recent version of dbt 1.6 or 1.7 - this warning will appear by default, since it indicates the aforementioned behavior change in dbt 1.8. - * If you are using dbt 1.8 and above, this warning will NOT appear by default, however it will start appearing once you set the flag `require_explicit_package_overrides_for_builtin_materializations` to `false` - as required in the [dbt package installation guide](/oss/quickstart/quickstart-cli-package#step-by-step-install-elementary-dbt-package). + If you use any other DWH: + ```sql + {% materialization test, default %} + {{ return(elementary.materialization_test_default()) }} + {% endmaterialization %} + ``` - In either case, please ignore it for now. This is a temporary measure and we are working with the dbt team on a [longer term solution](https://github.com/dbt-labs/dbt-core/issues/10090). + This will ensure Elementary's test materialization is run but will avoid the warning. @@ -179,26 +208,52 @@ If you want the Elementary UI to show data for a longer period of time, use the - - -If you want to prevent elementary tests from running the simplest way is to exclude the tag that marks all of them in your dbt command: -```shell -dbt test --exclude tag:elementary-tests + + + +When writing to the `dbt_artifacts` tables in the Elementary schema, data is deleted and reinserted. Running parallel jobs through an orchestrator can lead to errors, as multiple jobs may attempt to modify the same tables simultaneously. +To prevent this, you should: +1. 
[Disable the on-run-end hooks](/oss/general/faq#can-i-disable-the-on-run-end-hooks-or-results-uploading) +2. [Exclude the Elementary models](/oss/general/faq#can-i-disable-exclude-the-elementary-models) + +For scheduled updates to `dbt_artifacts` (e.g., a daily job), run: + +``` +dbt run --select elementary --vars '{"enable_elementary_models": true}' ``` -If you add the following to your dbt_project.yml, elementary models will not run and elementary tests will be executed but will do nothing and always pass. -```shell -models: - elementary: - enabled: false + + + + + +The `dbt_columns` table in the Elementary schema can take a while to update, especially for large projects. This table is only used by Elementary Cloud, so if you're not relying on it (or want to speed up your runs), you can disable it safely without affecting any other functionality. + +To skip updating this table, add the following to your dbt_project.yml: + ``` +vars: + columns_upload_strategy: none + +``` + +For more comprehensive strategies to reduce on-run-end hooks time, see the [Reduce On-Run-End Hooks Time guide](/oss/guides/reduce-on-run-end-time). + + + + + +dbt-fusion support in Elementary is still in beta. + +While most of the core features should work, some features may not work as expected. +For more details, please click [here](/oss/integrations/dbt-fusion) -If you're experiencing issues of any kind, please contact us on the [#support](https://elementary-community.slack.com/archives/C02CTC89LAX) channel. +If you're experiencing issues of any kind, reach out on the [#community-support](https://elementary-community.slack.com/archives/C02CTC89LAX) channel. Elementary AI and the team will be happy to help. 
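Troubleshooting step 3 above requires the `elementary` profile to be a top-level key in `profiles.yml`, at the same indentation as your existing profiles. A naive way to sanity-check that, assuming conventional indentation (a real check should use a YAML parser):

```python
def has_top_level_profile(profiles_text, name="elementary"):
    """True if `name:` appears unindented, i.e. as a top-level profile."""
    return any(
        line.rstrip() == f"{name}:" and not line[:1].isspace()
        for line in profiles_text.splitlines()
    )
```

A nested `elementary:` block (e.g. indented under another profile) fails this check, which is exactly the misconfiguration the step warns about.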
diff --git a/docs/oss/guides/alerts/alerts-configuration.mdx b/docs/oss/guides/alerts/alerts-configuration.mdx index dcc4d08cb..7d9a700cd 100644 --- a/docs/oss/guides/alerts/alerts-configuration.mdx +++ b/docs/oss/guides/alerts/alerts-configuration.mdx @@ -3,19 +3,30 @@ title: "Alerts Configuration and Customization" sidebarTitle: "Alerts configuration" --- - +import AlertsCodeConfiguration from '/snippets/guides/alerts-code-configuration.mdx'; + + + + ## Alerts CLI flags + + Alert vars are deprecated! We recommend filtering the alerts + using CLI selectors instead. + + + + #### Filter alerts Elementary supports filtering the alerts by tag, owner, model, status or resource type. Using filters, you can send alerts to the relevant people and teams by running `edr` multiple times with different filters on each run. - -alerts on skipped tests and models are filtered out by default. if you want to receive those alerts, apply the statuses filter and include them explicitly. - + +Alerts on skipped tests and models are filtered out by default. if you want to receive those alerts, apply the statuses filter and include them explicitly. + @@ -87,3 +98,11 @@ If configured otherwise in the dbt project config block or meta, the CLI value w ```shell edr monitor --suppression-interval 24 ``` + +#### Alert group threshold + +Set a minimum alert threshold before grouping notifications into a summary alert — keeping noise low while ensuring nothing gets missed. + +```shell +edr monitor --group-alerts-threshold 5 +``` \ No newline at end of file diff --git a/docs/oss/guides/alerts/elementary-alerts.mdx b/docs/oss/guides/alerts/elementary-alerts.mdx index 1f4c3f5d5..469313bb5 100644 --- a/docs/oss/guides/alerts/elementary-alerts.mdx +++ b/docs/oss/guides/alerts/elementary-alerts.mdx @@ -2,11 +2,15 @@ title: "Elementary alerts" --- +import AlertsIntroduction from '/snippets/alerts/alerts-introduction.mdx'; + + + More alerts integrations are coming soon, reach out to us for details! 
- + @@ -14,7 +18,7 @@ This is **required for the alerts to work.**
- + ## Execute the CLI @@ -34,13 +38,7 @@ be sent to the wrong one due to the overlap accessing the backend table of eleme ## Alert on source freshness failures -_Not supported in dbt cloud_ - -To alert on source freshness, you will need to run `edr run-operation upload-source-freshness` right after each execution of `dbt source freshness`. -This operation will upload the results to a table, and the execution of `edr monitor` will send the actual alert. - -- Note that `dbt source freshness` and `upload-source-freshness` needs to run from the same machine. -- Note that `upload-source-freshness` requires passing `--project-dir` argument. +To alert on source freshness, follow [this guide](/oss/guides/collect-dbt-source-freshness). ## Continuous alerting diff --git a/docs/oss/guides/alerts/send-teams-alerts.mdx b/docs/oss/guides/alerts/send-teams-alerts.mdx index fe2665983..988035cf3 100644 --- a/docs/oss/guides/alerts/send-teams-alerts.mdx +++ b/docs/oss/guides/alerts/send-teams-alerts.mdx @@ -2,6 +2,10 @@ title: "Setup Teams alerts" --- +import SetupTeamsIntegration from '/snippets/setup-teams-integration.mdx'; + + + **Before you start** @@ -14,7 +18,7 @@ Elementary sends alerts using Microsoft Teams Adaptive Cards format, which provi
- + ## Execute the CLI diff --git a/docs/oss/guides/collect-dbt-source-freshness.mdx b/docs/oss/guides/collect-dbt-source-freshness.mdx index 131a1f277..f9863c442 100644 --- a/docs/oss/guides/collect-dbt-source-freshness.mdx +++ b/docs/oss/guides/collect-dbt-source-freshness.mdx @@ -3,4 +3,8 @@ title: "Collect dbt source freshness results" sidebarTitle: "dbt source freshness" --- - \ No newline at end of file +import DbtSourceFreshness from '/snippets/guides/dbt-source-freshness.mdx'; + + + + \ No newline at end of file diff --git a/docs/oss/guides/collect-job-data.mdx b/docs/oss/guides/collect-job-data.mdx index 72361e2b1..923a8d648 100644 --- a/docs/oss/guides/collect-job-data.mdx +++ b/docs/oss/guides/collect-job-data.mdx @@ -3,4 +3,8 @@ title: "Collect jobs info from orchestrator" sidebarTitle: "Collect jobs data" --- - +import CollectJobData from '/snippets/guides/collect-job-data.mdx'; + + + + diff --git a/docs/oss/guides/generate-report-ui.mdx b/docs/oss/guides/generate-report-ui.mdx index b251ae0fe..3040c64ab 100644 --- a/docs/oss/guides/generate-report-ui.mdx +++ b/docs/oss/guides/generate-report-ui.mdx @@ -2,7 +2,14 @@ title: "Generate observability report" --- -Elementary [data observability report](/features/data-observability-dashboard) can be used for visualization and exploration of data from the dbt-package tables. That includes dbt test results, Elementary anomaly detection results, dbt artifacts, tests runs, etc. +import InstallDbtPackage from '/snippets/install-dbt-package.mdx'; +import AddConnectionProfile from '/snippets/add-connection-profile.mdx'; +import InstallCli from '/snippets/install-cli.mdx'; +import ShareReport from '/snippets/share-report.mdx'; + + + +Elementary [data observability report](cloud/features/collaboration-and-communication/data-observability-dashboard) can be used for visualization and exploration of data from the dbt-package tables. 
That includes dbt test results, Elementary anomaly detection results, dbt artifacts, test runs, etc. 
 - +
- + - + @@ -69,4 +76,4 @@ edr report --select invocation_id:XXXXXXXXXXXXX --- - + diff --git a/docs/oss/guides/performance-alerts.mdx b/docs/oss/guides/performance-alerts.mdx new file mode 100644 index 000000000..d9a26f284 --- /dev/null +++ b/docs/oss/guides/performance-alerts.mdx @@ -0,0 +1,9 @@ +--- +title: "Performance Alerts" +sidebarTitle: "Performance alerts" +--- + +import PerformanceAlerts from '/snippets/guides/performance-alerts.mdx'; + + + diff --git a/docs/oss/guides/reduce-on-run-end-time.mdx b/docs/oss/guides/reduce-on-run-end-time.mdx new file mode 100644 index 000000000..c384b7c80 --- /dev/null +++ b/docs/oss/guides/reduce-on-run-end-time.mdx @@ -0,0 +1,9 @@ +--- +title: "Reduce On-Run-End Hooks Time" +sidebarTitle: "Reduce on-run-end time" +--- + +import ReduceOnRunEndTime from '/snippets/guides/reduce-on-run-end-time.mdx'; + + + diff --git a/docs/oss/integrations/athena.mdx b/docs/oss/integrations/athena.mdx new file mode 100644 index 000000000..b015c3517 --- /dev/null +++ b/docs/oss/integrations/athena.mdx @@ -0,0 +1,19 @@ +--- +title: "Athena" +--- + +import AthenaCli from '/snippets/cli/athena-cli.mdx'; + + + + + + + + + +### Have a question? + +We are available +on [Slack](https://elementary-data.com/community), reach out +for any kind of help! 
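The report command above narrows output with a selector like `invocation_id:<value>`. Conceptually such a selector is just a `key:value` pair matched against result rows; a sketch of that idea (our simplification, not Elementary's actual implementation):

```python
def parse_selector(selector):
    """Split a 'key:value' selector into its parts."""
    key, sep, value = selector.partition(":")
    # Assumption for this sketch: a bare token selects by node name.
    return (key, value) if sep else ("name", key)

def apply_selector(rows, selector):
    """Keep only rows whose field matches the selector value."""
    key, value = parse_selector(selector)
    return [row for row in rows if str(row.get(key)) == value]
```

With `invocation_id`, this restricts the report to the results of a single run.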
diff --git a/docs/oss/integrations/bigquery.mdx b/docs/oss/integrations/bigquery.mdx index 1674db61c..915ad3b60 100644 --- a/docs/oss/integrations/bigquery.mdx +++ b/docs/oss/integrations/bigquery.mdx @@ -2,12 +2,17 @@ title: "BigQuery" --- +import Bigquery from '/snippets/cloud/integrations/bigquery.mdx'; +import BigqueryCli from '/snippets/cli/bigquery-cli.mdx'; + + + - + - + diff --git a/docs/oss/integrations/clickhouse.mdx b/docs/oss/integrations/clickhouse.mdx new file mode 100644 index 000000000..c40ea112a --- /dev/null +++ b/docs/oss/integrations/clickhouse.mdx @@ -0,0 +1,19 @@ +--- +title: "Clickhouse" +--- + +import ClickhouseCli from '/snippets/cli/clickhouse-cli.mdx'; + + + + + + + + + +### Have a question? + +We are available +on [Slack](https://elementary-data.com/community), reach out +for any kind of help! diff --git a/docs/oss/integrations/databricks.mdx b/docs/oss/integrations/databricks.mdx index ae817d004..a42621996 100644 --- a/docs/oss/integrations/databricks.mdx +++ b/docs/oss/integrations/databricks.mdx @@ -2,12 +2,17 @@ title: "Databricks" --- +import Databricks from '/snippets/cloud/integrations/databricks.mdx'; +import DatabricksCli from '/snippets/cli/databricks-cli.mdx'; + + + - + - + diff --git a/docs/oss/integrations/dbt-fusion.mdx b/docs/oss/integrations/dbt-fusion.mdx new file mode 100644 index 000000000..47cd36590 --- /dev/null +++ b/docs/oss/integrations/dbt-fusion.mdx @@ -0,0 +1,7 @@ +--- +title: "dbt Fusion (Beta)" +--- + +import DbtFusion from '/snippets/integrations/dbt-fusion.mdx'; + + diff --git a/docs/oss/integrations/dbt.mdx b/docs/oss/integrations/dbt.mdx index 01faf86cb..ae9ed063b 100644 --- a/docs/oss/integrations/dbt.mdx +++ b/docs/oss/integrations/dbt.mdx @@ -2,6 +2,11 @@ title: "dbt core & dbt cloud" --- +import QuestionConnectionProfile from '/snippets/faq/question-connection-profile.mdx'; +import QuestionProfilePermissions from '/snippets/faq/question-profile-permissions.mdx'; + + + Elementary OSS integrates with 
dbt core (1.3.0 and above) and dbt cloud, as long as the data warehouse is supported. Both dbt core and cloud users need to [deploy the dbt package](/oss/quickstart/quickstart-cli-package) first in the monitored project. @@ -26,5 +31,5 @@ Here are the detailed steps for using Elementary on dbt cloud: Use to create the profile, [this guide](/oss/quickstart/quickstart-cli) install the CLI, and run it. - - + + diff --git a/docs/oss/integrations/dremio.mdx b/docs/oss/integrations/dremio.mdx new file mode 100644 index 000000000..619452321 --- /dev/null +++ b/docs/oss/integrations/dremio.mdx @@ -0,0 +1,21 @@ +--- +title: "Dremio" +--- + +import Dremio from '/snippets/cloud/integrations/dremio.mdx'; +import DremioCli from '/snippets/cli/dremio-cli.mdx'; + + + + + + + + + + + + +### Have a question? + +We are available on [Slack](https://elementary-data.com/community), reach out for any kind of help! diff --git a/docs/oss/integrations/duckdb.mdx b/docs/oss/integrations/duckdb.mdx new file mode 100644 index 000000000..97ccdb2cd --- /dev/null +++ b/docs/oss/integrations/duckdb.mdx @@ -0,0 +1,19 @@ +--- +title: "DuckDB" +--- + +import DuckdbCli from '/snippets/cli/duckdb-cli.mdx'; + + + + + + + + + +### Have a question? + +We are available +on [Slack](https://elementary-data.com/community), reach out +for any kind of help! diff --git a/docs/oss/integrations/fabric.mdx b/docs/oss/integrations/fabric.mdx new file mode 100644 index 000000000..0fabaf72a --- /dev/null +++ b/docs/oss/integrations/fabric.mdx @@ -0,0 +1,19 @@ +--- +title: "Fabric" +--- + +import FabricCli from '/snippets/cli/fabric-cli.mdx'; + + + + + + + + + +### Have a question? + +We are available +on [Slack](https://elementary-data.com/community), reach out +for any kind of help! 
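The dbt integration page above gates support at dbt core 1.3.0 and above. A numeric version gate of that shape looks like this (illustrative only; note that naive string comparison would wrongly reject e.g. 1.10.x):

```python
def meets_minimum(version, minimum=(1, 3, 0)):
    """Compare a dotted version string against a minimum, numerically."""
    parts = tuple(int(p) for p in version.split(".")[:3])
    return parts >= minimum
```

Tuple comparison handles multi-digit components correctly, so `"1.10.2"` passes a `1.3.0` minimum.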
diff --git a/docs/oss/integrations/postgres.mdx b/docs/oss/integrations/postgres.mdx index 6e60f6639..e0dbff126 100644 --- a/docs/oss/integrations/postgres.mdx +++ b/docs/oss/integrations/postgres.mdx @@ -2,12 +2,17 @@ title: "Postgres" --- +import Postgres from '/snippets/cloud/integrations/postgres.mdx'; +import PostgresCli from '/snippets/cli/postgres-cli.mdx'; + + + - + - + diff --git a/docs/oss/integrations/redshift.mdx b/docs/oss/integrations/redshift.mdx index 4dacbc4fa..67f82ac9b 100644 --- a/docs/oss/integrations/redshift.mdx +++ b/docs/oss/integrations/redshift.mdx @@ -2,12 +2,17 @@ title: "Redshift" --- +import Redshift from '/snippets/cloud/integrations/redshift.mdx'; +import RedshiftCli from '/snippets/cli/redshift-cli.mdx'; + + + - + - + diff --git a/docs/oss/integrations/snowflake.mdx b/docs/oss/integrations/snowflake.mdx index dc1419c0a..6f5763f1b 100644 --- a/docs/oss/integrations/snowflake.mdx +++ b/docs/oss/integrations/snowflake.mdx @@ -2,12 +2,17 @@ title: "Snowflake" --- +import Snowflake from '/snippets/cloud/integrations/snowflake.mdx'; +import SnowflakeCli from '/snippets/cli/snowflake-cli.mdx'; + + + - + - + diff --git a/docs/oss/integrations/spark.mdx b/docs/oss/integrations/spark.mdx new file mode 100644 index 000000000..f8e09dcd9 --- /dev/null +++ b/docs/oss/integrations/spark.mdx @@ -0,0 +1,19 @@ +--- +title: "Spark" +--- + +import SparkCli from '/snippets/cli/spark-cli.mdx'; + + + + + + + + + +### Have a question? + +We are available +on [Slack](https://elementary-data.com/community), reach out +for any kind of help! diff --git a/docs/oss/integrations/sqlserver.mdx b/docs/oss/integrations/sqlserver.mdx new file mode 100644 index 000000000..a836b7070 --- /dev/null +++ b/docs/oss/integrations/sqlserver.mdx @@ -0,0 +1,19 @@ +--- +title: "SQL Server" +--- + +import SqlserverCli from '/snippets/cli/sqlserver-cli.mdx'; + + + + + + + + + +### Have a question? 
+ +We are available +on [Slack](https://elementary-data.com/community), reach out +for any kind of help! diff --git a/docs/oss/integrations/trino.mdx b/docs/oss/integrations/trino.mdx new file mode 100644 index 000000000..5ff62e7c6 --- /dev/null +++ b/docs/oss/integrations/trino.mdx @@ -0,0 +1,19 @@ +--- +title: "Trino" +--- + +import TrinoCli from '/snippets/cli/trino-cli.mdx'; + + + + + + + + + +### Have a question? + +We are available +on [Slack](https://elementary-data.com/community), reach out +for any kind of help! diff --git a/docs/oss/integrations/vertica.mdx b/docs/oss/integrations/vertica.mdx new file mode 100644 index 000000000..6b522e4fb --- /dev/null +++ b/docs/oss/integrations/vertica.mdx @@ -0,0 +1,19 @@ +--- +title: "Vertica" +--- + +import VerticaCli from '/snippets/cli/vertica-cli.mdx'; + + + + + + + + + +### Have a question? + +We are available +on [Slack](https://elementary-data.com/community), reach out +for any kind of help! diff --git a/docs/oss/oss-introduction.mdx b/docs/oss/oss-introduction.mdx index 6d2d31b4e..69b73e818 100644 --- a/docs/oss/oss-introduction.mdx +++ b/docs/oss/oss-introduction.mdx @@ -4,26 +4,46 @@ sidebarTitle: "Introduction" icon: "square-terminal" --- -Elementary OSS is a CLI tool you can deploy on top of the Elementary dbt package and orchestrate to send Slack alerts and self-host the Elementary report. +import AdaptersCards from '/snippets/oss/adapters-cards.mdx'; -To gain the most value from the Elementary dbt package, we recommend using it with the Elementary Cloud Platform. +**Elementary OSS is a CLI tool that works alongside the [Elementary dbt package](/data-tests/dbt/dbt-package).** With the dbt package, you gain powerful anomaly detection tests and metadata tables to monitor data quality trends over time. 
-The difference between the CLI and the Elementary Cloud Platform +By deploying the CLI, you can **send alerts and self-host the Elementary data observability report**, a comprehensive view of your dbt runs and all dbt test results, helping you track data lineage, test coverage, and overall pipeline health. + +For a scalable and collaborative data quality platform, **look into the [Elementary Cloud Platform](/cloud/introduction)**: a fully managed, enterprise-ready solution with ML-powered anomaly detection, flexible data discovery, integrated incident management, and collaboration tools—all with minimal setup and infrastructure maintenance. See a [detailed features comparison](/cloud/cloud-vs-oss). + +### What can you do with Elementary OSS? + + + + + + + + + + + + + + +
+ + + Demo + + +### Supported adapters + + -- **The Elementary Cloud Platform - for** **organizational data quality initiatives** (e.g., platform refactoring, governance projects, or AI/ML adoption). -The Elementary Cloud Platform offers pipeline monitoring, incident management, lineage tracking, dashboards, health scores, and alerts. It empowers data engineers, analytics engineers, and data analysts to collaborate, resolve issues efficiently, and deliver trusted data products. Learn more about the [Elementary Cloud Platform](https://docs.elementary-data.com/cloud/introduction) or [book a demo](https://cal.com/maayansa/elementary-intro-docs). -- **The CLI tool - for individual use** -The CLI tool is used to detect issues, send Slack alerts, and self-host the Elementary report. It is the right choice if you’ll be the primary user of Elementary. Learn more about the CLI tool below. - + Install the Elementary dbt package and CLI tool. - - + + Watch the webinar to learn how to get started with the Elementary dbt package and CLI tool. + + diff --git a/docs/oss/quickstart/quickstart-alerts.mdx b/docs/oss/quickstart/quickstart-alerts.mdx new file mode 100644 index 000000000..6a85c99bd --- /dev/null +++ b/docs/oss/quickstart/quickstart-alerts.mdx @@ -0,0 +1,47 @@ +--- +title: "Quickstart: Setup alerts" +sidebarTitle: "Setup alerts" +icon: "square-5" +--- + +import AlertsIntroduction from '/snippets/alerts/alerts-introduction.mdx'; + + + +**Elementary helps you stay on top of data issues by generating Slack and Microsoft Teams alerts for test failures, anomalies, and other data reliability signals.** + + + + + +
+ +
+ +
+ +
+
+ +## Alert configuration + +Elementary alerts can be enriched with metadata such as owners, tags, and descriptions, and offer flexible configuration options to fit your team's needs. Learn more about alert configuration [here](/oss/guides/alerts/alerts-configuration). + +To alert on source freshness, see [this guide](/oss/guides/collect-dbt-source-freshness). + diff --git a/docs/oss/quickstart/quickstart-cli-package.mdx b/docs/oss/quickstart/quickstart-cli-package.mdx index dcab3bbe8..67082d2e9 100644 --- a/docs/oss/quickstart/quickstart-cli-package.mdx +++ b/docs/oss/quickstart/quickstart-cli-package.mdx @@ -1,17 +1,13 @@ --- -title: "Install Elementary dbt package" +title: "Quickstart: Install Elementary dbt package" sidebarTitle: "Install dbt package" icon: "square-1" --- - +import QuickstartPackageInstall from '/snippets/quickstart-package-install.mdx'; -## What's next? + -Take a moment to ⭐️ [star our Github repo!](https://github.com/elementary-data/elementary) ⭐️ (It helps us a lot) - -Then - +## What's next? -1. [Install the Elementary CLI](/oss/quickstart/quickstart-cli) 🤩 -2. [Add data anomaly detection dbt tests](/data-tests/add-elementary-tests) 📈 -3. [Deploy Elementary in production](/oss/deployment-and-configuration/elementary-in-production) 🚀 +Take a moment to ⭐️ [star our Github repo!](https://github.com/elementary-data/elementary) ⭐️ (It helps us a lot) diff --git a/docs/oss/quickstart/quickstart-cli.mdx b/docs/oss/quickstart/quickstart-cli.mdx index 7e72452ba..4ae10514c 100644 --- a/docs/oss/quickstart/quickstart-cli.mdx +++ b/docs/oss/quickstart/quickstart-cli.mdx @@ -4,26 +4,25 @@ sidebarTitle: "Install Elementary CLI" icon: "square-2" --- -Before installing the CLI, make sure to complete the steps dbt package installation, including executing `dbt run` with the Elementary package models. 
- - +import InstallDbtPackage from '/snippets/install-dbt-package.mdx'; +import AddConnectionProfile from '/snippets/add-connection-profile.mdx'; +import QuestionConnectionProfile from '/snippets/faq/question-connection-profile.mdx'; +import QuestionProfilePermissions from '/snippets/faq/question-profile-permissions.mdx'; +import InstallCli from '/snippets/install-cli.mdx'; - - + Elementary supports Python versions 3.9 - 3.12, aligning with the [versions supported by dbt](https://docs.getdbt.com/faqs/Core/install-python-compatibility#python-compatibility-matrix). - +Before installing the CLI, make sure to complete the dbt package installation steps, including executing `dbt run` with the Elementary package models. - + + + - + - + -## What's next? + -1. Use the CLI to: - - [Visualize all dbt test results and runs in a report](/oss/guides/generate-report-ui) ✨ - - [Send informative alerts on failures](/oss/guides/alerts/elementary-alerts) 📣 -2. [Add data anomaly detection dbt tests](/data-tests/add-elementary-tests) 📈 -3. [Deploy Elementary in production](/oss/deployment-and-configuration/elementary-in-production) 🚀 + diff --git a/docs/oss/quickstart/quickstart-prod.mdx b/docs/oss/quickstart/quickstart-prod.mdx new file mode 100644 index 000000000..eeb01432e --- /dev/null +++ b/docs/oss/quickstart/quickstart-prod.mdx @@ -0,0 +1,13 @@ +--- +title: "Quickstart: Deploy in Production" +sidebarTitle: "Deploy in production" +icon: "square-6" +--- + +import QuickstartElementaryProd from '/snippets/quickstart/quickstart-elementary-prod.mdx'; + + +**Running Elementary OSS in production means including the dbt package in your production dbt project, +and setting up an automated way to run the Elementary CLI.** This step is optional, but it will help you automate, control and scale Elementary OSS. 
+ + diff --git a/docs/oss/quickstart/quickstart-report.mdx b/docs/oss/quickstart/quickstart-report.mdx new file mode 100644 index 000000000..f6a5941a6 --- /dev/null +++ b/docs/oss/quickstart/quickstart-report.mdx @@ -0,0 +1,57 @@ +--- +title: "Quickstart: Elementary Observability Report" +sidebarTitle: "Generate observability report" +icon: "square-4" +--- + +import ShareReport from '/snippets/share-report.mdx'; + + + + +**The Elementary [data observability report](/cloud/features/collaboration-and-communication/data-observability-dashboard) can be used for visualizing and exploring data from the dbt-package tables. This includes dbt test results, Elementary anomaly detection results, dbt artifacts, test runs, etc.** + + + Demo + + + +## Generate Tests Report UI + +### Execute `edr report` in your terminal + +After installing and configuring the CLI, execute the command: + +```shell +edr report +``` + +The command will use the provided connection profile to access the data warehouse, read from the Elementary tables, and generate the report as an HTML file. + +--- + +## Generating a report for a single invocation + +Elementary supports filtering the report by invocation at generation time. +The filtered report will only include data for the selected invocation (this applies only to the `Test Results` and `Lineage` screens). + +There are three ways to filter the report by invocation: + +- **Last invocation** - Filter by the last invocation (`dbt test` / `dbt build`) that ran. +- **Invocation time** - Filter by the closest invocation (`dbt test` / `dbt build`) to the provided time (the provided time should be in ISO format, local time). +- **Invocation id** - Filter by the provided invocation (`dbt test` / `dbt build`) id. 
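If you script report generation, you may want to build the invocation-time selector from a `datetime` rather than formatting the string by hand. A small sketch (this helper is illustrative, not part of the CLI or SDK):

```python
from datetime import datetime

def invocation_time_selector(dt: datetime) -> str:
    """Format a local datetime as the value edr expects for --select,
    e.g. 'invocation_time:2022-12-25 10:10:35'."""
    return "invocation_time:" + dt.strftime("%Y-%m-%d %H:%M:%S")
```

The returned string can then be passed to `edr report --select`.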
+ +Example usage of the filters: + +```shell edr report invocation filters +edr report --select last_invocation +edr report --select "invocation_time:2022-12-25 10:10:35" +edr report --select invocation_id:XXXXXXXXXXXXX +``` + +--- + + diff --git a/docs/oss/quickstart/quickstart-support.mdx b/docs/oss/quickstart/quickstart-support.mdx new file mode 100644 index 000000000..e43101849 --- /dev/null +++ b/docs/oss/quickstart/quickstart-support.mdx @@ -0,0 +1,20 @@ +--- +title: "Quickstart: Get support" +sidebarTitle: "Get support" +icon: "square-8" +--- + +**Join our [Slack community](https://elementary-data.com/community) to get support, stay updated, and connect with other Elementary users.** + +Our Slack community is a great place to: +- Stay updated on releases +- Get support from our AI, team, and thousands of peers +- Connect with other Elementary users +- Learn about use cases and what's going on in Elementary + +## Want to contribute? + +Want to contribute an integration you're missing, or a new feature or fix? We appreciate it! + +- Reach out in [Slack](https://elementary-data.com/community) or email us at support@elementary-data.com +- Learn more about contributing [here](/oss/general/contributions) diff --git a/docs/oss/quickstart/quickstart-tests.mdx b/docs/oss/quickstart/quickstart-tests.mdx new file mode 100644 index 000000000..c2790940e --- /dev/null +++ b/docs/oss/quickstart/quickstart-tests.mdx @@ -0,0 +1,137 @@ +--- +title: "Quickstart: Elementary Tests" +sidebarTitle: "Add Elementary tests" +icon: "square-3" +--- + +Once you've set up the basic Elementary components — the dbt package and CLI — you can start adding Elementary tests to your data. + +You can add Elementary tests right after completing the initial setup, or later as you identify critical tables and metrics. + +Many teams start with a few volume or freshness tests, then expand coverage as needed. 
+ +Elementary data tests include flexible anomaly detection tests, schema change tests, and Python tests, configured and executed like native tests in your dbt project. Elementary tests can be used in addition to dbt tests, package tests (such as dbt-expectations), and custom tests. All test results will be presented in the Elementary UI and alerts. + + + + + - Volume + - Freshness + - Event freshness + - Column anomalies + - Dimensions + + + - Schema changes + - Baseline schema + - JSON schema + - Exposure schema + + + - Python tests + + + +## Anomaly detection tests + +Tests to detect anomalies in data quality metrics such as volume, freshness, null rates, and anomalies in specific dimensions. + + + Demo + + + + [How do Elementary anomaly detection tests + work?](/data-tests/how-anomaly-detection-works) + + + + Monitors table row count over time to detect drops or spikes in volume. + + + Monitors the latest timestamp of a table to detect data delays. + + + + Monitors the gap between the latest event timestamp and its loading time, to + detect event freshness issues. + + + + Monitors the row count per dimension over time, and alerts on unexpected + changes in the distribution. It is best to configure it on low-cardinality + fields. + + + + Monitors a column for anomalies in metrics such as null rate, length, max and + min, and more. Read more about [specific column + metrics](/data-tests/anomaly-detection-configuration/column-anomalies). + + + + Activates the column anomalies test on all the columns of the table. It's + possible to exclude specific columns. + + +## Schema tests + + + Fails on changes in schema: deleted or added columns, or change of data type + of a column. + + + + Fails if the table schema differs from a configured baseline in column names + or column types (the baseline can be generated with a macro). + + + + Monitors a JSON type column and fails if there are JSON events that don't + match a configured JSON schema (can be generated with a macro). 
+ + + + Monitors changes in your models' columns that break schema for downstream + exposures, such as BI dashboards. + + +## Other tests + + + Write your own custom tests using Python scripts. + diff --git a/docs/oss/quickstart/stay-updated.mdx b/docs/oss/quickstart/stay-updated.mdx new file mode 100644 index 000000000..caa8d69f6 --- /dev/null +++ b/docs/oss/quickstart/stay-updated.mdx @@ -0,0 +1,165 @@ +--- +title: "Stay updated" +sidebarTitle: "Stay updated" +icon: "square-5" +--- + +import { useState } from 'react'; + +export const NewsletterSignup = () => { + const [email, setEmail] = useState(''); + const [status, setStatus] = useState('idle'); + + const HUBSPOT_PORTAL_ID = '142608385'; + const HUBSPOT_FORM_ID = '4734860b-68fb-4f7f-aada-afb14e61afe7'; + + const handleSubmit = async (e) => { + e.preventDefault(); + if (!email) return; + setStatus('loading'); + + try { + const res = await fetch( + `https://api.hsforms.com/submissions/v3/integration/submit/${HUBSPOT_PORTAL_ID}/${HUBSPOT_FORM_ID}`, + { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ + fields: [{ name: 'email', value: email }], + context: { + pageUri: typeof window !== 'undefined' ? window.location.href : '', + pageName: 'Elementary Docs - Stay Updated', + }, + }), + } + ); + if (res.ok) { + setStatus('success'); + setEmail(''); + } else { + setStatus('error'); + } + } catch { + setStatus('error'); + } + }; + + if (status === 'success') { + return ( +
+ You're in! We'll keep you posted. +
+ ); + } + + return ( + <> + +
+ setEmail(e.target.value)} + placeholder="you@company.com" + required + className="newsletter-email-input" + style={{ + flex: '1 1 260px', + padding: '10px 14px', + borderRadius: '8px', + border: '1px solid', + fontSize: '14px', + lineHeight: '1.5', + outline: 'none', + minWidth: '200px', + }} + /> + + {status === 'error' && ( +

+ Something went wrong — try again or join Slack instead. +

+ )} +
+ + ); +} + +Elementary moves fast — stay in the loop on new releases, breaking changes, and tips from the team and community. + +## Join the community + + + + 4,000+ data and analytics engineers + + + +## Subscribe for updates + +Subscribe for release notes, breaking change alerts, and best practices from teams running Elementary at scale. + + + + + +Take a moment to [star us on GitHub](https://github.com/elementary-data/elementary) — it helps other data teams discover Elementary. + + diff --git a/docs/oss/release-notes/releases/0.16.0.mdx b/docs/oss/release-notes/releases/0.16.0.mdx new file mode 100644 index 000000000..a31f54695 --- /dev/null +++ b/docs/oss/release-notes/releases/0.16.0.mdx @@ -0,0 +1,9 @@ +--- +title: "Elementary 0.16.0" +sidebarTitle: "0.16.0" +--- + +This update includes new features and improvements. + +Check out the full details here: https://github.com/elementary-data/elementary/releases/tag/v0.16.0 + diff --git a/docs/oss/release-notes/releases/0.16.1.mdx b/docs/oss/release-notes/releases/0.16.1.mdx new file mode 100644 index 000000000..b920d99db --- /dev/null +++ b/docs/oss/release-notes/releases/0.16.1.mdx @@ -0,0 +1,9 @@ +--- +title: "Elementary 0.16.1" +sidebarTitle: "0.16.1" +--- + +This update includes bug fixes and improvements. + +Check out the full details here: https://github.com/elementary-data/elementary/releases/tag/v0.16.1 + diff --git a/docs/oss/release-notes/releases/0.16.2.mdx b/docs/oss/release-notes/releases/0.16.2.mdx new file mode 100644 index 000000000..d6c741f74 --- /dev/null +++ b/docs/oss/release-notes/releases/0.16.2.mdx @@ -0,0 +1,20 @@ +--- +title: "Elementary 0.16.2" +sidebarTitle: "0.16.2" +--- + +🚀 **Elementary v0.16.2 is Here!** + +We’re happy to share the latest update to our **open-source package**, bringing **smarter alerts, improved reporting, and enhanced infrastructure** to help **small data teams** + +monitor their data more effectively. 
+ +**What’s New?** + +📊 **Report Enhancements** – Improved navigation, lineage visualization, and full test visibility. + +🚨**Smarter Alerts** – Grouped notifications, **Microsoft Teams Adaptive Cards**, and **seed alerts**. + +🛠️**General Improvements** – DBT package lock files, **better error logging**, and **Google Cloud Storage updates.** + +Check out the full details here: https://github.com/elementary-data/elementary/releases/tag/v0.16.2 . \ No newline at end of file diff --git a/docs/oss/release-notes/releases/0.17.0.mdx b/docs/oss/release-notes/releases/0.17.0.mdx new file mode 100644 index 000000000..e0217c9f2 --- /dev/null +++ b/docs/oss/release-notes/releases/0.17.0.mdx @@ -0,0 +1,12 @@ +--- +title: "Elementary 0.17.0" +sidebarTitle: "0.17.0" +--- + +This update includes: + +- 🔧 UI bug fixes in the CLI +- 🔔 Some improvements to alerts +- 🔄 Version alignment between the CLI and dbt package + +Nothing major—just some refinements to keep things running smoothly. Check out the full details here: https://github.com/elementary-data/elementary/releases/tag/v0.17.0 \ No newline at end of file diff --git a/docs/oss/release-notes/releases/0.18.0.mdx b/docs/oss/release-notes/releases/0.18.0.mdx new file mode 100644 index 000000000..9b65f3027 --- /dev/null +++ b/docs/oss/release-notes/releases/0.18.0.mdx @@ -0,0 +1,14 @@ +--- +title: "Python v0.18.0" +sidebarTitle: "0.18.0" +--- + +## What's Changed + +- Invocation Filter Fix: Invocation filters now apply to both reports and their summaries, ensuring consistent filtering.​ +- Python Version Support: Official support for Python 3.8 has been discontinued to align with dbt's supported versions.​ +- Test Description Bug: Fixed an issue where test descriptions were missing in alerts when using dbt version 1.9. + +Note: We've bumped the minor version to align with the recent minor version update in the dbt package. 
+ +Full Changelog: [v0.17.0...v0.18.0](https://github.com/elementary-data/elementary/compare/v0.17.0...v0.18.0) \ No newline at end of file diff --git a/docs/oss/release-notes/releases/0.18.1.mdx b/docs/oss/release-notes/releases/0.18.1.mdx new file mode 100644 index 000000000..1b726bf7e --- /dev/null +++ b/docs/oss/release-notes/releases/0.18.1.mdx @@ -0,0 +1,13 @@ +--- +title: "Elementary 0.18.1" +sidebarTitle: "0.18.1" +--- + +This update includes: + +- 🗄️ Athena now works in the CLI +- 🔍 Added NOT_CONTAINS filter type +- 🔧 Development workflow improvements + +Check out the full details here: https://github.com/elementary-data/elementary/releases/tag/v0.18.1 + diff --git a/docs/oss/release-notes/releases/0.18.2.mdx b/docs/oss/release-notes/releases/0.18.2.mdx new file mode 100644 index 000000000..9ebefcb81 --- /dev/null +++ b/docs/oss/release-notes/releases/0.18.2.mdx @@ -0,0 +1,17 @@ +--- +title: "Elementary 0.18.2" +sidebarTitle: "0.18.2" +--- + +This update includes: + +- 🔔 Subscribers in grouped alerts +- 🗄️ ClickHouse CLI integration +- 📁 FileSystemMessagingIntegration support +- ☁️ S3 report permissions control with --s3-acl option +- 🐛 Fixed setup of internal dbt project +- 🔧 Added function for `disable_elementary_logo_print` +- 📊 Updated report to 1.0.26 + +Check out the full details here: https://github.com/elementary-data/elementary/releases/tag/v0.18.2 + diff --git a/docs/oss/release-notes/releases/0.18.3.mdx b/docs/oss/release-notes/releases/0.18.3.mdx new file mode 100644 index 000000000..ecf8bbc45 --- /dev/null +++ b/docs/oss/release-notes/releases/0.18.3.mdx @@ -0,0 +1,11 @@ +--- +title: "Elementary 0.18.3" +sidebarTitle: "0.18.3" +--- + +This update includes: + +- 🐛 Fixed missing metrics in CLI + +Check out the full details here: https://github.com/elementary-data/elementary/releases/tag/v0.18.3 + diff --git a/docs/oss/release-notes/releases/0.19.0.mdx b/docs/oss/release-notes/releases/0.19.0.mdx new file mode 100644 index 000000000..6d410ff61 
--- /dev/null +++ b/docs/oss/release-notes/releases/0.19.0.mdx @@ -0,0 +1,16 @@ +--- +title: "Elementary 0.19.0" +sidebarTitle: "0.19.0" +--- + +This update includes: + +- 🔔 Enhanced alert messages with support for multiple links and icons +- 🔄 Version alignment updates + +**Known Issues:** + +- dbt-databricks must be below 1.10.2 (See [issue 1931](https://github.com/elementary-data/elementary/issues/1931) for more details) + +Check out the full details here: https://github.com/elementary-data/elementary/releases/tag/v0.19.0 + diff --git a/docs/oss/release-notes/releases/0.19.1.mdx b/docs/oss/release-notes/releases/0.19.1.mdx new file mode 100644 index 000000000..fe53b83ad --- /dev/null +++ b/docs/oss/release-notes/releases/0.19.1.mdx @@ -0,0 +1,12 @@ +--- +title: "Elementary 0.19.1" +sidebarTitle: "0.19.1" +--- + +This update includes: + +- 🔧 Added excludes option to edr monitor +- 📝 Text and markdown format support + +Check out the full details here: https://github.com/elementary-data/elementary/releases/tag/v0.19.1 + diff --git a/docs/oss/release-notes/releases/0.19.2.mdx b/docs/oss/release-notes/releases/0.19.2.mdx new file mode 100644 index 000000000..e4b22961c --- /dev/null +++ b/docs/oss/release-notes/releases/0.19.2.mdx @@ -0,0 +1,11 @@ +--- +title: "Elementary 0.19.2" +sidebarTitle: "0.19.2" +--- + +This update includes: + +- 🐛 Fixed backwards compatibility issue with pydantic + +Check out the full details here: https://github.com/elementary-data/elementary/releases/tag/v0.19.2 + diff --git a/docs/oss/release-notes/releases/0.19.3.mdx b/docs/oss/release-notes/releases/0.19.3.mdx new file mode 100644 index 000000000..2b579fbb1 --- /dev/null +++ b/docs/oss/release-notes/releases/0.19.3.mdx @@ -0,0 +1,11 @@ +--- +title: "Elementary 0.19.3" +sidebarTitle: "0.19.3" +--- + +This update includes: + +- 🔧 Fixed package version + +Check out the full details here: https://github.com/elementary-data/elementary/releases/tag/v0.19.3 + diff --git 
a/docs/oss/release-notes/releases/0.19.4.mdx b/docs/oss/release-notes/releases/0.19.4.mdx new file mode 100644 index 000000000..0bee7152f --- /dev/null +++ b/docs/oss/release-notes/releases/0.19.4.mdx @@ -0,0 +1,12 @@ +--- +title: "Elementary 0.19.4" +sidebarTitle: "0.19.4" +--- + +This update includes: + +- 🔔 Improved freshness alerts with full source names +- 🔄 Updated elementary dbt package to v0.19.2 + +Check out the full details here: https://github.com/elementary-data/elementary/releases/tag/v0.19.4 + diff --git a/docs/oss/release-notes/releases/0.19.5.mdx b/docs/oss/release-notes/releases/0.19.5.mdx new file mode 100644 index 000000000..0d82d0ae4 --- /dev/null +++ b/docs/oss/release-notes/releases/0.19.5.mdx @@ -0,0 +1,15 @@ +--- +title: "Elementary 0.19.5" +sidebarTitle: "0.19.5" +--- + +This update includes: + +- 🔔 Slack integration improvements with timeout feature +- 🐛 Fixed Slack join channel recursion issue +- 📊 Dimension anomalies visualization +- ⚡ Performance improvements with alert filters using sets +- 🔄 Dremio types mapping updates + +Check out the full details here: https://github.com/elementary-data/elementary/releases/tag/v0.19.5 + diff --git a/docs/oss/release-notes/releases/0.20.0.mdx b/docs/oss/release-notes/releases/0.20.0.mdx new file mode 100644 index 000000000..03447ada2 --- /dev/null +++ b/docs/oss/release-notes/releases/0.20.0.mdx @@ -0,0 +1,15 @@ +--- +title: "Elementary 0.20.0" +sidebarTitle: "0.20.0" +--- + +This update includes: + +- 🔧 dbt Fusion support +- 🔔 Improvements to alert messages with attribution blocks +- 🐛 Bug fixes for ClickHouse, CI artifacts, and dbt profiles +- 📊 Teams alerts formatting improvements +- 🔄 Updated dbt package and removed deprecated flags + +Check out the full details here: https://github.com/elementary-data/elementary/releases/tag/v0.20.0 + diff --git a/docs/oss/release-notes/upgrading-elementary.mdx b/docs/oss/release-notes/upgrading-elementary.mdx index ca924cc2e..563081902 100644 --- 
a/docs/oss/release-notes/upgrading-elementary.mdx +++ b/docs/oss/release-notes/upgrading-elementary.mdx @@ -21,7 +21,7 @@ packages: dbt deps ``` -3. If this is a change of minor version (like `0.4.X` -> `0.5.X`), run also: +3. When there's a change in the structure of the Elementary tables, the minor version is raised. If you're updating a minor version (for example 0.17.X -> 0.18.X), run this command to rebuild the Elementary tables: ```shell dbt run --select elementary diff --git a/docs/overview/elementary-cloud.mdx b/docs/overview/elementary-cloud.mdx index 17675f7d5..e9c4ffd32 100644 --- a/docs/overview/elementary-cloud.mdx +++ b/docs/overview/elementary-cloud.mdx @@ -4,7 +4,12 @@ sidebarTitle: "Elementary Cloud" icon: "cloud" --- - +import IntroductionOpening from '/snippets/cloud/introduction-opening.mdx'; +import Introduction from '/snippets/cloud/introduction.mdx'; + + + + - + diff --git a/docs/pics/cloud/code_review_comment.png b/docs/pics/cloud/code_review_comment.png new file mode 100644 index 000000000..7c3d8412e Binary files /dev/null and b/docs/pics/cloud/code_review_comment.png differ diff --git a/docs/pics/cloud/code_review_comment_gitlab.png b/docs/pics/cloud/code_review_comment_gitlab.png new file mode 100644 index 000000000..761c31e03 Binary files /dev/null and b/docs/pics/cloud/code_review_comment_gitlab.png differ diff --git a/docs/pics/cloud/dremio/account-settings.png b/docs/pics/cloud/dremio/account-settings.png new file mode 100644 index 000000000..bb4535df7 Binary files /dev/null and b/docs/pics/cloud/dremio/account-settings.png differ diff --git a/docs/pics/cloud/dremio/dremio-form.png b/docs/pics/cloud/dremio/dremio-form.png new file mode 100644 index 000000000..5d5ea851d Binary files /dev/null and b/docs/pics/cloud/dremio/dremio-form.png differ diff --git a/docs/pics/cloud/dremio/generate-token.png b/docs/pics/cloud/dremio/generate-token.png new file mode 100644 index 000000000..1f7325da5 Binary files /dev/null and 
b/docs/pics/cloud/dremio/generate-token.png differ diff --git a/docs/pics/cloud/incident_digest_cadence.png b/docs/pics/cloud/incident_digest_cadence.png new file mode 100644 index 000000000..17a0826c7 Binary files /dev/null and b/docs/pics/cloud/incident_digest_cadence.png differ diff --git a/docs/pics/cloud/incident_digest_email.png b/docs/pics/cloud/incident_digest_email.png new file mode 100644 index 000000000..aa4d44ac5 Binary files /dev/null and b/docs/pics/cloud/incident_digest_email.png differ diff --git a/docs/pics/cloud/incident_digest_list.png b/docs/pics/cloud/incident_digest_list.png new file mode 100644 index 000000000..ec27f4b20 Binary files /dev/null and b/docs/pics/cloud/incident_digest_list.png differ diff --git a/docs/pics/cloud/incident_digest_rule_menu.png b/docs/pics/cloud/incident_digest_rule_menu.png new file mode 100644 index 000000000..7739acd31 Binary files /dev/null and b/docs/pics/cloud/incident_digest_rule_menu.png differ diff --git a/docs/pics/cloud/integrations/databricks/storage-direct-access-keys.png b/docs/pics/cloud/integrations/databricks/storage-direct-access-keys.png new file mode 100644 index 000000000..57da17a99 Binary files /dev/null and b/docs/pics/cloud/integrations/databricks/storage-direct-access-keys.png differ diff --git a/docs/pics/cloud/integrations/databricks/storage-direct-access-role.png b/docs/pics/cloud/integrations/databricks/storage-direct-access-role.png new file mode 100644 index 000000000..573adfa3a Binary files /dev/null and b/docs/pics/cloud/integrations/databricks/storage-direct-access-role.png differ diff --git a/docs/python-sdk/api-reference/overview.mdx b/docs/python-sdk/api-reference/overview.mdx new file mode 100644 index 000000000..80fd07f23 --- /dev/null +++ b/docs/python-sdk/api-reference/overview.mdx @@ -0,0 +1,166 @@ +--- +title: "API Reference Overview" +--- + +The Elementary Python SDK provides a simple API for sending data quality information to Elementary Cloud. 
This page provides an overview of the API structure and endpoints. + +## Getting Your API Credentials + +Before initializing the client, you need to obtain your API credentials from Elementary Cloud. + +### Generate an Access Token + +You can generate tokens directly from the Elementary UI: + +1. Go to [User → Personal Tokens](https://app.elementary-data.com/settings/user-tokens) or [Account → Account Tokens](https://app.elementary-data.com/settings/account-tokens) +2. Click **Generate token** +3. (Optional) Add a name/description for the token +4. Copy the token and store it securely — **it is shown only once** + +### Security + +- **User tokens** are user-scoped bearer tokens and inherit your workspace permissions +- **Account tokens** are account-scoped bearer tokens with "Can View" permissions +- Treat tokens like passwords — do not share or commit them to version control +- Keep them secret, rotate regularly, and revoke immediately if compromised + +For more details, see the [MCP Setup Guide](/cloud/mcp/setup-guide#1--generate-an-access-token) which uses the same token generation process. 
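Since tokens should never be committed to version control, a common pattern is to read them from the environment at runtime. A minimal sketch — the `ELEMENTARY_API_TOKEN` variable name is our convention for this example, not an SDK requirement:

```python
import os

def load_token(env_var: str = "ELEMENTARY_API_TOKEN") -> str:
    """Read the API token from an environment variable instead of hardcoding it.

    Raises if the variable is missing or empty, so a misconfigured job
    fails loudly rather than sending unauthenticated requests.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Environment variable {env_var} is not set")
    return token
```

The returned value can then be passed as the `api_key` argument when initializing the client below.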
+ +## Client Initialization + +The SDK uses `ElementaryCloudClient` to send data to Elementary Cloud: + +```python +from elementary_python_sdk.core.cloud.cloud_client import ElementaryCloudClient + +client = ElementaryCloudClient(project_id, api_key, url) +``` + +Where: +- `project_id` is your Python project identifier (chosen by you, used to deduplicate and identify reported assets) +- `api_key` is your API token (generated from the steps above) +- `url` is the full SDK ingest endpoint URL (the Elementary team will provide you with this URL): `{base_url}/sdk-ingest/{env_id}/batch` + - Example: `https://app.elementary-data.com/sdk-ingest/a6b2425d-36e2-4e13-8458-9825688ca1f2/batch` + +## Test Context + +Tests are run within an `elementary_test_context` which automatically captures test results: + +```python +from elementary_python_sdk.core.tests import elementary_test_context +from elementary_python_sdk.core.types.asset import TableAsset + +asset = TableAsset(name="users", database_name="prod", schema_name="public", table_name="users") + +with elementary_test_context(asset=asset) as ctx: + # Run your tests here + # Results are automatically captured in ctx + client.send_to_cloud(ctx) +``` + +### `raise_on_error` + +By default, `elementary_test_context` uses `raise_on_error=False`. This means that if a decorated test (or something inside the context) raises an exception, the SDK **captures it and records an `ERROR` execution** so you can still send results to Elementary Cloud without crashing your pipeline. 
+ +If you prefer **fail-fast** behavior (for example in CI), pass `raise_on_error=True` to re-raise exceptions after they are recorded: + +```python +with elementary_test_context(asset=asset, raise_on_error=True) as ctx: + run_my_tests(df) +``` + +## Test Decorators + +The SDK provides decorators to define tests: + +- `@boolean_test` - For tests that return True/False (pass/fail) +- `@expected_range` - For tests that return numeric values within a range +- `@expected_values` - For tests that return values matching a list of expected values +- `@row_count` - For tests that return a Sized object (DataFrame, list, etc.) to check row count + +## Context Manager Approach + +You can also use context managers for inline tests: + +```python +with elementary_test_context(asset=asset) as ctx: + # Using context managers + with ctx.boolean_test(name="my_test", description="Inline test") as my_bool_test: + my_bool_test.assert_value(my_test_function()) + + with ctx.expected_values_test( + name="country_count", + expected=[2, 3], + allow_none=True, + metadata={"my_metadata_field": "my_metadata_value"}, + ) as my_expected_values_test: + # This will fail + my_expected_values_test.assert_value(5) + # This will pass + my_expected_values_test.assert_value(3) + + with ctx.expected_range_test( + name="age_range", + min=18, + max=50, + ) as my_range_test: + my_range_test.assert_value(25.5) + + with ctx.row_count_test( + name="row_count", + min=1, + max=1000, + ) as my_row_count_test: + my_row_count_test.assert_value(users_df) # Passes DataFrame, list, etc. +``` + +## Supported Objects + +The SDK supports reporting table assets and test results. + + + + Register tables and views in your data warehouse + + + Define data quality tests using decorators + + + +## Sending Results + +After running tests in a context, send results to Elementary Cloud: + +```python +client.send_to_cloud(ctx) +``` + +This automatically batches all test results from the context and sends them in a single request. 
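Because sending results performs a network request, transient failures are possible. One way to handle them is a small retry wrapper — a sketch under our own assumptions, not an SDK feature; `send` stands in for any zero-argument callable such as `lambda: client.send_to_cloud(ctx)`:

```python
import time

def send_with_retries(send, retries=3, backoff_seconds=1.0):
    """Call `send()` up to `retries` times with exponential backoff.

    Returns the first successful result; re-raises the last exception
    if every attempt fails.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return send()
        except Exception as exc:
            last_error = exc
            if attempt < retries - 1:
                # Back off 1x, 2x, 4x, ... before the next attempt.
                time.sleep(backoff_seconds * (2 ** attempt))
    raise last_error
```

Keep the retry count low: each retry re-sends the whole batch for the context.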
+ +## Error Handling + +The SDK handles errors automatically, but you can wrap calls in try-except blocks: + +```python +try: + client.send_to_cloud(ctx) +except Exception as e: + print(f"Error sending results: {e}") +``` + +## Best Practices + +- **Run multiple tests in one context** - All tests in a single `elementary_test_context` are automatically batched +- **Use descriptive test names** - Clear names help identify tests in the Elementary UI +- **Include asset metadata** - Add descriptions, owners, tags, and dependencies to assets + + +All tests run within a single `elementary_test_context` are automatically batched and sent together. + + +## Next Steps + +- [Test Decorators](/python-sdk/api-reference/test-decorators) - Complete reference for all test decorators +- [Table Assets](/python-sdk/api-reference/table-assets) - Learn about table asset structure +- [Quickstart](/python-sdk/quickstart) - Send your first test results to Elementary Cloud + diff --git a/docs/python-sdk/api-reference/table-assets.mdx b/docs/python-sdk/api-reference/table-assets.mdx new file mode 100644 index 000000000..e96af1114 --- /dev/null +++ b/docs/python-sdk/api-reference/table-assets.mdx @@ -0,0 +1,80 @@ +--- +title: "Table Assets" +--- + +Table assets represent tables or views in your data warehouse. Register table assets to track metadata, ownership, and descriptions in Elementary Cloud. 
+ +## TableAsset Object + +```python +from elementary_python_sdk import TableAsset + +asset = TableAsset( + name="string", # Required: Table name + database_name="string", # Required: Database name + schema_name="string", # Required: Schema name + table_name="string", # Required: Table name + description="string", # Optional: Table description + owners=["string"], # Optional: List of owners (emails or usernames) + tags=["string"], # Optional: List of tags + depends_on=["string"] # Optional: List of upstream fully qualified table names +) +``` + +## Required Fields + +| Field | Type | Description | +|-------|------|-------------| +| `name` | string | Display name for the table | +| `database_name` | string | Name of the database containing the table | +| `schema_name` | string | Name of the schema containing the table | +| `table_name` | string | Name of the table | + +## Optional Fields + +| Field | Type | Description | +|-------|------|-------------| +| `description` | string | Human-readable description of the table | +| `owners` | list[string] | List of owners (email addresses or usernames) | +| `tags` | list[string] | List of tags for categorization | +| `depends_on` | list[string] | List of upstream fully qualified table names (e.g., `["prod.public.customers", "prod.public.orders"]`) for lineage tracking | + +## Example + +```python +from elementary_python_sdk import TableAsset + +# Create a table asset +asset = TableAsset( + name="users", + database_name="prod", + schema_name="public", + table_name="users", + description="Users table", + owners=["data-team"], + tags=["pii", "production"], + depends_on=["prod.public.customers", "prod.public.orders"] +) +``` + +## Best Practices + +1. **Include descriptions** - Add meaningful descriptions to help your team understand what each table contains + +2. **Set owners** - Assign owners to tables so they receive alerts and notifications + +3. 
+
+## TableAsset Object
+
+```python
+from elementary_python_sdk import TableAsset
+
+asset = TableAsset(
+    name="string",            # Required: Display name for the table
+    database_name="string",   # Required: Database name
+    schema_name="string",     # Required: Schema name
+    table_name="string",      # Required: Table name
+    description="string",     # Optional: Table description
+    owners=["string"],        # Optional: List of owners (emails or usernames)
+    tags=["string"],          # Optional: List of tags
+    depends_on=["string"]     # Optional: List of upstream fully qualified table names
+)
+```
+
+## Required Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `name` | string | Display name for the table |
+| `database_name` | string | Name of the database containing the table |
+| `schema_name` | string | Name of the schema containing the table |
+| `table_name` | string | Name of the table |
+
+## Optional Fields
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `description` | string | Human-readable description of the table |
+| `owners` | list[string] | List of owners (email addresses or usernames) |
+| `tags` | list[string] | List of tags for categorization |
+| `depends_on` | list[string] | List of upstream fully qualified table names (e.g., `["prod.public.customers", "prod.public.orders"]`) for lineage tracking |
+
+## Example
+
+```python
+from elementary_python_sdk import TableAsset
+
+# Create a table asset
+asset = TableAsset(
+    name="users",
+    database_name="prod",
+    schema_name="public",
+    table_name="users",
+    description="Users table",
+    owners=["data-team"],
+    tags=["pii", "production"],
+    depends_on=["prod.public.customers", "prod.public.orders"]
+)
+```
+
+## Best Practices
+
+1. **Include descriptions** - Add meaningful descriptions to help your team understand what each table contains
+
+2. **Set owners** - Assign owners to tables so they receive alerts and notifications
+
+3.
**Use tags** - Tag tables to enable filtering and grouping in the Elementary UI + +4. **Define dependencies** - Use `depends_on` to establish lineage connections to upstream assets + +5. **Update regularly** - Send updated table assets when metadata changes (descriptions, owners, tags, dependencies) + + +Table assets are updated on each ingest, so include all current metadata in every request. + + +## Related Documentation + +- [Test Decorators](/python-sdk/api-reference/test-decorators) - Define tests for your table assets +- [API Reference](/python-sdk/api-reference/overview) - Overview of the SDK API + diff --git a/docs/python-sdk/api-reference/test-decorators.mdx b/docs/python-sdk/api-reference/test-decorators.mdx new file mode 100644 index 000000000..22c4606d1 --- /dev/null +++ b/docs/python-sdk/api-reference/test-decorators.mdx @@ -0,0 +1,239 @@ +--- +title: "Test Decorators Reference" +--- + +Complete reference for all test decorators available in the Elementary Python SDK. + +## Import Statement + +```python +from elementary_python_sdk.core.tests import ( + boolean_test, + expected_range, + expected_values, + row_count, +) +``` + +## @boolean_test + +Tests that return a boolean (True/False) result. 
+ +### Signature + +```python +@boolean_test( + name: str, + severity: str | TestSeverity = "ERROR", + description: str | None = None, + tags: list[str] | None = None, + owners: list[str] | None = None, + metadata: dict | None = None, + column_name: str | None = None, + quality_dimension: QualityDimension | None = None, + skip: bool = False, +) +def test_function(df: pd.DataFrame) -> bool: + # Your test logic + return True # or False +``` + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `name` | str | Yes | - | Test name | +| `severity` | str | No | `"ERROR"` | Test severity: `"ERROR"` or `"WARNING"` | +| `description` | str | No | `None` | Test description | +| `column_name` | str | No | `None` | Column being tested (for column-level tests) | +| `tags` | list[str] | No | `None` | List of tags | +| `owners` | list[str] | No | `None` | List of owners | +| `metadata` | dict | No | `None` | Additional metadata | +| `quality_dimension` | QualityDimension | No | `None` | Quality dimension (defaults to VALIDITY) | +| `skip` | bool | No | `False` | Whether to skip this test. Useful if you want the test to appear in Elementary Cloud, but you don't want to execute it in this run. | + +### Example + +```python +@boolean_test( + name="unique_ids", + description="All user IDs must be unique", + column_name="id", + severity="ERROR", +) +def test_unique_ids(df: pd.DataFrame) -> bool: + ids = df["id"].dropna().tolist() + return len(ids) == len(set(ids)) +``` + +## @expected_range + +Tests that return a numeric value that should fall within a range. They can also return a list of numeric values or a pandas Series. 
+ +### Signature + +```python +@expected_range( + name: str, + min: float | None = None, + max: float | None = None, + severity: str | TestSeverity = "ERROR", + description: str | None = None, + tags: list[str] | None = None, + owners: list[str] | None = None, + metadata: dict | None = None, + column_name: str | None = None, + quality_dimension: QualityDimension | None = None, + skip: bool = False, +) +def test_function(df: pd.DataFrame) -> float | list[float] | pd.Series: + # Your test logic + return df["age"].mean() # Numeric value + # return [1, 2, 3] # Numeric values + # return df["age"] # pandas Series +``` + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `name` | str | Yes | - | Test name | +| `min` | float | No | `None` | Minimum expected value (inclusive) | +| `max` | float | No | `None` | Maximum expected value (inclusive) | +| `severity`, `description`, `column_name`, `tags`, `owners`, `metadata`, `quality_dimension`, `skip` | - | No | - | Same as `@boolean_test` | + +### Example + +```python +@expected_range( + name="average_age", + min=18, + max=50, + description="Average age should be between 18 and 50", + column_name="age", + severity="ERROR", +) +def test_average_age(df: pd.DataFrame) -> float: + return df["age"].mean() +``` + +## @expected_values + +Tests that return a value (or values) that should match one of a list of expected values. 
+ +### Signature + +```python +@expected_values( + name: str, + expected: Any | list[Any], + allow_none: bool = False, + severity: str | TestSeverity = "ERROR", + description: str | None = None, + tags: list[str] | None = None, + owners: list[str] | None = None, + metadata: dict | None = None, + column_name: str | None = None, + quality_dimension: QualityDimension | None = None, + skip: bool = False, +) +def test_function(df: pd.DataFrame) -> Any: + # Your test logic + return value # Should match one of expected values +``` + +### Parameters + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `name` | str | Yes | - | Test name | +| `expected` | Any \| list[Any] | Yes | - | Expected value(s) - can be single value or list | +| `allow_none` | bool | No | `False` | Whether to allow None values | +| `severity`, `description`, `column_name`, `tags`, `owners`, `metadata`, `quality_dimension`, `skip` | - | No | - | Same as `@boolean_test` | + +### Example + +```python +@expected_values( + name="country_count", + expected=2, + severity="ERROR", + description="Should have exactly 2 countries", + column_name="country", +) +def count_unique_countries(df: pd.DataFrame) -> int: + return df["country"].nunique() +``` + +## @row_count + +Tests that return a Sized object (DataFrame, list, etc.) to check row count. + +### Signature + +```python +@row_count( + name: str, + min: int | None = None, + max: int | None = None, + severity: str | TestSeverity = "ERROR", + description: str | None = None, + tags: list[str] | None = None, + owners: list[str] | None = None, + metadata: dict | None = None, + skip: bool = False, +) +def test_function(df: pd.DataFrame) -> Sized: + # Your test logic - return DataFrame, list, etc. 
+    return df  # or any object with __len__
+```
+
+### Parameters
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `name` | str | Yes | - | Test name |
+| `min` | int | No | `None` | Minimum expected row count (inclusive) |
+| `max` | int | No | `None` | Maximum expected row count (inclusive) |
+| `severity`, `description`, `tags`, `owners`, `metadata`, `skip` | - | No | - | Same as `@boolean_test` |
+
+### Example
+
+```python
+@row_count(
+    name="user_count_range",
+    min=1,
+    max=1000000,
+    severity="WARNING",
+    description="Validate user count is within expected range",
+)
+def get_users_df(df: pd.DataFrame) -> pd.DataFrame:
+    """Return the DataFrame; the decorator calls len() on it."""
+    return df
+```
+
+## Common Parameters
+
+All decorators support these common parameters:
+
+- **`name`** (required): Unique test name
+- **`severity`**: `"ERROR"` or `"WARNING"` (default: `"ERROR"`)
+- **`description`**: Human-readable test description
+- **`tags`**: List of tags for categorization
+- **`owners`**: List of owner emails/usernames
+- **`metadata`**: Dictionary of additional metadata
+- **`skip`**: Boolean to skip the test
+
+## Return Types
+
+- **`@boolean_test`**: Must return `bool`
+- **`@expected_range`**: Must return a numeric value, a list of numeric values, or a pandas Series
+- **`@expected_values`**: Can return any type that can be compared
+- **`@row_count`**: Must return a Sized object (has `__len__` method)
+
+## Related Documentation
+
+- [Quickstart](/python-sdk/quickstart) - Get started with test decorators
+- [API Reference](/python-sdk/api-reference/overview) - Overview of the SDK API
+- [Table Assets](/python-sdk/api-reference/table-assets) - Register tables and views in your data warehouse
+
diff --git a/docs/python-sdk/installation.mdx new file mode 100644 index 000000000..697d55ddd --- /dev/null +++ b/docs/python-sdk/installation.mdx @@ -0,0 +1,62 @@ +--- +title:
"Installation" +--- + +Install the Elementary Python SDK to start sending data quality information to Elementary Cloud. + +## Prerequisites + +- Python 3.8 or higher +- An Elementary Cloud account +- API key and environment ID from your Elementary Cloud account + +## Install via pip + +```bash +pip install elementary-python-sdk +``` + +## Install via poetry + +```bash +poetry add elementary-python-sdk +``` + +## Verify Installation + +After installation, verify that the SDK is correctly installed: + +```python +import elementary_python_sdk +print(elementary_python_sdk.__version__) +``` + +## Get Your Credentials + +To use the Python SDK, you'll need: + +1. **Project ID** - Your Python project identifier (choose any string to identify your code project; used to deduplicate and identify reported assets) +2. **API Key** - Your Elementary Cloud API token +3. **URL** - The full endpoint URL: `{base_url}/sdk-ingest/{env_id}/batch` + - Example: `https://app.elementary-data.com/sdk-ingest/a6b2425d-36e2-4e13-8458-9825688ca1f2/batch` + +### Generate an Access Token + +You can generate tokens directly from the Elementary UI: + +1. Go to [User → Personal Tokens](https://app.elementary-data.com/settings/user-tokens) or [Account → Account Tokens](https://app.elementary-data.com/settings/account-tokens) +2. Click **Generate token** +3. (Optional) Add a name/description for the token +4. Copy the token and store it securely — **it is shown only once** + +For detailed instructions, see the [API Reference](/python-sdk/api-reference/overview#getting-your-api-credentials) or the [MCP Setup Guide](/cloud/mcp/setup-guide#1--generate-an-access-token). + + +Keep your API key secure. Never commit it to version control. Use environment variables or a secrets management system. 
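As a sketch of that advice, credentials can be read from environment variables at startup. The variable names below are illustrative, not defined by the SDK:

```python
import os

# Illustrative variable names -- choose whatever fits your deployment.
project_id = os.environ.get("ELEMENTARY_PROJECT_ID", "my-python-project")
api_key = os.environ.get("ELEMENTARY_API_KEY", "")
env_id = os.environ.get("ELEMENTARY_ENV_ID", "your-env-id")

# The ingest endpoint follows the pattern {base_url}/sdk-ingest/{env_id}/batch
url = f"https://app.elementary-data.com/sdk-ingest/{env_id}/batch"
```

Pass `project_id`, `api_key`, and `url` to `ElementaryCloudClient`, and consider failing fast (for example, raising if `api_key` is empty) before sending data.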
+ + +## Next Steps + +- [Quickstart Guide](/python-sdk/quickstart) - Send your first data to Elementary Cloud +- [API Reference](/python-sdk/api-reference/overview) - Explore the full API documentation + diff --git a/docs/python-sdk/introduction.mdx b/docs/python-sdk/introduction.mdx new file mode 100644 index 000000000..a76d5476b --- /dev/null +++ b/docs/python-sdk/introduction.mdx @@ -0,0 +1,74 @@ +--- +title: "Python SDK" +sidebarTitle: "Introduction" +icon: "code" +--- + +The Elementary Python SDK enables you to programmatically send data quality information to Elementary Cloud. Use the SDK to integrate Elementary's data observability capabilities into your custom data pipelines, Python applications, or any system that needs to report data quality metrics. + + + + + + + + + + +## What is the Python SDK? + +The Elementary Python SDK brings observability and testing directly into your Python pipelines. It captures any Python test result, from any framework, and reports it to Elementary Cloud. + +The SDK allows you to: + +- **Define data quality tests** - Use decorators to create tests that validate your data +- **Report test results** - Automatically send test execution results to Elementary Cloud +- **Register assets** - Define tables, views, and other data assets with metadata +- **Track execution context** - Capture run metadata, lineage, and dependencies +- **Integrate with any framework** - Works with Great Expectations, DQX, or your own test code + +## Key Features + +- **Framework-agnostic** - Works with any Python testing framework (Great Expectations, DQX, custom code) +- **Decorator-based API** - Simple decorators to define tests (`@boolean_test`, `@expected_range`, etc.) 
+- **Context management** - Use `elementary_test_context` to automatically capture test results +- **Unified observability** - Python tests appear alongside dbt tests and cloud tests in Elementary Cloud +- **Full lineage** - Connect Python assets to dbt models, warehouse tables, and ML outputs + +## Use Cases + +The Python SDK is ideal for: + +- **Ingestion pipelines** - Catch issues before data hits the warehouse +- **Python-based transformations** - Monitor PySpark, SQL generation, and data processing pipelines +- **AI/ML pipelines** - Track vectorization, embeddings, model training, and feature generation +- **Hybrid pipelines** - Monitor structured, semi-structured, and unstructured data flows +- **Post-warehouse pipelines** - Validate data streaming to downstream destinations, APIs, and operational systems + + +The Python SDK works alongside the Elementary dbt package. You can use both to monitor your entire data stack. + + +## How It Works + +1. **Install the SDK** - Add the Elementary Python SDK to your Python environment +2. **Define tests** - Use decorators (`@boolean_test`, `@expected_range`, etc.) to create data quality tests +3. **Create assets** - Define table assets with metadata, owners, tags, and dependencies +4. **Run tests in context** - Use `elementary_test_context` to automatically capture test results +5. **Send results** - Use `ElementaryCloudClient` to send test results to Elementary Cloud +6. **Monitor** - View your data quality metrics unified with dbt tests and cloud tests + + + Follow our quickstart guide to send your first data to Elementary Cloud in minutes. + + + + Learn more about the Python SDK and see real-world examples in our blog post. + + diff --git a/docs/python-sdk/quickstart.mdx b/docs/python-sdk/quickstart.mdx new file mode 100644 index 000000000..ed6def244 --- /dev/null +++ b/docs/python-sdk/quickstart.mdx @@ -0,0 +1,223 @@ +--- +title: "Quickstart" +--- + +Get started with the Elementary Python SDK in minutes. 
This guide shows you how to define data quality tests using decorators and automatically report them to Elementary Cloud. + +## Step 1: Install the SDK + +```bash +pip install elementary-python-sdk +``` + +## Step 2: Import Required Modules + +```python +import pandas as pd +from elementary_python_sdk.core.cloud.cloud_client import ElementaryCloudClient +from elementary_python_sdk.core.tests import ( + boolean_test, + elementary_test_context, + expected_range, + expected_values, + row_count, +) +from elementary_python_sdk.core.types.asset import TableAsset +``` + +## Step 3: Define Your Tests Using Decorators + +The SDK provides decorators to define tests. Here are examples: + +```python +# Define a boolean test (pass/fail) +@boolean_test( + name="unique_ids", + description="All user IDs must be unique", + column_name="id", + severity="ERROR", +) +def test_unique_ids(df: pd.DataFrame) -> bool: + ids = df["id"].dropna().tolist() + return len(ids) == len(set(ids)) + +# Define a range test +@expected_range( + name="average_age", + min=18, + max=50, + description="Average age should be between 18 and 50", + column_name="age", + severity="ERROR", +) +def test_average_age(df: pd.DataFrame) -> float: + return df["age"].mean() + +# Define a row count test +@row_count( + name="user_count_range", + min=1, + max=1000000, + severity="WARNING", + description="Validate user count is within expected range", +) +def test_users_row_count(df: pd.DataFrame) -> pd.DataFrame: + """Return the DataFrame; the decorator calls len() on it.""" + return df + +# Define an expected values test +@expected_values( + name="only_valid_countries", + expected=["Germany", "France", "Italy"], + severity="ERROR", + description="Should contain only valid countries", + column_name="country", +) +def test_only_valid_countries(df: pd.DataFrame) -> pd.Series: + return df["country"] +``` + +## Step 4: Create Your Data Asset + +```python +# Define the tested asset +asset = TableAsset( + name="users", + 
database_name="prod", + schema_name="public", + table_name="users", + description="Users table", + owners=["data-team"], + tags=["pii", "production"], + depends_on=["prod.public.customers", "prod.public.orders"] +) +``` + +## Step 5: Run Tests and Send Results + +```python +def main(): + # Create sample data + users_df = pd.DataFrame( + { + "id": [1, 2, 3, 4, 5, 6, 7, 8], + "age": [23, 30, 46, 76, 76, 123, 45, 32], + "country": ["Germany", "France", "Germany", "France", "", "Italy", "France", "Germany"], + } + ) + + # Use the test context to automatically capture test results + with elementary_test_context(asset=asset) as ctx: + # Run tests - results are automatically captured + test_average_age(users_df) + test_unique_ids(users_df) + test_users_row_count(users_df) + test_only_valid_countries(users_df) + + # Send results to Elementary Cloud + PROJECT_ID = "my-python-project" # Your Python project identifier (used to deduplicate and identify assets) + API_KEY = "your-api-key" + URL = "https://app.elementary-data.com/sdk-ingest/{env_id}/batch" + + client = ElementaryCloudClient(PROJECT_ID, API_KEY, URL) + client.send_to_cloud(ctx) + +if __name__ == "__main__": + main() +``` + +## Complete Example + +Here's the complete example from the [Elementary blog post](https://www.elementary-data.com/post/data-quality-in-python-pipelines): + +```python +import pandas as pd +from elementary_python_sdk.core.cloud.cloud_client import ElementaryCloudClient +from elementary_python_sdk.core.tests import ( + boolean_test, + elementary_test_context, + expected_range, +) +from elementary_python_sdk.core.types.asset import TableAsset + +# Define "unique ids" test +@boolean_test( + name="unique_ids", + description="All user IDs must be unique", + column_name="id", +) +def test_unique_ids(df: pd.DataFrame) -> bool: + ids = df["id"].dropna().tolist() + return len(ids) == len(set(ids)) + +# Define "average age" test +@expected_range( + name="average_age", + min=18, + max=50, + 
description="Average age should be between 18 and 50", + column_name="age", +) +def test_average_age(df: pd.DataFrame) -> float: + return df["age"].mean() + +def main(): + # Create sample data + users_df = pd.DataFrame( + { + "id": [1, 2, 3, 4, 5, 6, 7, 8], + "age": [23, 30, 46, 76, 76, 123, 45, 32], + "country": ["Germany", "France", "Germany", "France", "", "Italy", "France", "Germany"], + } + ) + + # Define the tested asset + asset = TableAsset( + name="users", + database_name="prod", + schema_name="public", + table_name="users", + description="Users table", + owners=["data-team"], + tags=["pii", "production"], + depends_on=["prod.public.customers", "prod.public.orders"] + ) + + # Run tests and report the results + with elementary_test_context(asset=asset) as ctx: + test_average_age(users_df) + test_unique_ids(users_df) + + # Initialize client and send results + PROJECT_ID = "my-python-project" # Your Python project identifier (used to deduplicate and identify assets) + API_KEY = "your-api-key" + URL = "https://app.elementary-data.com/sdk-ingest/{env_id}/batch" + + client = ElementaryCloudClient(PROJECT_ID, API_KEY, URL) + client.send_to_cloud(ctx) + +if __name__ == "__main__": + main() +``` + +**Note:** +- Replace `API_KEY` and `URL` with your actual credentials. The `URL` should be the full SDK ingest endpoint including your environment ID. +- `PROJECT_ID` is your Python project identifier - choose any string to identify your code project. This will appear in the metadata of assets you report and is used for deduplication. + + +## What Happens Next? 
+ +Once you send test results to Elementary Cloud: + +- **Tests appear in the test overview** - View execution history, test queries, and configuration +- **Alerts fire automatically** - Get notified via Slack, PagerDuty, or email when tests fail +- **Incidents are created** - Automatic incident creation with Jira ticket integration +- **Lineage is connected** - Python assets link to dbt models, warehouse tables, and ML outputs +- **Assets are discoverable** - All tables, views, and data entities appear in the Elementary catalog + +## What's Next? + +- [API Reference](/python-sdk/api-reference/overview) - Learn about all available objects and methods +- [Test Decorators](/python-sdk/api-reference/test-decorators) - Complete reference for all test decorators +- [Table Assets](/python-sdk/api-reference/table-assets) - Learn about table asset structure + diff --git a/docs/_snippets/add-connection-profile.mdx b/docs/snippets/add-connection-profile.mdx similarity index 94% rename from docs/_snippets/add-connection-profile.mdx rename to docs/snippets/add-connection-profile.mdx index 9f7a24289..964050ace 100644 --- a/docs/_snippets/add-connection-profile.mdx +++ b/docs/snippets/add-connection-profile.mdx @@ -1,3 +1,5 @@ +import AllProfiles from '/snippets/profiles/all-profiles.mdx'; + ## Configuring the Elementary Profile In order to connect, Elementary needs a [connection profile](https://docs.getdbt.com/dbt-cli/configure-your-profile) in a file named `profiles.yml`. @@ -40,4 +42,4 @@ Here is a demonstration: - Profile name: `elementary` - Schema name: The schema of elementary models, default is `_elementary` - + diff --git a/docs/snippets/ai-generate-test.mdx b/docs/snippets/ai-generate-test.mdx new file mode 100644 index 000000000..d6e029d00 --- /dev/null +++ b/docs/snippets/ai-generate-test.mdx @@ -0,0 +1,3 @@ + + Let our Slack chatbot create the anomaly test you need. 
+ \ No newline at end of file diff --git a/docs/_snippets/alerts/alerts-configuration.mdx b/docs/snippets/alerts/alerts-configuration.mdx similarity index 88% rename from docs/_snippets/alerts/alerts-configuration.mdx rename to docs/snippets/alerts/alerts-configuration.mdx index fb21b075d..cbc4fd7ac 100644 --- a/docs/_snippets/alerts/alerts-configuration.mdx +++ b/docs/snippets/alerts/alerts-configuration.mdx @@ -1,5 +1,5 @@ - + Use Alert Rules to distribute your alerts to the right channels. diff --git a/docs/_snippets/alerts/alerts-introduction.mdx b/docs/snippets/alerts/alerts-introduction.mdx similarity index 94% rename from docs/_snippets/alerts/alerts-introduction.mdx rename to docs/snippets/alerts/alerts-introduction.mdx index 5f1176c52..b7176b3d8 100644 --- a/docs/_snippets/alerts/alerts-introduction.mdx +++ b/docs/snippets/alerts/alerts-introduction.mdx @@ -14,6 +14,8 @@ Custom channel, suppression interval, alert filters, etc.
Slack alert format diff --git a/docs/snippets/alerts/description.mdx new file mode 100644 index 000000000..76e249ffe --- /dev/null +++ b/docs/snippets/alerts/description.mdx @@ -0,0 +1,31 @@ +Elementary supports configuring a description for tests, which is included in alerts.
+It's recommended to add an explanation of what it means if this test fails, so the alert will include this context.
+
+
+
+```yml test
+data_tests:
+  - not_null:
+      config:
+        meta:
+          description: "This is the test description"
+```
+
+```yml test config block
+{{ config(
+    tags=["Tag1","Tag2"],
+    meta={
+      "description": "This is the test description"
+    }
+) }}
+```
+
+```yml dbt_project.yml
+data_tests:
+  path:
+    subfolder:
+      +meta:
+        description: "This is the test description"
+```
+
+
diff --git a/docs/snippets/alerts/owner.mdx new file mode 100644 index 000000000..7d422109b --- /dev/null +++ b/docs/snippets/alerts/owner.mdx @@ -0,0 +1,61 @@ +Elementary enriches alerts with [owners for models or tests](https://docs.getdbt.com/reference/resource-configs/meta#designate-a-model-owner).
+
+- If you want the owner to be tagged on Slack, use '@' and the email prefix of the Slack user (@jessica.jones to tag jessica.jones@marvel.com).
+- You can configure a single owner or a list of owners (`["@jessica.jones", "@joe.joseph"]`).
+
+
+
+```yml model
+models:
+  - name: my_model_name
+    config:
+      meta:
+        owner: "@jessica.jones"
+```
+
+```yml test
+data_tests:
+  - not_null:
+      config:
+        meta:
+          owner: ["@jessica.jones", "@joe.joseph"]
+```
+
+```yml test/model config block
+{{ config(
+    tags=["Tag1","Tag2"],
+    meta={
+      "description": "This is a description",
+      "owner": "@jessica.jones"
+    }
+) }}
+```
+
+```yml dbt_project.yml
+models/sources:
+  path:
+    subfolder:
+      +meta:
+        owner: "@jessica.jones"
+
+data_tests:
+  path:
+    subfolder:
+      +meta:
+        owner: "@jessica.jones"
+
+# table level:
+
+sources:
+  - name: source_name
+    database: db
+    schema: schema
+    tables:
+      - name: orders
+        meta:
+          owner: "@jessica.jones"
+
+
+```
+
+ \ No newline at end of file diff --git a/docs/snippets/alerts/subscribers.mdx new file mode 100644 index 000000000..92f2c1f92 --- /dev/null +++ b/docs/snippets/alerts/subscribers.mdx @@ -0,0 +1,46 @@ +If you want additional users besides the owner to be tagged on an alert, add them as subscribers.
+
+- If you want the subscriber to be tagged on Slack, use '@' and the email prefix of the Slack user (@jessica.jones to tag jessica.jones@marvel.com).
+- You can configure a single subscriber or a list (`["@jessica.jones", "@joe.joseph"]`).
+
+
+
+```yml model
+models:
+  - name: my_model_name
+    config:
+      meta:
+        subscribers: "@jessica.jones"
+```
+
+```yml test
+data_tests:
+  - not_null:
+      config:
+        meta:
+          subscribers: ["@jessica.jones", "@joe.joseph"]
+```
+
+```yml test/model config block
+{{ config(
+    meta={
+      "subscribers": "@jessica.jones"
+    }
+) }}
+```
+
+```yml dbt_project.yml
+models:
+  path:
+    subfolder:
+      +meta:
+        subscribers: "@jessica.jones"
+
+data_tests:
+  path:
+    subfolder:
+      +meta:
+        subscribers: "@jessica.jones"
+```
+
+ \ No newline at end of file diff --git a/docs/snippets/alerts/tags.mdx new file mode 100644 index 000000000..08138bab6 --- /dev/null +++ b/docs/snippets/alerts/tags.mdx @@ -0,0 +1,39 @@ +You can use [tags](https://docs.getdbt.com/reference/resource-configs/tags) to provide context to your alerts.
+
+- You can tag a group or a channel in a Slack alert by adding `#channel_name` as a tag.
+- Tags are aggregated, so a test alert will include both the test and the parent model tags.
+
+
+
+```yml model
+models:
+  - name: my_model_name
+    tags: ["#marketing", "#data_ops"]
+```
+
+```yml test
+data_tests:
+  - not_null:
+      tags: ["#marketing", "#data_ops"]
+```
+
+```yml test/model config block
+{{ config(
+    tags=["#marketing", "#data_ops"]
+) }}
+```
+
+```yml dbt_project.yml
+models:
+  path:
+    subfolder:
+      tags: ["#marketing", "#data_ops"]
+
+data_tests:
+  path:
+    subfolder:
+      tags: ["#marketing", "#data_ops"]
+```
+
+ \ No newline at end of file diff --git a/docs/snippets/cli/athena-cli.mdx new file mode 100644 index 000000000..1a77eace7 --- /dev/null +++ b/docs/snippets/cli/athena-cli.mdx @@ -0,0 +1,33 @@ +### Athena connection profile
+
+After installing Elementary's dbt package by running `dbt deps`,
+you can generate Elementary's profile for usage with `edr` by running the following command within your project:
+
+```shell
+dbt run-operation elementary.generate_elementary_cli_profile
+```
+
+The command will print to the terminal a partially filled template of the profile that's needed for `edr` to work.
+
+```yml Athena
+## ATHENA ##
+## By default, edr expects the profile name 'elementary'. ##
+## Configure the database and schema of elementary models. ##
+## Check where 'elementary_test_results' is to find it. ##
+
+elementary:
+  outputs:
+    default:
+      type: athena
+      work_group: [athena workgroup]
+      s3_staging_dir: [s3_staging_dir] # Location to store query results & metadata
+      s3_data_dir: [s3 data dir] # Location to store table data (if not specified, s3_staging_dir is used)
+      region_name: [aws region name] # AWS region, e.g. eu-west-1
+      database: [database name]
+      schema: [schema name] # elementary schema, usually [schema name]_elementary
+      threads: [number of threads like 8]
+```
+
+We support the same format and connection methods as dbt. Please refer to
+dbt's documentation of [Athena profile](https://docs.getdbt.com/reference/warehouse-setups/athena-setup) for
+further details.
diff --git a/docs/_snippets/cli/bigquery-cli.mdx b/docs/snippets/cli/bigquery-cli.mdx similarity index 93% rename from docs/_snippets/cli/bigquery-cli.mdx rename to docs/snippets/cli/bigquery-cli.mdx index c48673dd5..9771ed96f 100644 --- a/docs/_snippets/cli/bigquery-cli.mdx +++ b/docs/snippets/cli/bigquery-cli.mdx @@ -1,3 +1,5 @@ +import CliServiceAccount from '/snippets/dwh/bigquery/cli_service_account.mdx';
+
### BigQuery connection profile
After installing Elementary's dbt package upon running `dbt deps`,
@@ -38,6 +40,8 @@ elementary:
We support the same format and connection methods as dbt. Please refer to
dbt's documentation of [BigQuery](https://docs.getdbt.com/reference/warehouse-setups/bigquery-setup) for
further details.
-
+ Add the full path of this JSON file to your connection profile under 'keyfile'.
+
+ diff --git a/docs/snippets/cli/clickhouse-cli.mdx new file mode 100644 index 000000000..1348f10dc --- /dev/null +++ b/docs/snippets/cli/clickhouse-cli.mdx @@ -0,0 +1,35 @@ +### ClickHouse connection profile
+
+After installing Elementary's dbt package by running `dbt deps`,
+you can generate Elementary's profile for usage with `edr` by running the following command within your project:
+
+```shell
+dbt run-operation elementary.generate_elementary_cli_profile
+```
+
+The command will print to the terminal a partially filled template of the profile that's needed for `edr` to work.
+
+```yml Clickhouse
+## CLICKHOUSE ##
+## By default, edr expects the profile name 'elementary'. ##
+## Configure the database and schema of elementary models. ##
+## Check where 'elementary_test_results' is to find it.
## + +elementary: + outputs: + default: + type: clickhouse + host: [hostname] + user: [username] + password: [password] + port: [port] + schema: [schema name] # elementary schema, usually [schema name]_elementary + threads: [1 or more] + # sslmode: [optional, set the sslmode used to connect to the database] +``` + +Note: Anomaly detection is not supported for ClickHouse. + +We support the same format and connection methods as dbt. Please refer to +dbt's documentation of [ClickHouse profile](https://docs.getdbt.com/reference/warehouse-setups/clickhouse-setup) for +further details. diff --git a/docs/_snippets/cli/databricks-cli.mdx b/docs/snippets/cli/databricks-cli.mdx similarity index 100% rename from docs/_snippets/cli/databricks-cli.mdx rename to docs/snippets/cli/databricks-cli.mdx diff --git a/docs/snippets/cli/dremio-cli.mdx b/docs/snippets/cli/dremio-cli.mdx new file mode 100644 index 000000000..a34a793bb --- /dev/null +++ b/docs/snippets/cli/dremio-cli.mdx @@ -0,0 +1,64 @@ +### Dremio connection profile + +After installing Elementary's dbt package upon running `dbt deps`, +you can generate Elementary's profile for usage with `edr` by running the following command within your project: + +```shell +dbt run-operation elementary.generate_elementary_cli_profile +``` + +The command will print to the terminal a partially filled template of the profile that's needed for `edr` to work. + + + + +```yml +## DREMIO CLOUD ## +## By default, edr expects the profile name 'elementary'. ## +## Configure the database and schema of elementary models. ## +## Check where 'elementary_test_results' is to find it.
## + +elementary: + outputs: + default: + type: dremio + cloud_host: api.dremio.cloud # or api.eu.dremio.cloud for EU + cloud_project_id: [project ID] + user: [email address] + pat: [personal access token] + use_ssl: true + object_storage_source: [name] # alias: datalake + object_storage_path: [path] # alias: root_path + dremio_space: [name] # alias: database + dremio_space_folder: [path] # alias: schema, usually [schema]_elementary + threads: [1 or more] +``` + + +```yml +## DREMIO SOFTWARE ## +## By default, edr expects the profile name 'elementary'. ## +## Configure the database and schema of elementary models. ## +## Check where 'elementary_test_results' is to find it. ## + +elementary: + outputs: + default: + type: dremio + software_host: [hostname or IP address] + port: 9047 + user: [username] + password: [password] # or use pat: [personal access token] + use_ssl: [true or false] + object_storage_source: [name] # alias: datalake + object_storage_path: [path] # alias: root_path + dremio_space: [name] # alias: database + dremio_space_folder: [path] # alias: schema, usually [schema]_elementary + threads: [1 or more] +``` + + + +We support the same format and connection methods as dbt. Please refer to +dbt's documentation of [Dremio profile](https://docs.getdbt.com/docs/core/connect-data-platform/dremio-setup) for +further details. diff --git a/docs/snippets/cli/duckdb-cli.mdx b/docs/snippets/cli/duckdb-cli.mdx new file mode 100644 index 000000000..0db80b0be --- /dev/null +++ b/docs/snippets/cli/duckdb-cli.mdx @@ -0,0 +1,29 @@ +### DuckDB connection profile + +After installing Elementary's dbt package upon running `dbt deps`, +you can generate Elementary's profile for usage with `edr` by running the following command within your project: + +```shell +dbt run-operation elementary.generate_elementary_cli_profile +``` + +The command will print to the terminal a partially filled template of the profile that's needed for `edr` to work. 
+ +```yml DuckDB +## DUCKDB ## +## By default, edr expects the profile name 'elementary'. ## +## Configure the database and schema of elementary models. ## +## Check where 'elementary_test_results' is to find it. ## + +elementary: + outputs: + default: + type: duckdb + path: [path to your .duckdb file] + schema: [schema name] # elementary schema, usually [schema name]_elementary + threads: [1 or more] +``` + +We support the same format and connection methods as dbt. Please refer to +dbt's documentation of [DuckDB profile](https://docs.getdbt.com/reference/warehouse-setups/duckdb-setup) for +further details. diff --git a/docs/snippets/cli/fabric-cli.mdx b/docs/snippets/cli/fabric-cli.mdx new file mode 100644 index 000000000..69bb48817 --- /dev/null +++ b/docs/snippets/cli/fabric-cli.mdx @@ -0,0 +1,36 @@ +### Fabric connection profile + +After installing Elementary's dbt package upon running `dbt deps`, +you can generate Elementary's profile for usage with `edr` by running the following command within your project: + +```shell +dbt run-operation elementary.generate_elementary_cli_profile +``` + +The command will print to the terminal a partially filled template of the profile that's needed for `edr` to work. + +```yml Fabric +## FABRIC ## +## By default, edr expects the profile name 'elementary'. ## +## Configure the database and schema of elementary models. ## +## Check where 'elementary_test_results' is to find it. ## + +elementary: + outputs: + default: + type: fabric + driver: ODBC Driver 18 for SQL Server + server: [hostname] + port: 1433 + database: [database name] + schema: [schema name] # elementary schema, usually [schema name]_elementary + authentication: ActiveDirectoryServicePrincipal + tenant_id: [tenant_id] + client_id: [client_id] + client_secret: [client_secret] + threads: [1 or more] +``` + +We support the same format and connection methods as dbt. 
Please refer to +dbt's documentation of [Fabric profile](https://docs.getdbt.com/docs/core/connect-data-platform/fabric-setup) for +further details. diff --git a/docs/_snippets/cli/postgres-cli.mdx b/docs/snippets/cli/postgres-cli.mdx similarity index 100% rename from docs/_snippets/cli/postgres-cli.mdx rename to docs/snippets/cli/postgres-cli.mdx diff --git a/docs/_snippets/cli/redshift-cli.mdx b/docs/snippets/cli/redshift-cli.mdx similarity index 100% rename from docs/_snippets/cli/redshift-cli.mdx rename to docs/snippets/cli/redshift-cli.mdx diff --git a/docs/_snippets/cli/snowflake-cli.mdx b/docs/snippets/cli/snowflake-cli.mdx similarity index 100% rename from docs/_snippets/cli/snowflake-cli.mdx rename to docs/snippets/cli/snowflake-cli.mdx diff --git a/docs/snippets/cli/spark-cli.mdx b/docs/snippets/cli/spark-cli.mdx new file mode 100644 index 000000000..af871ce9c --- /dev/null +++ b/docs/snippets/cli/spark-cli.mdx @@ -0,0 +1,33 @@ +### Spark connection profile + +After installing Elementary's dbt package upon running `dbt deps`, +you can generate Elementary's profile for usage with `edr` by running the following command within your project: + +```shell +dbt run-operation elementary.generate_elementary_cli_profile +``` + +The command will print to the terminal a partially filled template of the profile that's needed for `edr` to work. + +```yml Spark +## SPARK ## +## By default, edr expects the profile name 'elementary'. ## +## Configure the database and schema of elementary models. ## +## Check where 'elementary_test_results' is to find it. ## + +elementary: + outputs: + default: + type: spark + method: [thrift, http, or odbc] + host: [hostname] + port: [port] + user: [username] + schema: [schema name] # elementary schema, usually [schema name]_elementary + threads: [1 or more] + # token: [optional, used with http method] +``` + +We support the same format and connection methods as dbt. 
Please refer to +dbt's documentation of [Spark profile](https://docs.getdbt.com/reference/warehouse-setups/spark-setup) for +further details. diff --git a/docs/snippets/cli/sqlserver-cli.mdx b/docs/snippets/cli/sqlserver-cli.mdx new file mode 100644 index 000000000..502bfbe1d --- /dev/null +++ b/docs/snippets/cli/sqlserver-cli.mdx @@ -0,0 +1,36 @@ +### SQL Server connection profile + +After installing Elementary's dbt package upon running `dbt deps`, +you can generate Elementary's profile for usage with `edr` by running the following command within your project: + +```shell +dbt run-operation elementary.generate_elementary_cli_profile +``` + +The command will print to the terminal a partially filled template of the profile that's needed for `edr` to work. + +```yml SQL Server +## SQL SERVER ## +## By default, edr expects the profile name 'elementary'. ## +## Configure the database and schema of elementary models. ## +## Check where 'elementary_test_results' is to find it. ## + +elementary: + outputs: + default: + type: sqlserver + driver: ODBC Driver 18 for SQL Server + server: [hostname] + port: 1433 + database: [database name] + schema: [schema name] # elementary schema, usually [schema name]_elementary + user: [username] + password: [password] + threads: [1 or more] + # encrypt: true # default true in dbt-sqlserver >= 1.2.0 + # trust_cert: false +``` + +We support the same format and connection methods as dbt. Please refer to +dbt's documentation of [SQL Server profile](https://docs.getdbt.com/docs/core/connect-data-platform/mssql-setup) for +further details. 
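Since `edr` reads the same profile format as dbt, a single `elementary` profile can also hold several targets (for example, separate dev and prod environments), as dbt's profiles.yml format supports. The sketch below uses hypothetical hostnames and environment variables:

```yml
elementary:
  target: dev  # the output used by default
  outputs:
    dev:
      type: sqlserver
      driver: ODBC Driver 18 for SQL Server
      server: dev-sql.internal.example.com
      port: 1433
      database: analytics_dev
      schema: analytics_dev_elementary
      user: elementary_svc
      password: "{{ env_var('ELEMENTARY_DEV_PASSWORD') }}"  # read from the environment
      threads: 4
    prod:
      type: sqlserver
      driver: ODBC Driver 18 for SQL Server
      server: prod-sql.internal.example.com
      port: 1433
      database: analytics
      schema: analytics_elementary
      user: elementary_svc
      password: "{{ env_var('ELEMENTARY_PROD_PASSWORD') }}"
      threads: 8
```

dbt's `env_var` function, which profiles support, keeps secrets out of the file itself.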
diff --git a/docs/snippets/cli/trino-cli.mdx b/docs/snippets/cli/trino-cli.mdx new file mode 100644 index 000000000..82d9da22a --- /dev/null +++ b/docs/snippets/cli/trino-cli.mdx @@ -0,0 +1,35 @@ +### Trino connection profile + +After installing Elementary's dbt package upon running `dbt deps`, +you can generate Elementary's profile for usage with `edr` by running the following command within your project: + +```shell +dbt run-operation elementary.generate_elementary_cli_profile +``` + +The command will print to the terminal a partially filled template of the profile that's needed for `edr` to work. + +```yml Trino +## TRINO ## +## By default, edr expects the profile name 'elementary'. ## +## Configure the database and schema of elementary models. ## +## Check where 'elementary_test_results' is to find it. ## + +elementary: + outputs: + default: + type: trino + host: [hostname] + port: [port] + database: [database name] + schema: [schema name] # elementary schema, usually [schema name]_elementary + threads: [1 or more] + method: [authentication method] # ldap, oauth etc. + user: [username] + # password: [optional, used with ldap authentication ] + # session_properties: [optional, sets Trino session properties used in the connection] +``` + +We support the same format and connection methods as dbt. Please refer to +dbt's documentation of [Trino profile](https://docs.getdbt.com/reference/warehouse-setups/trino-setup) for +further details. 
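As an example of the optional fields in the Trino template, a profile using LDAP authentication with a session property might look like the sketch below (hostname, catalog, and property values are hypothetical):

```yml
elementary:
  outputs:
    default:
      type: trino
      host: trino.internal.example.com
      port: 443
      database: hive  # the catalog that holds the elementary schema
      schema: analytics_elementary
      threads: 4
      method: ldap
      user: elementary_svc
      password: "{{ env_var('TRINO_PASSWORD') }}"  # avoid hardcoding the secret
      session_properties:
        query_max_run_time: 5m  # cap long-running queries issued during the run
```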
diff --git a/docs/snippets/cli/vertica-cli.mdx b/docs/snippets/cli/vertica-cli.mdx new file mode 100644 index 000000000..7f4b0104a --- /dev/null +++ b/docs/snippets/cli/vertica-cli.mdx @@ -0,0 +1,35 @@ +### Vertica connection profile + +After installing Elementary's dbt package upon running `dbt deps`, +you can generate Elementary's profile for usage with `edr` by running the following command within your project: + +```shell +dbt run-operation elementary.generate_elementary_cli_profile +``` + +The command will print to the terminal a partially filled template of the profile that's needed for `edr` to work. + +```yml Vertica +## VERTICA ## +## By default, edr expects the profile name 'elementary'. ## +## Configure the database and schema of elementary models. ## +## Check where 'elementary_test_results' is to find it. ## + +elementary: + outputs: + default: + type: vertica + host: [hostname] + port: 5433 + database: [database name] + schema: [schema name] # elementary schema, usually [schema name]_elementary + username: [username] + password: [password] + threads: [1 or more] + # connection_load_balance: true + # backup_server_node: [comma separated list of backup hostnames or IPs] +``` + +We support the same format and connection methods as dbt. Please refer to +dbt's documentation of [Vertica profile](https://docs.getdbt.com/reference/warehouse-setups/vertica-setup) for +further details. diff --git a/docs/snippets/cloud/ai-privacy-policy-short.mdx b/docs/snippets/cloud/ai-privacy-policy-short.mdx new file mode 100644 index 000000000..0e761fb9f --- /dev/null +++ b/docs/snippets/cloud/ai-privacy-policy-short.mdx @@ -0,0 +1,20 @@ + +### Privacy & Data Use + +Elementary’s AI agents are designed for secure, transparent operation, with full user control at every step. All processing runs through self-hosted models on **Amazon Bedrock**, ensuring that **no data is shared with external services**, and nothing is stored or used for model training. 
+ +Everything is **opt-in** and **disabled by default**. You control which AI features are enabled, and every action an agent takes—such as opening a pull request—**requires your explicit approval**. + +Each agent may use the following data to provide recommendations: + +- **Metadata** – asset names, folders, column names, types, tags, descriptions, owners, and custom fields +- **SQL queries** – to analyze logic, infer lineage, or suggest optimizations +- **Test results and incidents** – including failed assertions and alert metadata +- **Historical execution data** – query runtimes, update frequency, freshness, and resource usage +- **User-defined policies** *(if configured)* – for testing, governance, or asset configuration +- **Commit and pull request history** *(when applicable)* – to connect changes with incidents +- **Chat input** *(if applicable)* – prompts used during the session only, not stored or logged + +Elementary does **not collect or share any sensitive data**, and you can review and align our AI use with your organization’s governance policies at any time. + +For full details, see our [AI Privacy Policy](/cloud/general/ai-privacy-policy). 
\ No newline at end of file diff --git a/docs/_snippets/cloud/cloud-feature-tag.mdx b/docs/snippets/cloud/cloud-feature-tag.mdx similarity index 75% rename from docs/_snippets/cloud/cloud-feature-tag.mdx rename to docs/snippets/cloud/cloud-feature-tag.mdx index 48d2e15e4..4fc9ec646 100644 --- a/docs/_snippets/cloud/cloud-feature-tag.mdx +++ b/docs/snippets/cloud/cloud-feature-tag.mdx @@ -1,8 +1,8 @@ {/* prettier-ignore */} ELEMENTARY CLOUD diff --git a/docs/_snippets/cloud/features.mdx b/docs/snippets/cloud/features.mdx similarity index 70% rename from docs/_snippets/cloud/features.mdx rename to docs/snippets/cloud/features.mdx index 4d4442eef..2ca0a7501 100644 --- a/docs/_snippets/cloud/features.mdx +++ b/docs/snippets/cloud/features.mdx @@ -7,21 +7,21 @@ including both pipeline and data monitoring, validation tests, anomaly detection for unexpected behavior, and a single interface to manage it all at scale. - + ML-powered monitors automatically detect data quality issues. Out-of-the-box for volume and freshness, and opt-in for data quality metrics. - + Validate data and track the results of dbt tests, dbt package tests (dbt-utils, dbt-expectations, elementary) and custom SQL tests. - + Validate there are no breaking changes in table schemas, JSON schema, and downstream exposures such as dashboards. - + Track failures and runs of jobs, models, and tests over time. Pipeline failures and performance issues can cause data incidents, and create unnecessary costs. - + Configure Elementary in code, or via the UI for non-technical users or for adding tests in bulk. The platform opens PRs to your repo, saving hours of tedious YAML edits. @@ -37,21 +37,21 @@ Elementary offers tools to create an effective response plan, for faster recovery. This includes investigating the root cause and impact of issues, communicating issues to the relevant people, assigning owners to fix issues, keeping track of open incidents and more.
- + Column-level lineage that spans through sources, models and BI tools, enriched with monitoring results. Enables granular root cause and impact analysis. - + Define clear ownership of data assets and enable owners to be informed and accountable for the health and status of their data. - + Distribute highly configurable alerts to different channels and integrations. Automatically tag owners, and enable setting status and assignees at the alert level. - + Different failures related to the same issue are grouped automatically into a single incident. This accelerates triage and response, and reduces alert fatigue. - + Manage all open incidents in a single interface, with a clear view of status and assignees. Track historical incidents and high-level incident metrics. @@ -65,17 +65,17 @@ Elementary fosters collaboration by allowing you to easily share and communicate the overall health of the data platform and progress made to improve it with the broader organization. - + Up-to-date dashboard with current status and trends of data issues. Share the dashboard with others, enable them to slice results and stay informed. - + Enable effective collaboration and communication by grouping related data assets and tests by business domains, data products, priority, etc. - + Search and explore your datasets' information - descriptions, columns, column descriptions, compiled code, datasets health and more. - - Coming soon! + + See the Data Health scores of all your datasets by domain and share with stakeholders.
\ No newline at end of file diff --git a/docs/_snippets/cloud/features/alerts-and-incidents/alert-types.mdx b/docs/snippets/cloud/features/alerts-and-incidents/alert-types.mdx similarity index 100% rename from docs/_snippets/cloud/features/alerts-and-incidents/alert-types.mdx rename to docs/snippets/cloud/features/alerts-and-incidents/alert-types.mdx diff --git a/docs/snippets/cloud/features/anomaly-detection/all-anomalies-configuration.mdx b/docs/snippets/cloud/features/anomaly-detection/all-anomalies-configuration.mdx new file mode 100644 index 000000000..b4b5053d5 --- /dev/null +++ b/docs/snippets/cloud/features/anomaly-detection/all-anomalies-configuration.mdx @@ -0,0 +1,2 @@ +- **Severity** - Whether a failed result should be treated as a warning or an error. Default is warning. +- **Test metadata** - Add metadata such as tags and owner to the test. \ No newline at end of file diff --git a/docs/_snippets/cloud/features/anomaly-detection/automated-monitors-cards.mdx b/docs/snippets/cloud/features/anomaly-detection/automated-monitors-cards.mdx similarity index 61% rename from docs/_snippets/cloud/features/anomaly-detection/automated-monitors-cards.mdx rename to docs/snippets/cloud/features/anomaly-detection/automated-monitors-cards.mdx index 578c45146..633563142 100644 --- a/docs/_snippets/cloud/features/anomaly-detection/automated-monitors-cards.mdx +++ b/docs/snippets/cloud/features/anomaly-detection/automated-monitors-cards.mdx @@ -1,9 +1,9 @@ - + Monitors when and how frequently a table is updated, and fails if there is an unexpected delay. - + Monitors how many rows were added to or removed from a table on each update, and fails if there is an unexpected drop or spike in rows.
diff --git a/docs/_snippets/cloud/features/anomaly-detection/automated-monitors-intro.mdx b/docs/snippets/cloud/features/anomaly-detection/automated-monitors-intro.mdx similarity index 100% rename from docs/_snippets/cloud/features/anomaly-detection/automated-monitors-intro.mdx rename to docs/snippets/cloud/features/anomaly-detection/automated-monitors-intro.mdx diff --git a/docs/snippets/cloud/features/anomaly-detection/freshness-configuration.mdx b/docs/snippets/cloud/features/anomaly-detection/freshness-configuration.mdx new file mode 100644 index 000000000..51daa1354 --- /dev/null +++ b/docs/snippets/cloud/features/anomaly-detection/freshness-configuration.mdx @@ -0,0 +1,6 @@ +You can choose between two detection methods for the Freshness monitor - Automatic and Manual. +- **Automatic** - Elementary uses machine learning models to detect anomalies in the data freshness. This is the default setting. You can change the sensitivity level to *Low*, *Medium*, or *High*. +For each level, you will see a simulation of its impact on the latest result, and you can use the `Simulate Configuration` button to examine the change before applying it. +- **Manual** - You can set the SLA breach threshold for the freshness monitor manually. This is useful for assets that are updated regularly at the same time every day, hour, or week. + +Freshness monitor configuration \ No newline at end of file diff --git a/docs/snippets/cloud/features/anomaly-detection/volume-configuration.mdx b/docs/snippets/cloud/features/anomaly-detection/volume-configuration.mdx new file mode 100644 index 000000000..13729e492 --- /dev/null +++ b/docs/snippets/cloud/features/anomaly-detection/volume-configuration.mdx @@ -0,0 +1,3 @@ +- **Anomaly Direction** - Whether you want the monitor to fail on anomalous drops, spikes, or both. Default is both. +- **Sensitivity** - You can set the monitor's sensitivity levels to *Low*, *Medium*, or *High*.
In the future, we plan to allow for more nuanced adjustments to this parameter. You can use the `Simulate Configuration` button for testing how the change will affect the monitor. +- **Detection Period** - The period in which the monitor looks for anomalies. Default is the last 2 days. \ No newline at end of file diff --git a/docs/snippets/cloud/features/data-health/data-health-intro.mdx b/docs/snippets/cloud/features/data-health/data-health-intro.mdx new file mode 100644 index 000000000..4e4bdb24f --- /dev/null +++ b/docs/snippets/cloud/features/data-health/data-health-intro.mdx @@ -0,0 +1,11 @@ +Once you start sharing data with downstream consumers and stakeholders, one of the most important things you want to create is trust. +Trust that the data being used is “healthy”. Imagine being a data analyst who uses a specific data asset but constantly runs into data quality issues. +You will eventually lose trust. + +This is why we created **data health scores** in Elementary. They are a way to share an overview of the health of your data assets. + +To measure health, we use an industry-standard framework of [Data Quality Dimensions](/cloud/features/collaboration-and-communication/data-quality-dimensions#data-quality-dimensions). +These dimensions help assess the reliability of data in various business contexts. +Ensuring high-quality data across these dimensions is critical for accurate analysis, informed decision-making, and operational efficiency.
+ +To learn more, **watch the webinar** [**Measuring Data Health with Elementary**](https://www.elementary-data.com/webinar/measuring-data-health-with-elementary) diff --git a/docs/snippets/cloud/features/data-health/data-quality-dimensions.mdx b/docs/snippets/cloud/features/data-health/data-quality-dimensions.mdx new file mode 100644 index 000000000..47577b8d8 --- /dev/null +++ b/docs/snippets/cloud/features/data-health/data-quality-dimensions.mdx @@ -0,0 +1,20 @@ + + + Ensures that data is up to date and reflects the latest information. + + + Ensures all required data is available, without missing values. + + + Ensures that data represents the real-world scenario correctly. + + + The degree to which data remains uniform across multiple instances. + + + Ensures that each entity is represented only once and there are no duplicates. + + + Ensures that data conforms to rules or expectations, such as acceptable ranges or formats. + + \ No newline at end of file diff --git a/docs/snippets/cloud/features/data-tests/benefits-dbt-tests.mdx b/docs/snippets/cloud/features/data-tests/benefits-dbt-tests.mdx new file mode 100644 index 000000000..c44bafe6b --- /dev/null +++ b/docs/snippets/cloud/features/data-tests/benefits-dbt-tests.mdx @@ -0,0 +1,10 @@ +dbt tests are very powerful. Their ease of use, simplicity, and usefulness in the dev process are unmatched. +When you adopt any observability tool, you will still use dbt tests. This is why in Elementary, dbt tests are first-class citizens. + +There are several benefits to this approach: + +- **Single interface for all observability** - Prevent monitoring from being split across different tools. All configuration is in code, and all the results are in one interface. +- **Avoid duplicate work and vendor lock-in** - The tests you have already implemented are effective in Elementary, along with their existing configuration. The future tests you add will remain in your code if you decide to offboard.
+- **Control of schedule and cost** - You have control of configuration and scheduling; tests can be executed when data is actually loaded and validation is needed. +- **Prevent bad data from propagating** - As tests run in the pipeline, you can leverage `dbt build` and fail the pipeline on critical test failures. +- **Rich ecosystem** - The community of dbt users develops and supports various testing use cases. \ No newline at end of file diff --git a/docs/snippets/cloud/features/data-tests/data-tests-cards.mdx b/docs/snippets/cloud/features/data-tests/data-tests-cards.mdx new file mode 100644 index 000000000..dd164c068 --- /dev/null +++ b/docs/snippets/cloud/features/data-tests/data-tests-cards.mdx @@ -0,0 +1,14 @@ + + + Native dbt tests such as `not_null`, `unique`, etc. + + + Tests from packages such as `dbt-expectations`, `dbt-utils`, etc. + + + Tests that validate explicit business logic. + + + Schema tests by Elementary, implemented as dbt tests. + + \ No newline at end of file diff --git a/docs/snippets/cloud/features/data-tests/dbt-test-hub.mdx b/docs/snippets/cloud/features/data-tests/dbt-test-hub.mdx new file mode 100644 index 000000000..412e1c700 --- /dev/null +++ b/docs/snippets/cloud/features/data-tests/dbt-test-hub.mdx @@ -0,0 +1,4 @@ +To help you find the test that is right for your use case, we created the [dbt Test Hub](https://www.elementary-data.com/dbt-test-hub). +It's a searchable catalog of all the tests supported in Elementary, with their descriptions and example use cases. + +The tests are also segmented by use case, so you can easily find the different options for addressing your detection needs.
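The `dbt build` point above can be made concrete: a test's severity decides whether a failure stops the pipeline or is only reported. A sketch, using a hypothetical `orders` model:

```yml
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - not_null:
              config:
                severity: error  # a failure aborts `dbt build`, so bad data stops here
          - unique:
              config:
                severity: warn   # surfaced in Elementary, but the pipeline continues
```

(`data_tests` is the dbt 1.8+ key; older projects use `tests`.)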
\ No newline at end of file diff --git a/docs/_snippets/cloud/how-it-works.mdx b/docs/snippets/cloud/how-it-works.mdx similarity index 100% rename from docs/_snippets/cloud/how-it-works.mdx rename to docs/snippets/cloud/how-it-works.mdx diff --git a/docs/snippets/cloud/integrations/athena.mdx b/docs/snippets/cloud/integrations/athena.mdx new file mode 100644 index 000000000..c6413242b --- /dev/null +++ b/docs/snippets/cloud/integrations/athena.mdx @@ -0,0 +1,135 @@ +import CreateUserOperation from '/snippets/cloud/integrations/create-user-operation.mdx'; + +This guide contains the necessary steps to connect an Athena environment to your Elementary account. + + + +## AWS Setup + +### 1. Create Required IAM Policy + +First, you'll need to create an IAM policy with the following permissions: +- **AthenaPermissions**: Allows executing and retrieving query results from Athena +- **GluePermissions**: Enables reading metadata about databases and tables +- **S3AccessForStagingBuckets**: Provides full access to store Athena query results +- **S3AccessForElementarySchema**: Grants read-only access to your elementary schema + +Here is an example of a JSON policy: +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "AthenaPermissions", + "Effect": "Allow", + "Action": [ + "athena:StartQueryExecution", + "athena:GetQueryExecution", + "athena:GetQueryResults" + ], + "Resource": "*" + }, + { + "Sid": "GluePermissions", + "Effect": "Allow", + "Action": [ + "glue:GetDatabase", + "glue:GetDatabases", + "glue:GetTable", + "glue:GetTables", + "glue:GetTableVersions", + "glue:GetPartition", + "glue:GetPartitions" + ], + "Resource": "*" + }, + { + "Sid": "S3AccessForStagingBuckets", + "Effect": "Allow", + "Action": [ + "s3:GetObject", + "s3:PutObject", + "s3:DeleteObject", + "s3:ListBucket", + "s3:GetBucketLocation" + ], + "Resource": [ + "arn:aws:s3:::your-query-results-bucket", + "arn:aws:s3:::your-query-results-bucket/*" + ] + }, + { + "Sid": 
"S3AccessForElementarySchema", + "Effect": "Allow", + "Action": [ + "s3:GetObject", + "s3:ListBucket" + ], + "Resource": [ + "arn:aws:s3:::your-elementary-schema-bucket", + "arn:aws:s3:::your-elementary-schema-bucket/*" + ] + } + ] +} +``` + +### 2. Choose Authentication Method + +Elementary supports two authentication methods for connecting to Athena: + +#### Option 1: AWS Role Authentication (Recommended) + +This is the recommended approach as it provides better security and follows AWS best practices. [Learn more about AWS IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html). + +1. **Create an IAM Role**: + - Go to AWS IAM Console + - Create a new role + - Select "Another AWS account" as the trusted entity + - Enter Elementary's AWS account ID: `743289191656` + - (Optional but recommended) Enable "Require external ID" and set a value + - Attach the policy created in step 1 + +2. **Note down the following information**: + - Role ARN + - External ID (if you enabled it) [Learn more about external IDs](https://aws.amazon.com/blogs/security/how-to-use-external-id-when-granting-access-to-your-aws-resources/). + +#### Option 2: Access Key Authentication + +This method is less secure as it requires permanent credentials. We recommend using AWS Role authentication instead. + +1. **Create an IAM User**: + - Go to AWS IAM Console + - Create a new user that will be used by Elementary to query Athena + - Enable programmatic access + - Attach the policy created in step 1 + +2. **Note down the following information**: + - AWS Access Key ID of the new Elementary Athena user + - AWS Secret Access Key of the new Elementary Athena user + +## Elementary Configuration + +### Connection Settings + +Regardless of the authentication method you choose, you'll need to provide: + +- **Region**: The AWS region where your Athena instance is located +- **Database**: The name of the database where your Elementary schema exists.
+- **Schema**: The name of your Elementary schema. Usually [schema name]_elementary +- **S3 Staging Directory**: The S3 path where Athena query results will be stored +- **Workgroup**: (Optional) Your Athena workgroup name + +### Authentication Details + +Based on your chosen authentication method: + +#### If using AWS Role Authentication: +- Select "AWS Role" as the authentication method +- Enter your role ARN +- Enter your external ID (if you enabled it) + +#### If using Access Key Authentication: +- Select "Access Key" as the authentication method +- Enter your AWS Access Key ID +- Enter your AWS Secret Access Key diff --git a/docs/_snippets/cloud/integrations/bigquery.mdx b/docs/snippets/cloud/integrations/bigquery.mdx similarity index 51% rename from docs/_snippets/cloud/integrations/bigquery.mdx rename to docs/snippets/cloud/integrations/bigquery.mdx index 8e35e235e..c69b55239 100644 --- a/docs/_snippets/cloud/integrations/bigquery.mdx +++ b/docs/snippets/cloud/integrations/bigquery.mdx @@ -1,8 +1,12 @@ -You will connect Elementary Cloud to Bigquery for syncing the Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). +import CloudServiceAccount from '/snippets/dwh/bigquery/cloud_service_account.mdx'; +import PermissionsAndSecurity from '/snippets/cloud/integrations/permissions-and-security.mdx'; +import IpAllowlist from '/snippets/cloud/integrations/ip-allowlist.mdx'; - +This guide contains the necessary steps to connect a BigQuery environment to your Elementary account. - + + + ### Fill the connection form @@ -12,5 +16,6 @@ Provide the following fields: - **Project**: The name of your BigQuery project. - **Elementary dataset**: The name of your Elementary dataset. Usually `[dataset name]_elementary`. - **Location**: Use this field to configure the location of BigQuery datasets as per [the BigQuery documentation](https://cloud.google.com/bigquery/docs/locations). 
+- **Workload Identity Federation**: Support for Workload Identity Federation with BigQuery service accounts is coming soon - + diff --git a/docs/_snippets/cloud/integrations/cards-groups/alerts-destination-cards.mdx b/docs/snippets/cloud/integrations/cards-groups/alerts-destination-cards.mdx similarity index 50% rename from docs/_snippets/cloud/integrations/cards-groups/alerts-destination-cards.mdx rename to docs/snippets/cloud/integrations/cards-groups/alerts-destination-cards.mdx index 1c369051d..ba6909bca 100644 --- a/docs/_snippets/cloud/integrations/cards-groups/alerts-destination-cards.mdx +++ b/docs/snippets/cloud/integrations/cards-groups/alerts-destination-cards.mdx @@ -61,36 +61,48 @@ + + - - + } > - Click for details + } + > + + - + + + } > - Click for details } - > + /> + } - > - + /> - - + + + + + } - > - Click for details - + /> + - - + } - > - Click for details - + /> - + + } + /> + + + + + + + } - > - Click for details - + /> - + + + } - > - Click for details - + /> @@ -138,8 +146,8 @@ Click for details @@ -148,6 +156,7 @@ > Click for details + + /> diff --git a/docs/snippets/cloud/integrations/cards-groups/cloud-integrations-cards.mdx b/docs/snippets/cloud/integrations/cards-groups/cloud-integrations-cards.mdx new file mode 100644 index 000000000..d6a35f7e9 --- /dev/null +++ b/docs/snippets/cloud/integrations/cards-groups/cloud-integrations-cards.mdx @@ -0,0 +1,45 @@ +import ConnectDwhCards from '/snippets/cloud/integrations/cards-groups/connect-dwh-cards.mdx'; +import TransformationAndOrchestrationCards from '/snippets/cloud/integrations/cards-groups/transformation-and-orchestration-cards.mdx'; +import BiCards from '/snippets/cloud/integrations/cards-groups/bi-cards.mdx'; +import ReverseEtlCards from '/snippets/cloud/integrations/cards-groups/reverse-etl-cards.mdx'; +import CodeRepoCards from '/snippets/cloud/integrations/cards-groups/code-repo-cards.mdx'; +import AlertsDestinationCards from 
'/snippets/cloud/integrations/cards-groups/alerts-destination-cards.mdx'; +import LogStreamingCards from '/snippets/cloud/integrations/cards-groups/log-streaming-cards.mdx'; +import GovernanceCards from '/snippets/cloud/integrations/cards-groups/governance-cards.mdx'; +import MetadataLayerCards from '/snippets/cloud/integrations/cards-groups/metadata-layer-cards.mdx'; + +### Data warehouses + + + +### Transformation and orchestration + + + +### Data visualization + + + +### Reverse ETL + + + +### Code repositories + + + +### Alerts & incidents + + + +### Log Streaming + + + +### Governance + + + +### Iceberg catalog + + diff --git a/docs/_snippets/cloud/integrations/cards-groups/code-repo-cards.mdx b/docs/snippets/cloud/integrations/cards-groups/code-repo-cards.mdx similarity index 66% rename from docs/_snippets/cloud/integrations/cards-groups/code-repo-cards.mdx rename to docs/snippets/cloud/integrations/cards-groups/code-repo-cards.mdx index bcc19ad1f..9d9cfc5fa 100644 --- a/docs/_snippets/cloud/integrations/cards-groups/code-repo-cards.mdx +++ b/docs/snippets/cloud/integrations/cards-groups/code-repo-cards.mdx @@ -23,8 +23,8 @@ href="/cloud/integrations/code-repo/gitlab" icon={ } > + + + } + > + + + + + + + + + + + + + } + > - + \ No newline at end of file diff --git a/docs/_snippets/oss/adapters-cards.mdx b/docs/snippets/cloud/integrations/cards-groups/connect-dwh-cards.mdx similarity index 52% rename from docs/_snippets/oss/adapters-cards.mdx rename to docs/snippets/cloud/integrations/cards-groups/connect-dwh-cards.mdx index 44ceba5f9..9c3326c5d 100644 --- a/docs/_snippets/oss/adapters-cards.mdx +++ b/docs/snippets/cloud/integrations/cards-groups/connect-dwh-cards.mdx @@ -1,6 +1,7 @@ } > - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + } - > - + + - + + + + + + + + + } > + } + > + } > + + + + + + } + > + + + + + + } + > + + + + + + + + + + + + + + + + + + + + + + + } + > + + + + } + > + + + + + + + + + + + } + > + + + + + + + } + > + 
Click for details + + + \ No newline at end of file diff --git a/docs/snippets/cloud/integrations/cards-groups/log-streaming-cards.mdx b/docs/snippets/cloud/integrations/cards-groups/log-streaming-cards.mdx new file mode 100644 index 000000000..5e8bac8ff --- /dev/null +++ b/docs/snippets/cloud/integrations/cards-groups/log-streaming-cards.mdx @@ -0,0 +1,38 @@ + + + + + + + + } + > + + + + + } + > + + + + + + + + } + > + + diff --git a/docs/snippets/cloud/integrations/cards-groups/metadata-layer-cards.mdx b/docs/snippets/cloud/integrations/cards-groups/metadata-layer-cards.mdx new file mode 100644 index 000000000..c99de2f9a --- /dev/null +++ b/docs/snippets/cloud/integrations/cards-groups/metadata-layer-cards.mdx @@ -0,0 +1,136 @@ + + + + + + + + + + + + + + + + + } + > + Click for details + + + \ No newline at end of file diff --git a/docs/_snippets/cloud/integrations/cards-groups/reverse-etl-cards.mdx b/docs/snippets/cloud/integrations/cards-groups/reverse-etl-cards.mdx similarity index 99% rename from docs/_snippets/cloud/integrations/cards-groups/reverse-etl-cards.mdx rename to docs/snippets/cloud/integrations/cards-groups/reverse-etl-cards.mdx index 16d7cbec6..7681998d0 100644 --- a/docs/_snippets/cloud/integrations/cards-groups/reverse-etl-cards.mdx +++ b/docs/snippets/cloud/integrations/cards-groups/reverse-etl-cards.mdx @@ -23,7 +23,7 @@ } > + + + + } + > + +### Fill the connection form + +Provide the following fields: + +- **Host**: The hostname of your Clickhouse account to connect to. This can either be an IP address or a hostname. +- **Port**: The port of your Clickhouse account to connect to. This is usually `8123`. +- **Elementary schema**: The name of your Elementary schema. Usually `[schema name]_elementary`. +- **User**: The name of the for Elementary user. +- **Password**: The password associated with the provided user. + + + + +### Connect via SSH tunnel + +Elementary supports connecting via SSH or reverse SSH tunnel. 
Reach out to our team for details and support in this deployment. \ No newline at end of file diff --git a/docs/snippets/cloud/integrations/create-user-operation-snowflake.mdx b/docs/snippets/cloud/integrations/create-user-operation-snowflake.mdx new file mode 100644 index 000000000..2bf22da30 --- /dev/null +++ b/docs/snippets/cloud/integrations/create-user-operation-snowflake.mdx @@ -0,0 +1,19 @@ +import PermissionsAndSecurity from '/snippets/cloud/integrations/permissions-and-security.mdx'; + +### Create a user for Elementary cloud + +* Please create a Snowflake key-pair (private and public key) using [this](https://docs.snowflake.com/en/user-guide/key-pair-auth#configuring-key-pair-authentication) guide. + +* Using the public key generated in the previous step, please run the following in your dbt project folder: + +```bash +## Store the public key in an environment variable +export SNOWFLAKE_PUBLIC_KEY="" + +## Print the query you should run to generate a user +dbt run-operation create_elementary_user --args "{'public_key': '$SNOWFLAKE_PUBLIC_KEY'}" +``` + +This command will generate a query to create a user with the necessary permissions. Run this query on your data warehouse with **admin permissions** to create the user. 
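The key-pair creation and public-key export above can be scripted end to end. A sketch following Snowflake's key-pair authentication guide (an unencrypted private key and the file names `rsa_key.p8`/`rsa_key.pub` are illustrative choices; Snowflake also supports encrypted private keys):

```shell
# Generate a 2048-bit RSA private key in PKCS#8 format (unencrypted for brevity)
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt

# Derive the matching public key
openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

# Snowflake expects the bare base64 value, without the PEM header/footer or line breaks
SNOWFLAKE_PUBLIC_KEY=$(grep -v '^-----' rsa_key.pub | tr -d '\n')
export SNOWFLAKE_PUBLIC_KEY

# Optional: compute the key's fingerprint; after the user is created you can
# compare it against RSA_PUBLIC_KEY_FP from DESC USER in Snowflake
FP=$(openssl rsa -pubin -in rsa_key.pub -outform DER | openssl dgst -sha256 -binary | openssl enc -base64)
echo "SHA256:$FP"
```

With `SNOWFLAKE_PUBLIC_KEY` exported, the `dbt run-operation create_elementary_user` command above can be run as-is.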
+ + diff --git a/docs/_snippets/cloud/integrations/create-user-operation.mdx b/docs/snippets/cloud/integrations/create-user-operation.mdx similarity index 73% rename from docs/_snippets/cloud/integrations/create-user-operation.mdx rename to docs/snippets/cloud/integrations/create-user-operation.mdx index bbc685a73..1d60ba049 100644 --- a/docs/_snippets/cloud/integrations/create-user-operation.mdx +++ b/docs/snippets/cloud/integrations/create-user-operation.mdx @@ -1,3 +1,5 @@ +import PermissionsAndSecurity from '/snippets/cloud/integrations/permissions-and-security.mdx'; + ### Create a user for Elementary cloud On your dbt project, run: @@ -9,4 +11,4 @@ dbt run-operation create_elementary_user This command will generate a query to create a user with the necessary permissions. Run this query on your data warehouse with **admin permissions** to create the user. - + diff --git a/docs/snippets/cloud/integrations/databricks.mdx b/docs/snippets/cloud/integrations/databricks.mdx new file mode 100644 index 000000000..25b49138f --- /dev/null +++ b/docs/snippets/cloud/integrations/databricks.mdx @@ -0,0 +1,137 @@ +import CreateServicePrincipal from '/snippets/dwh/databricks/create_service_principal.mdx'; +import PermissionsAndSecurity from '/snippets/dwh/databricks/databricks_permissions_and_security.mdx'; +import IpAllowlist from '/snippets/cloud/integrations/ip-allowlist.mdx'; + +This guide contains the necessary steps to connect a Databricks environment to your Elementary account. + + + + + +### Add an environment in Elementary (requires an admin user) + +In the Elementary platform, go to Environments in the left menu, and click on the "Create Environment" button. +Choose a name for your environment, and then choose Databricks as your data warehouse type. + +Provide the following common fields in the form: + +- **Server Host**: The hostname of your Databricks account to connect to. +- **Http path**: The path to the Databricks cluster or SQL warehouse. 
+- **Catalog (optional)**: The name of the Databricks Catalog. +- **Elementary schema**: The name of your Elementary schema. Usually `[your dbt target schema]_elementary`. + +Then, select your authentication method: + +#### OAuth (M2M) — Recommended + +Authenticate with M2M OAuth + +- **Client ID**: The Application (client) ID of the service principal (the "Application ID" you copied in [step 5](#create-service-principal)). +- **Client secret**: The OAuth secret you generated for the service principal (see [step 7](#create-service-principal)). + + + OAuth machine-to-machine (M2M) authentication is the recommended method for connecting to Databricks. + It uses short-lived tokens that are automatically refreshed, providing better security compared to + long-lived personal access tokens. + + +### Storage Access + +Elementary requires access to the table history in order to enable automated monitors such as volume and freshness monitors. +You can configure this in one of the following ways: + +#### Option 1: Fetch history using `DESCRIBE HISTORY` + +Elementary can fetch the table history by running `DESCRIBE HISTORY` queries on your Databricks warehouse. +In the Elementary UI, choose **None** under **Storage access method**. + +This requires `SELECT` access on the relevant tables, as described in the permissions and security section above. + +#### Option 2: Credentials vending + +Elementary can access the storage using temporary credentials issued by Databricks through [credential vending](https://docs.databricks.com/aws/en/external-access/credential-vending). +In the Elementary UI, choose **Credentials vending** under **Storage access method**. + +This requires granting `EXTERNAL USE SCHEMA` on the relevant schemas. + +When using this option, Elementary does not read the table data itself. It only reads the Delta transaction log, which contains metadata about the transactions. 
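Whichever storage access method you choose, the underlying information is the Delta table history. Option 1 fetches it with queries along these lines (the table name is illustrative); the result's `operationMetrics` column carries per-commit metadata such as output row counts, which is why `SELECT` on the table is required even though no row data is returned:

```sql
-- Fetch recent commit metadata for a monitored table (illustrative name)
DESCRIBE HISTORY my_catalog.analytics.orders LIMIT 10;
```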
+ +#### Option 3: Direct storage access + +Elementary can access the storage directly using credentials that you configure. +In the Elementary UI, choose **Direct storage access** under **Storage access method**. + +When using this option, Elementary does not read the table data itself. It only reads the Delta transaction log, which contains metadata about the transactions. + +For S3-backed Databricks storage, you can configure access in one of the following ways: + +__AWS Role authentication__ + +Databricks direct storage access using AWS role ARN + +This is the recommended approach, as it provides better security and follows AWS best practices. +After choosing **Direct storage access**, select **AWS role ARN** under **Select S3 authentication method**. + +1. Create an IAM role that Elementary can assume. +2. Select "Another AWS account" as the trusted entity. +3. Enter Elementary's AWS account ID: `743289191656`. +4. Optionally enable an external ID. +5. Attach a policy that grants read access to the Delta log files. + +Use a policy similar to the following: + +```json +{ + "Version": "2012-10-17", + "Statement": [ + { + "Sid": "VisualEditor0", + "Effect": "Allow", + "Action": [ + "s3:GetObject", + "s3:ListBucket" + ], + "Resource": [ + "arn:aws:s3:::databricks-metastore-bucket", + "arn:aws:s3:::databricks-metastore-bucket/*_delta_log*" + ] + } + ] +} +``` + +This policy is scoped to the bucket itself and objects matching `*_delta_log*`, so it does not grant access to other objects in the bucket. + +Provide the role ARN in the Elementary UI, and the external ID as well if you configured one. + +__AWS access keys__ + +Databricks direct storage access using AWS access keys + +If needed, you can instead provide direct AWS credentials. +After choosing **Direct storage access**, select **Secret access key** under **Select S3 authentication method**. + +1. Create an IAM user that Elementary will use for storage access. +2. Enable programmatic access. +3. 
Attach the same read-only S3 policy shown above. +4. Provide the AWS access key ID and secret access key in the Elementary UI. + +#### Access token (legacy) + +Authenticate with Access Token + +- **Access token**: A personal access token generated for the Elementary service principal. + + diff --git a/docs/snippets/cloud/integrations/dremio.mdx b/docs/snippets/cloud/integrations/dremio.mdx new file mode 100644 index 000000000..b9ce4ec12 --- /dev/null +++ b/docs/snippets/cloud/integrations/dremio.mdx @@ -0,0 +1,74 @@ +This guide contains the necessary steps to connect a Dremio environment to your Elementary account. + +**Note:** We currently support **Dremio Cloud only**. If you are using Dremio Software, please contact us for assistance. + +### Create a user for Elementary cloud + +Create an email account for the Elementary user. +Example: **elementary@your-organization.com** + +On your dbt project, run: + +```bash +## Print the query you should run to generate a user. +dbt run-operation create_elementary_user --args '{user: the_mail_of_the_elementary_user}' +``` + +This command will generate a query to create a user with the necessary permissions. Run this query on your data warehouse with admin permissions to create the user. It will send an email invitation, which you need to accept. + +After the invitation has been accepted, sign in to Dremio as the Elementary user and open the Account Settings (bottom-left corner in the Dremio UI): + + Dremio Account Settings + +Click **Generate Token**: + +
+ Generate Token in Dremio +
+ + +Set the **maximum lifetime** allowed by Dremio (currently **180 days**). + +⚠️ Set a reminder to renew the token before it expires. + +You can update the token in Elementary's UI at any time. + +### Permissions and security + +Elementary cloud doesn't require read permissions on your tables and schemas; it only requires the following: + +- Read-only access to the elementary schema. +- Access to metadata in the information schema and query history related to the tables in your dbt project. + +It is recommended to create a user using the instructions specified above to avoid granting excess privileges. For more details, refer to [**security and privacy**](https://docs.elementary-data.com/cloud/general/security-and-privacy). + +### Fill the connection form + +Provide the following fields: + +- **Host:** + - US: `api.dremio.cloud` + - EU: `api.eu.dremio.cloud` +- **Object Storage:** Name of the object storage where the Elementary schema is stored. +- **Object Storage Path:** Path inside the object storage where the Elementary schema is stored. +- **Project ID:** Your Dremio Cloud project ID. +- **User:** The email address of the Elementary user. +- **Token:** The token you generated for the Elementary user. + +
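The **Host** value depends on your Dremio Cloud region. A small sketch making that mapping explicit (the `region` variable and its keys are illustrative; the hostnames are the ones listed above):

```shell
# Pick the Dremio Cloud API host for the connection form (illustrative helper)
region="eu"
case "$region" in
  us) DREMIO_HOST="api.dremio.cloud" ;;
  eu) DREMIO_HOST="api.eu.dremio.cloud" ;;
  *) echo "Unknown Dremio Cloud region: $region" >&2; exit 1 ;;
esac
echo "$DREMIO_HOST"
```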
+ Generate Token in Dremio +
+ \ No newline at end of file diff --git a/docs/_snippets/cloud/integrations/ip-allowlist.mdx b/docs/snippets/cloud/integrations/ip-allowlist.mdx similarity index 100% rename from docs/_snippets/cloud/integrations/ip-allowlist.mdx rename to docs/snippets/cloud/integrations/ip-allowlist.mdx diff --git a/docs/_snippets/cloud/integrations/onboarding-help.mdx b/docs/snippets/cloud/integrations/onboarding-help.mdx similarity index 100% rename from docs/_snippets/cloud/integrations/onboarding-help.mdx rename to docs/snippets/cloud/integrations/onboarding-help.mdx diff --git a/docs/_snippets/cloud/integrations/permissions-and-security.mdx b/docs/snippets/cloud/integrations/permissions-and-security.mdx similarity index 100% rename from docs/_snippets/cloud/integrations/permissions-and-security.mdx rename to docs/snippets/cloud/integrations/permissions-and-security.mdx diff --git a/docs/_snippets/cloud/integrations/postgres.mdx b/docs/snippets/cloud/integrations/postgres.mdx similarity index 55% rename from docs/_snippets/cloud/integrations/postgres.mdx rename to docs/snippets/cloud/integrations/postgres.mdx index 22b5a5d24..3d355cb01 100644 --- a/docs/_snippets/cloud/integrations/postgres.mdx +++ b/docs/snippets/cloud/integrations/postgres.mdx @@ -1,6 +1,9 @@ -You will connect Elementary Cloud to Postgres for syncing the Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). +import CreateUserOperation from '/snippets/cloud/integrations/create-user-operation.mdx'; +import IpAllowlist from '/snippets/cloud/integrations/ip-allowlist.mdx'; - +This guide contains the necessary steps to connect a Postgres environment to your Elementary account. + + ### Fill the connection form @@ -13,4 +16,9 @@ Provide the following fields: - **User**: The name of the for Elementary user. - **Password**: The password associated with the provided user. 
- + + + +### Connect via SSH tunnel + +Elementary supports connecting via SSH or reverse SSH tunnel. Reach out to our team for details and support in this deployment. \ No newline at end of file diff --git a/docs/_snippets/cloud/integrations/redshift.mdx b/docs/snippets/cloud/integrations/redshift.mdx similarity index 55% rename from docs/_snippets/cloud/integrations/redshift.mdx rename to docs/snippets/cloud/integrations/redshift.mdx index b044ec531..f9570fdd1 100644 --- a/docs/_snippets/cloud/integrations/redshift.mdx +++ b/docs/snippets/cloud/integrations/redshift.mdx @@ -1,6 +1,9 @@ -You will connect Elementary Cloud to Redshift for syncing the Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). +import CreateUserOperation from '/snippets/cloud/integrations/create-user-operation.mdx'; +import IpAllowlist from '/snippets/cloud/integrations/ip-allowlist.mdx'; - +This guide contains the necessary steps to connect a Redshift environment to your Elementary account. + + ### Fill the connection form @@ -13,4 +16,9 @@ Provide the following fields: - **User**: The name of the for Elementary user. - **Password**: The password associated with the provided user. - + + + +### Connect via SSH tunnel + +Elementary supports connecting via SSH or reverse SSH tunnel. Reach out to our team for details and support in this deployment. \ No newline at end of file diff --git a/docs/snippets/cloud/integrations/repo-connection-settings.mdx b/docs/snippets/cloud/integrations/repo-connection-settings.mdx new file mode 100644 index 000000000..b017d0b3e --- /dev/null +++ b/docs/snippets/cloud/integrations/repo-connection-settings.mdx @@ -0,0 +1,12 @@ +After the authentication, you need to fill in the following details: +- **Repository** - The full name of the code repo. +- _Optional_ **Environment base branch** - If you want Elementary to open PRs in a target branch different than default, detail the branch name here. 
+- _Optional_ **Project path** - If your dbt project isn't in the root directory of the repo, detail its path here. +- _Optional_ **Update token** - When the GitHub token expires, regenerate a fine-grained token and paste it here. + + Repository connection settings + \ No newline at end of file diff --git a/docs/_snippets/cloud/integrations/snowflake.mdx b/docs/snippets/cloud/integrations/snowflake.mdx similarity index 63% rename from docs/_snippets/cloud/integrations/snowflake.mdx rename to docs/snippets/cloud/integrations/snowflake.mdx index 6a2ba656b..e1cdb930a 100644 --- a/docs/_snippets/cloud/integrations/snowflake.mdx +++ b/docs/snippets/cloud/integrations/snowflake.mdx @@ -1,6 +1,8 @@ -You will connect Elementary Cloud to Snowflake for syncing the Elementary schema (created by the [Elementary dbt package](/cloud/onboarding/quickstart-dbt-package)). +import CreateUserOperationSnowflake from '/snippets/cloud/integrations/create-user-operation-snowflake.mdx'; - +This guide contains the necessary steps to connect a Snowflake environment to your Elementary account. + + ### Fill the connection form @@ -12,5 +14,23 @@ Provide the following fields: - **Elementary schema**: The name of your Elementary schema. Usually `[schema name]_elementary`. - **Role (optional)**: e.g. `ELEMENTARY_ROLE`. -Elementary Cloud supports the user password and key pair authentication connection methods. +Elementary Cloud supports the following authentication methods: -- **User password**: - - User: The user created for Elementary. - - Password: The password you set when creating your Snowflake account. -- **Key pair**: +- **Key pair** (Recommended): - User: The user created for Elementary. - Private key: The private key you generated for Elementary. For more information, see [Generate Private Key](https://docs.snowflake.com/en/user-guide/key-pair-auth#configuring-key-pair-authentication) in the Snowflake docs.
- Private key passphrase (optional) +- **User password** (Deprecated, not recommended): + - User: The user created for Elementary. + - Password: The password you set when creating your Snowflake account. + + + Snowflake is in the process of deprecating single-factor username & password authentication for all human users. As a result, while still supported, + we recommend configuring the user in advance using key-pair authentication rather than username & password, and configuring the user as TYPE=SERVICE + (this is automatically handled in the user creation macro above). + + See [here](https://docs.snowflake.com/en/user-guide/security-mfa-rollout) for more information regarding this change. + ### Add the Elementary IP to allowlist diff --git a/docs/_snippets/cloud/integrations/trino.mdx b/docs/snippets/cloud/integrations/trino.mdx similarity index 100% rename from docs/_snippets/cloud/integrations/trino.mdx rename to docs/snippets/cloud/integrations/trino.mdx diff --git a/docs/_snippets/cloud/introduction-opening.mdx b/docs/snippets/cloud/introduction-opening.mdx similarity index 100% rename from docs/_snippets/cloud/introduction-opening.mdx rename to docs/snippets/cloud/introduction-opening.mdx diff --git a/docs/_snippets/cloud/introduction.mdx b/docs/snippets/cloud/introduction.mdx similarity index 93% rename from docs/_snippets/cloud/introduction.mdx rename to docs/snippets/cloud/introduction.mdx index dc6c95d3d..ebc05faaf 100644 --- a/docs/_snippets/cloud/introduction.mdx +++ b/docs/snippets/cloud/introduction.mdx @@ -1,3 +1,5 @@ +import CloudIntegrationsCards from '/snippets/cloud/integrations/cards-groups/cloud-integrations-cards.mdx'; + ### Why choose Elementary Cloud? @@ -40,7 +42,7 @@ ### Elementary Cloud integrations - + ### How does it work?
diff --git a/docs/_snippets/cloud/quick-start-cards.mdx b/docs/snippets/cloud/quick-start-cards.mdx similarity index 100% rename from docs/_snippets/cloud/quick-start-cards.mdx rename to docs/snippets/cloud/quick-start-cards.mdx diff --git a/docs/_snippets/cloud/quickstart-steps.mdx b/docs/snippets/cloud/quickstart-steps.mdx similarity index 54% rename from docs/_snippets/cloud/quickstart-steps.mdx rename to docs/snippets/cloud/quickstart-steps.mdx index 6af9536e0..21c7abd75 100644 --- a/docs/_snippets/cloud/quickstart-steps.mdx +++ b/docs/snippets/cloud/quickstart-steps.mdx @@ -1,6 +1,6 @@ - [Signup to Elementary](https://elementary-data.frontegg.com/oauth/account/sign-up) using Google SSO or email. + [Sign up to Elementary](https://elementary-data.frontegg.com/oauth/account/sign-up) using Google SSO or email. To start using Elementary, you need to [add our dbt package to your dbt project](/cloud/onboarding/quickstart-dbt-package). @@ -9,7 +9,10 @@ [Create your first environment](/cloud/onboarding/connect-data-warehouse) and sync the Elementary schema. Note: Elementary can't access your data. It only requires access to logs, test results, and metadata. - + +Connect your [code repository](/cloud/integrations/code-repo/connect-code-repo), [messaging tools](/cloud/guides/enable-slack-alerts), [BI platforms](/cloud/integrations/bi/connect-bi-tool), and [external catalogs](/cloud/integrations/governance/atlan) to get full visibility and collaboration across your data workflows. + + You're done with initial onboarding. Now you can [invite team members](/cloud/manage-team) to join! 
diff --git a/docs/_snippets/column-metrics.mdx b/docs/snippets/column-metrics.mdx similarity index 100% rename from docs/_snippets/column-metrics.mdx rename to docs/snippets/column-metrics.mdx diff --git a/docs/_snippets/data-tests/tests-cards.mdx b/docs/snippets/data-tests/tests-cards.mdx similarity index 100% rename from docs/_snippets/data-tests/tests-cards.mdx rename to docs/snippets/data-tests/tests-cards.mdx diff --git a/docs/_snippets/dbt-18-materializations-common.mdx b/docs/snippets/dbt-18-materializations-common.mdx similarity index 100% rename from docs/_snippets/dbt-18-materializations-common.mdx rename to docs/snippets/dbt-18-materializations-common.mdx diff --git a/docs/_snippets/dwh/bigquery/cli_permissions.mdx b/docs/snippets/dwh/bigquery/cli_permissions.mdx similarity index 100% rename from docs/_snippets/dwh/bigquery/cli_permissions.mdx rename to docs/snippets/dwh/bigquery/cli_permissions.mdx diff --git a/docs/snippets/dwh/bigquery/cli_service_account.mdx b/docs/snippets/dwh/bigquery/cli_service_account.mdx new file mode 100644 index 000000000..8b634e0d4 --- /dev/null +++ b/docs/snippets/dwh/bigquery/cli_service_account.mdx @@ -0,0 +1,7 @@ +import CreateServiceAccount from '/snippets/dwh/bigquery/create_service_account.mdx'; +import CliPermissions from '/snippets/dwh/bigquery/cli_permissions.mdx'; +import CreateKey from '/snippets/dwh/bigquery/create_key.mdx'; + + + + diff --git a/docs/_snippets/dwh/bigquery/cloud_permissions.mdx b/docs/snippets/dwh/bigquery/cloud_permissions.mdx similarity index 100% rename from docs/_snippets/dwh/bigquery/cloud_permissions.mdx rename to docs/snippets/dwh/bigquery/cloud_permissions.mdx diff --git a/docs/snippets/dwh/bigquery/cloud_service_account.mdx b/docs/snippets/dwh/bigquery/cloud_service_account.mdx new file mode 100644 index 000000000..23ac51d6d --- /dev/null +++ b/docs/snippets/dwh/bigquery/cloud_service_account.mdx @@ -0,0 +1,9 @@ +import CreateServiceAccount from 
'/snippets/dwh/bigquery/create_service_account.mdx'; +import CloudPermissions from '/snippets/dwh/bigquery/cloud_permissions.mdx'; +import CreateKey from '/snippets/dwh/bigquery/create_key.mdx'; +import GrantUserAccessOnDatasetLevel from '/snippets/dwh/bigquery/grant_user_access_on_dataset_level.mdx'; + + + + + diff --git a/docs/_snippets/dwh/bigquery/create_key.mdx b/docs/snippets/dwh/bigquery/create_key.mdx similarity index 100% rename from docs/_snippets/dwh/bigquery/create_key.mdx rename to docs/snippets/dwh/bigquery/create_key.mdx diff --git a/docs/_snippets/dwh/bigquery/create_service_account.mdx b/docs/snippets/dwh/bigquery/create_service_account.mdx similarity index 100% rename from docs/_snippets/dwh/bigquery/create_service_account.mdx rename to docs/snippets/dwh/bigquery/create_service_account.mdx diff --git a/docs/_snippets/dwh/bigquery/grant_user_access_on_dataset_level.mdx b/docs/snippets/dwh/bigquery/grant_user_access_on_dataset_level.mdx similarity index 100% rename from docs/_snippets/dwh/bigquery/grant_user_access_on_dataset_level.mdx rename to docs/snippets/dwh/bigquery/grant_user_access_on_dataset_level.mdx diff --git a/docs/snippets/dwh/databricks/create_service_principal.mdx b/docs/snippets/dwh/databricks/create_service_principal.mdx new file mode 100644 index 000000000..c0742d6eb --- /dev/null +++ b/docs/snippets/dwh/databricks/create_service_principal.mdx @@ -0,0 +1,69 @@ +### Create service principal + +1. Open your Databricks console, and then open your relevant workspace. + +2. Click on your Profile icon on the right and choose Settings. + +Choose settings + +3. On the sidebar, click on *Identity and access*, and then under the *Service Principals* row click on *Manage*. + +Choose settings + +4. Click on the *Add service principal* button, choose "Add new" and give a name to the service principal. This will be used by Elementary Cloud +to access your Databricks instance. + +Add service principal + +5. 
Click on your newly created service principal, add the "Databricks SQL access" entitlement, and click Update. Also, please copy the +"Application ID" field as it will be used later in the permissions section. + +Add databricks SQL access + +6. Next, generate credentials for your service principal. Choose one of the following methods: + + **Option A: Generate an OAuth secret (Recommended)** + + On the service principal page, go to the *Secrets* tab and click *Generate secret*. Copy the **Client ID** (this is the same as the "Application ID" from step 5) and the generated **Client secret** — you will need both when configuring the Elementary environment. + + {/* TODO: Add screenshot of Databricks service principal Secrets tab with "Generate secret" button */} + Generate OAuth M2M Secret (1) + Generate OAuth M2M Secret (2) + + + OAuth secrets are the recommended authentication method. They enable short-lived token generation + with automatic refresh, providing better security than long-lived personal access tokens. + + + **Option B: Create a personal access token (legacy)** + + In order to generate a personal access token for your service principal, you may first need to allow Token Usage for it. + To do so, go to the settings menu and choose Advanced -> Personal Access Tokens -> Permission Settings, then make sure the service principal is in the list. + + Allow token usage for service principal + + Then, create a personal access token for your service principal. For more details, please click [here](https://docs.databricks.com/aws/en/dev-tools/auth/pat#databricks-personal-access-tokens-for-service-principals). + +7. Finally, in order to enable Elementary's automated monitors feature, please ensure [predictive optimization](https://docs.databricks.com/aws/en/optimizations/predictive-optimization#enable-or-disable-predictive-optimization-for-your-account) is enabled in your account. 
+This is required for table statistics to be updated (Elementary relies on this to obtain up-to-date row counts). diff --git a/docs/snippets/dwh/databricks/databricks_permissions_and_security.mdx b/docs/snippets/dwh/databricks/databricks_permissions_and_security.mdx new file mode 100644 index 000000000..32ef78686 --- /dev/null +++ b/docs/snippets/dwh/databricks/databricks_permissions_and_security.mdx @@ -0,0 +1,37 @@ +### Permissions and security + +#### Required permissions + +Elementary cloud requires the following permissions: + +- **Elementary schema read-only access** - This is required by Elementary to read dbt metadata & test results collected by the Elementary dbt package as a part of your pipeline runs. +This permission does not give access to your data. + +- **Information schema metadata access** - Elementary needs access to the `system.information_schema.tables` and `system.information_schema.columns` system tables, to get metadata +about existing tables and columns in your data warehouse. This is used to power features such as column-level lineage and automated volume & freshness monitors. + +- **Read access needed for some metadata operations (optional)** - In order to enable Elementary's automated volume & freshness monitors, Elementary needs access to query history, as well +as Databricks APIs to obtain table statistics. +These operations require granting SELECT access on your tables. This is a Databricks limitation - Elementary **never** reads any data from your tables, only metadata. However, Databricks does not currently offer a table-level metadata-only permission, so SELECT is required. + + +#### Grants SQL template + +Please use the following SQL statements to grant the permissions specified above (you should replace the placeholders with the correct values): + +```sql +-- Grant read access on the elementary schema (usually [your dbt target schema]_elementary) +GRANT USE CATALOG ON CATALOG <catalog> TO `<elementary_user>`; +GRANT USE SCHEMA, SELECT ON SCHEMA <catalog>.<elementary_schema> TO `<elementary_user>`; + +-- Grant access to information schema tables +GRANT USE CATALOG ON CATALOG system TO `<elementary_user>`; +GRANT USE SCHEMA ON SCHEMA system.information_schema TO `<elementary_user>`; +GRANT SELECT ON TABLE system.information_schema.tables TO `<elementary_user>`; +GRANT SELECT ON TABLE system.information_schema.columns TO `<elementary_user>`; + +-- Grant select on tables for history & statistics access +-- (Optional, required for automated volume & freshness tests - see explanation above. You can also limit to specific schemas used by dbt instead of granting on the full catalog) +GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG <catalog> TO `<elementary_user>`; +``` diff --git a/docs/_snippets/faq/question-change-elementary-schema.mdx b/docs/snippets/faq/question-change-elementary-schema.mdx similarity index 100% rename from docs/_snippets/faq/question-change-elementary-schema.mdx rename to docs/snippets/faq/question-change-elementary-schema.mdx diff --git a/docs/_snippets/faq/question-connection-profile.mdx b/docs/snippets/faq/question-connection-profile.mdx similarity index 100% rename from docs/_snippets/faq/question-connection-profile.mdx rename to docs/snippets/faq/question-connection-profile.mdx diff --git a/docs/_snippets/faq/question-cost.mdx b/docs/snippets/faq/question-cost.mdx similarity index 100% rename from docs/_snippets/faq/question-cost.mdx rename to docs/snippets/faq/question-cost.mdx diff --git a/docs/_snippets/faq/question-dbt-cloud.mdx b/docs/snippets/faq/question-dbt-cloud.mdx similarity index 100% rename from docs/_snippets/faq/question-dbt-cloud.mdx rename to docs/snippets/faq/question-dbt-cloud.mdx diff --git a/docs/_snippets/faq/question-disable-elementary-models.mdx
b/docs/snippets/faq/question-disable-elementary-models.mdx
similarity index 65%
rename from docs/_snippets/faq/question-disable-elementary-models.mdx
rename to docs/snippets/faq/question-disable-elementary-models.mdx
index 407e3ee2d..22bfe4276 100644
--- a/docs/_snippets/faq/question-disable-elementary-models.mdx
+++ b/docs/snippets/faq/question-disable-elementary-models.mdx
@@ -1,6 +1,6 @@
-Elementary only needs you to run the models once after you install, and on upgrades of minor versions (like 0.7.X -> 0.8.X).
+Elementary only needs you to run the models once after you install, and on upgrades of minor versions (like 0.15.X -> 0.16.X).
On such upgrades we make schema changes, so we need you to rebuild the tables.

For excluding the elementary models from your runs we suggest 2 options:
@@ -20,10 +20,14 @@ models:
+enabled: "{{ var('enable_elementary_models', false) }}"
```
-- When you upgrade elementary run:
+You will run the Elementary models explicitly in one of two cases:
+1. When you upgrade Elementary
+2. If you disabled Elementary models in your runs and want to update them on your own schedule.
+ +To run Elementary models: ```shell -dbt run --select elementary --vars {enable_elementary_models: true} +dbt run --select elementary --vars '{"enable_elementary_models": true}' ``` diff --git a/docs/_snippets/faq/question-disable-hooks.mdx b/docs/snippets/faq/question-disable-hooks.mdx similarity index 100% rename from docs/_snippets/faq/question-disable-hooks.mdx rename to docs/snippets/faq/question-disable-hooks.mdx diff --git a/docs/_snippets/faq/question-elementary-permissions.mdx b/docs/snippets/faq/question-elementary-permissions.mdx similarity index 100% rename from docs/_snippets/faq/question-elementary-permissions.mdx rename to docs/snippets/faq/question-elementary-permissions.mdx diff --git a/docs/_snippets/faq/question-filter-elementary-tests.mdx b/docs/snippets/faq/question-filter-elementary-tests.mdx similarity index 100% rename from docs/_snippets/faq/question-filter-elementary-tests.mdx rename to docs/snippets/faq/question-filter-elementary-tests.mdx diff --git a/docs/_snippets/faq/question-full-refresh.mdx b/docs/snippets/faq/question-full-refresh.mdx similarity index 100% rename from docs/_snippets/faq/question-full-refresh.mdx rename to docs/snippets/faq/question-full-refresh.mdx diff --git a/docs/_snippets/faq/question-profile-permissions.mdx b/docs/snippets/faq/question-profile-permissions.mdx similarity index 100% rename from docs/_snippets/faq/question-profile-permissions.mdx rename to docs/snippets/faq/question-profile-permissions.mdx diff --git a/docs/_snippets/faq/question-schema-no-accordion.mdx b/docs/snippets/faq/question-schema-no-accordion.mdx similarity index 100% rename from docs/_snippets/faq/question-schema-no-accordion.mdx rename to docs/snippets/faq/question-schema-no-accordion.mdx diff --git a/docs/snippets/faq/question-schema.mdx b/docs/snippets/faq/question-schema.mdx new file mode 100644 index 000000000..8e4533c8c --- /dev/null +++ b/docs/snippets/faq/question-schema.mdx @@ -0,0 +1,7 @@ +import QuestionSchemaNoAccordion 
from '/snippets/faq/question-schema-no-accordion.mdx'; + + + + + + diff --git a/docs/_snippets/faq/question-singular-tests-config.mdx b/docs/snippets/faq/question-singular-tests-config.mdx similarity index 100% rename from docs/_snippets/faq/question-singular-tests-config.mdx rename to docs/snippets/faq/question-singular-tests-config.mdx diff --git a/docs/snippets/faq/question-test-results-sample.mdx b/docs/snippets/faq/question-test-results-sample.mdx new file mode 100644 index 000000000..c80ce6184 --- /dev/null +++ b/docs/snippets/faq/question-test-results-sample.mdx @@ -0,0 +1,28 @@ + + +Yes! Elementary saves samples of failed test rows and stores them in the table `test_result_rows`, then displays them in the *Results* tab of the report. + +By default, Elementary saves **5 rows per test**, but you have several options to control this: + +- **Change the sample size** globally or per-test using `test_sample_row_count` +- **Disable samples** for specific tests using `disable_test_samples` in the test meta +- **Protect PII** by automatically disabling samples for tables tagged with sensitive data tags +- **Elementary Cloud users** can also request environment-level controls from the Elementary team + +For example, to save 10 rows per test, add the following to your `dbt_project.yml` file: + +```yaml +vars: + test_sample_row_count: 10 +``` + +To disable samples entirely, set it to `0`: + +```yaml +vars: + test_sample_row_count: 0 +``` + +For the full list of controls including per-test overrides, PII protection, and Cloud options, see [Test Result Samples](/data-tests/test-result-samples). 
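The per-test sample controls described above can be sketched in YAML. This is a hypothetical model and test; the `disable_test_samples` meta key is the one named in this doc:

```yaml
models:
  - name: customers            # hypothetical model name
    columns:
      - name: email
        data_tests:
          - not_null:
              config:
                meta:
                  # skip storing failed-row samples for this test only
                  disable_test_samples: true
```

Combined with the global `test_sample_row_count` var, this lets you keep samples on by default while opting out specific sensitive tests or tables.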
+ + diff --git a/docs/_snippets/faq/question-tests-configuration-priorities.mdx b/docs/snippets/faq/question-tests-configuration-priorities.mdx similarity index 100% rename from docs/_snippets/faq/question-tests-configuration-priorities.mdx rename to docs/snippets/faq/question-tests-configuration-priorities.mdx diff --git a/docs/_snippets/faq/question-which-tests.mdx b/docs/snippets/faq/question-which-tests.mdx similarity index 100% rename from docs/_snippets/faq/question-which-tests.mdx rename to docs/snippets/faq/question-which-tests.mdx diff --git a/docs/_snippets/guides/alerts-code-configuration.mdx b/docs/snippets/guides/alerts-code-configuration.mdx similarity index 58% rename from docs/_snippets/guides/alerts-code-configuration.mdx rename to docs/snippets/guides/alerts-code-configuration.mdx index db9d3721e..339e73634 100644 --- a/docs/_snippets/guides/alerts-code-configuration.mdx +++ b/docs/snippets/guides/alerts-code-configuration.mdx @@ -1,14 +1,19 @@ +import Owner from '/snippets/alerts/owner.mdx'; +import Subscribers from '/snippets/alerts/subscribers.mdx'; +import Description from '/snippets/alerts/description.mdx'; +import Tags from '/snippets/alerts/tags.mdx'; + You can enrich your alerts by adding properties to tests, models and sources in your `.yml` files. -The supported attributes are: [owner](./alerts-configuration/#owner), -[subscribers](./alerts-configuration#subscribers), -[description](./alerts-configuration#test-description), -[tags](./alerts-configuration#tags). +The supported attributes are: [owner](#Owner), +[subscribers](#Subscribers), +[description](#Test-description), +[tags](#Tags). 
You can configure and customize your alerts by configuring: -[custom channel](./alerts-configuration#custom-channel), -[suppression interval](./alerts-configuration#suppression-interval), -[alert fields](./alerts-configuration#alert-fields)(for test alerts only), [alert grouping](./alerts-configuration#group-alerts-by-table), -[alert filters](./alerts-configuration#filter-alerts). +[custom channel](#Custom-channel), +[suppression interval](#Suppression-interval), +[alert fields](#Alert-fields)(for test alerts only), [alert grouping](#Group-alerts-by-table), +[alert filters](#Filter-alerts). ## Alert properties in `.yml` files @@ -45,174 +50,19 @@ Elementary prioritizes configuration in the following order: #### Owner -Elementary enriches alerts with [owners for models or tests](https://docs.getdbt.com/reference/resource-configs/meta#designate-a-model-owner)). - -- If you want the owner to be tagged on slack use '@' and the email prefix of the slack user (@jessica.jones to tag jessica.jones@marvel.com). -- You can configure a single owner or a list of owners (`["@jessica.jones", "@joe.joseph"]`). - - - -```yml model -models: - - name: my_model_name - meta: - owner: "@jessica.jones" -``` - -```yml test -tests: - - not_null: - meta: - owner: ["@jessica.jones", "@joe.joseph"] -``` - -```yml test/model config block -{{ config( - tags=["Tag1","Tag2"] - meta={ - "description": "This is a description", - "owner": "@jessica.jones" - } -) }} -``` - -```yml dbt_project.yml -models: - path: - subfolder: - +meta: - owner: "@jessica.jones" - -tests: - path: - subfolder: - +meta: - owner: "@jessica.jones" -``` - - + #### Subscribers -If you want additional users besides the owner to be tagged on an alert, add them as subscribers. - -- If you want the subscriber to be tagged on slack use '@' and the email prefix of the slack user (@jessica.jones to tag jessica.jones@marvel.com). -- You can configure a single subscriber or a list (`["@jessica.jones", "@joe.joseph"]`). 
- - - -```yml model -models: - - name: my_model_name - meta: - subscribers: "@jessica.jones" -``` - -```yml test -tests: - - not_null: - meta: - subscribers: ["@jessica.jones", "@joe.joseph"] -``` - -```yml test/model config block -{{ config( - meta={ - "subscribers": "@jessica.jones" - } -) }} -``` - -```yml dbt_project.yml -models: - path: - subfolder: - +meta: - subscribers: "@jessica.jones" - -tests: - path: - subfolder: - +meta: - subscribers: "@jessica.jones" -``` - - + #### Test description -Elementary supports configuring description for tests that are included in alerts. -It's recommended to add an explanation of what does it mean if this test fails, so alert will include this context. - - - -```yml test -tests: - - not_null: - meta: - description: "This is the test description" -``` - -```yml test config block -{{ config( - tags=["Tag1","Tag2"] - meta={ - description: "This is the test description" - } -) }} -``` - -```yml dbt_project.yml -tests: - path: - subfolder: - +meta: - description: "This is the test description" -``` - - + #### Tags -You can use [tags](https://docs.getdbt.com/reference/resource-configs/tags) to provide context to your alerts. - -- You can tag a group or a channel in a slack alert by adding `#channel_name` as a tag. -- Tags are aggregated,so a test alert will include both the test and the parent model tags. - - - -```yml model -models: - - name: my_model_name - tags: ["#marketing", "#data_ops"] -``` - -```yml test -tests: - - not_null: - tags: ["#marketing", "#data_ops"] -``` - -```yml test/model config block -{{ config( - tags=["#marketing", "#data_ops"] - } -) }} -``` - -```yml dbt_project.yml -models: - path: - subfolder: - tags: ["#marketing", "#data_ops"] - -tests: - path: - subfolder: - tags: ["#marketing", "#data_ops"] -``` - - + ### Alerts distribution @@ -231,13 +81,15 @@ the CLI or the `config.yml` file. 
```yml model models: - name: my_model_name - meta: - channel: data_ops + config: + meta: + channel: data_ops ``` ```yml test -tests: +data_tests: - not_null: + config: meta: channel: data_ops ``` @@ -257,7 +109,7 @@ models: +meta: channel: data_ops -tests: +data_tests: path: subfolder: +meta: @@ -281,13 +133,15 @@ Note: if you configure a suppression interval using this method, it will overrid ```yml model models: - name: my_model_name - meta: - alert_suppression_interval: 24 + config: + meta: + alert_suppression_interval: 24 ``` ```yml test -tests: +data_tests: - not_null: + config: meta: alert_suppression_interval: 12 ``` @@ -307,7 +161,7 @@ models: +meta: alert_suppression_interval: 24 -tests: +data_tests: path: subfolder: +meta: @@ -331,13 +185,15 @@ Due to their nature, grouped alerts will contain less information on each issue. ```yml model models: - name: my_model_name - meta: - slack_group_alerts_by: table + config: + meta: + slack_group_alerts_by: table ``` ```yml test -tests: +data_tests: - not_null: + config: meta: slack_group_alerts_by: table ``` @@ -357,7 +213,7 @@ models: +meta: slack_group_alerts_by: table -tests: +data_tests: path: subfolder: +meta: @@ -391,13 +247,15 @@ Supported alert fields: ```yml model models: - name: my_model_name - meta: - alert_fields: ["description", "owners", "tags", "subscribers"] + config: + meta: + alert_fields: ["description", "owners", "tags", "subscribers"] ``` ```yml test -tests: +data_tests: - not_null: + config: meta: alert_fields: ["description", "owners", "tags", "subscribers"] ``` @@ -417,7 +275,7 @@ models: +meta: alert_fields: ["description", "owners", "tags", "subscribers"] -tests: +data_tests: path: subfolder: +meta: @@ -426,25 +284,10 @@ tests: -## Alerts global configuration - -#### Enable/disable alerts - -You can choose to enable / disable alert types by adding a var to your `dbt_project.yml`. - Vars will be deprecated soon! For OSS users, we recommend filtering the alerts + Alert vars are deprecated! 
We recommend filtering the alerts using [CLI selectors](/oss/guides/alerts/alerts-configuration#alerts-cli-flags) instead. -Below are the available vars and their default config: - -```yml dbt_project.yml -vars: - disable_model_alerts: false - disable_test_alerts: false - disable_warn_alerts: false - disable_skipped_model_alerts: true - disable_skipped_test_alerts: true -``` diff --git a/docs/_snippets/guides/collect-job-data.mdx b/docs/snippets/guides/collect-job-data.mdx similarity index 97% rename from docs/_snippets/guides/collect-job-data.mdx rename to docs/snippets/guides/collect-job-data.mdx index b221413ea..1f58348ff 100644 --- a/docs/_snippets/guides/collect-job-data.mdx +++ b/docs/snippets/guides/collect-job-data.mdx @@ -34,7 +34,7 @@ Elementary also supports passing job metadata as dbt vars. If `env_var` and `var To pass job data to Elementary using `var`, use the `--vars` flag in your invocations: ```shell -dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_load"}' +dbt run --vars '{"orchestrator": "airflow", "job_name": "dbt_marketing_night_load"}' ``` #### Variables supported format @@ -63,7 +63,7 @@ By default, Elementary will collect the dbt cloud jobs info. If you wish to override that, change your dbt cloud invocations to pass the orchestrator job info using `--vars`: ```shell -dbt run --vars '{"orchestrator": "Airflow", "job_name": "dbt_marketing_night_load"}' +dbt run --vars '{"orchestrator": "airflow", "job_name": "dbt_marketing_night_load"}' ``` ## Where can I see my job info? 
diff --git a/docs/_snippets/guides/dbt-source-freshness.mdx b/docs/snippets/guides/dbt-source-freshness.mdx similarity index 65% rename from docs/_snippets/guides/dbt-source-freshness.mdx rename to docs/snippets/guides/dbt-source-freshness.mdx index eafe5ac37..a237b7304 100644 --- a/docs/_snippets/guides/dbt-source-freshness.mdx +++ b/docs/snippets/guides/dbt-source-freshness.mdx @@ -1,10 +1,20 @@ -Unlike dbt and Elementary tests, the results of the command `dbt source-freshness` are not automatically collected. -You can collect the results using the Elementary CLI tool. +For users of dbt version 1.8 and above + +Add the following flag to your `dbt_project.yml` file: + +```yaml dbt_project.yml +flags: + source_freshness_run_project_hooks: True +``` + +This flag enables Elementary to automatically collect `source-freshness` results, just like any other test results. -If dbt source freshness results are collected, they will be presented in the UI, and in alerts upon failure. +For dbt version under 1.8 +In dbt versions lower than 1.8, the results of the command `dbt source-freshness` are not automatically collected. +You can collect the results using the Elementary CLI tool. -## Collect source freshness failures +If dbt source freshness results are collected, they will be presented in the UI, and in alerts upon failure. #### dbt core users diff --git a/docs/snippets/guides/performance-alerts.mdx b/docs/snippets/guides/performance-alerts.mdx new file mode 100644 index 000000000..4433daf21 --- /dev/null +++ b/docs/snippets/guides/performance-alerts.mdx @@ -0,0 +1,94 @@ +Monitoring the performance of your dbt models is crucial for maintaining an efficient data pipeline. Elementary provides capabilities to set up alerts for long-running queries, helping you identify performance bottlenecks and optimize your data pipeline. + +There are two main approaches to creating alerts for long-running model queries: + +1. 
**Static Threshold Alerts**: Define specific time thresholds that, when exceeded, trigger an alert +2. **Anomaly Detection Alerts**: Use Elementary's anomaly detection to identify unusual increases in query execution time + +## Static Threshold Alerts + +You can define tests that fail when model execution times exceed predefined thresholds. This approach is straightforward and ideal when you have clear performance requirements. + +### Implementation Steps + +1. Create a singular test SQL file in your dbt project (e.g., `tests/test_models_run_under_30m.sql`): + +```sql +{{ config( + tags=["model_performance"], + meta={ + "description": "This test will fail on any models running over 30 minutes within the last 24 hours" + } +) }} + +select name, package_name, status, generated_at, execution_time +from {{ ref('elementary', 'model_run_results') }} +where CAST(generated_at AS timestamp) >= TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 day) + AND execution_time >= 30 * 60 +order by execution_time desc +``` + +In this example: +- The test monitors model runs over the past 24 hours +- It fails if any model takes longer than 30 minutes to run (1800 seconds) +- The test is tagged with "model_performance" for easy identification +- Results are ordered by execution time in descending order + +When this test fails, Elementary will generate an alert based on your alert configurations. The test results will also be visible in the Elementary UI, showing the 5 worst-performing model runs. + +## Anomaly Detection Alerts + +Instead of using fixed thresholds, you can leverage Elementary's anomaly detection to identify unusual increases in execution time. This approach is more dynamic and can adapt to your evolving data pipeline. + +### Implementation Steps + +1. 
Define a source on the `model_run_results` view in your `schema.yml` file (or another YAML file): + +```yaml +sources: + - name: elementary_models + schema: "your_elementary_schema" # Replace with your Elementary schema name + tables: + - name: model_run_results + columns: + - name: execution_time + data_tests: + - elementary.column_anomalies: + config: + severity: warn + tags: ["model_performance"] + arguments: + column_anomalies: + - max + dimensions: ["package_name", "name"] + timestamp_column: generated_at + anomaly_direction: spike + ignore_small_changes: + spike_failure_percent_threshold: 10 +``` + +In this configuration: +- Elementary monitors the `execution_time` column for anomalies +- Dimensions are set to `package_name` and `name` to analyze each model individually +- The test only detects spikes in execution time (`anomaly_direction: spike`) +- Small changes under 10% are ignored (`spike_failure_percent_threshold: 10`) +- The severity is set to "warn" but can be adjusted as needed + +This test will detect when a model's execution time increases significantly compared to its historical performance, triggering an alert when the increase exceeds the normal baseline. + +## Choosing the Right Approach + +Both methods have their strengths: + +- **Static Threshold**: Simple to implement and understand. Ideal when you have clear performance requirements (e.g., "models must run in under 30 minutes"). + +- **Anomaly Detection**: More adaptive to your specific environment. Better at detecting relative changes in performance rather than absolute thresholds. Useful when normal execution times vary across different models. + +You can implement both approaches simultaneously for comprehensive performance monitoring. + +## Viewing Performance Alerts + +Performance alerts appear in your regular Elementary alert channels (Slack, email, etc.) based on your alert configuration. 
+ +You can also view performance test results in the Elementary UI under the Tests tab, filtered by the "model_performance" tag that we added to both test types. + diff --git a/docs/snippets/guides/reduce-on-run-end-time.mdx b/docs/snippets/guides/reduce-on-run-end-time.mdx new file mode 100644 index 000000000..e238af64a --- /dev/null +++ b/docs/snippets/guides/reduce-on-run-end-time.mdx @@ -0,0 +1,95 @@ +Elementary's `on-run-end` hooks collect test results and dbt artifacts to provide comprehensive observability. However, for large projects or when running only a subset of models, these hooks can add significant time to your dbt runs. This guide explains how to control the timing of on-run-end hooks to make them more efficient and avoid unnecessary runs while maintaining the observability you need. + +## Overview + +The Elementary on-run-end hooks perform several operations that can impact run time: + +- **Metadata artifacts upload**: Uploads all project metadata (models, tests, sources, etc.) +- **Run results upload**: Uploads test results and model execution results +- **dbt invocation upload**: Uploads dbt invocation metadata + +For large projects, these operations can take several minutes. The following strategies help you control when these hooks run to optimize efficiency and reduce unnecessary uploads. + +## Control Metadata Artifacts Upload Timing + +The most effective way to reduce on-run-end time is to control when metadata artifacts are uploaded. Instead of uploading all project metadata (models, tests, etc.) on every run, you can upload artifacts on a schedule that matches when your project actually changes. 
+ +### Configuration + +Add the following to your `dbt_project.yml`: + +```yaml dbt_project.yml +vars: + disable_dbt_artifacts_autoupload: true +``` + +### Upload Artifacts When Needed + +When you need to update metadata (e.g., after schema changes or adding new models), upload artifacts either manually or via a recurring job (e.g., daily): + +```bash +dbt run --select elementary.edr.dbt_artifacts +``` + + +Metadata artifacts only need to be uploaded when your project structure changes (new models, tests, sources, etc.). For most runs, you only need run results, which are much faster to upload. + + +## Control Other Hooks Timing + + +**Important**: Disabling hooks (other than metadata artifacts) means you will lose this data for those runs. Only disable hooks for runs you don't want to monitor (e.g., local development). + + +You can further control when different parts of the on-run-end logic run by setting additional variables. + +### Available Configuration Variables + +```yaml dbt_project.yml +vars: + disable_run_results: true + disable_tests_results: true + disable_dbt_invocation_autoupload: true +``` + +- **`disable_run_results`**: Controls when model execution results are uploaded +- **`disable_tests_results`**: Controls when test results are uploaded +- **`disable_dbt_invocation_autoupload`**: Controls when dbt invocation metadata is uploaded + + +Disabling these hooks means you will lose this data for those runs. Only disable hooks for runs you don't want to monitor (e.g., local development). This will reduce the amount of data processed and uploaded at the end of each run, but will impact the completeness of reports and monitoring for those runs. + + +## Limit Hooks to Production or Specific Targets + +If you only need full observability in production, configure Elementary to run hooks only for specific targets. 
This is especially useful when you want to: +- Speed up local development +- Reduce costs in non-production environments +- Focus monitoring on production workloads + +### Configuration by Target + +Configure hooks to run only for specific targets: + +```yaml dbt_project.yml +vars: + disable_dbt_artifacts_autoupload: "{{ target.name != 'prod' }}" + disable_run_results: "{{ target.name != 'prod' }}" + disable_tests_results: "{{ target.name != 'prod' }}" + disable_dbt_invocation_autoupload: "{{ target.name != 'prod' }}" +``` + +### Disable Entire Package for Non-Production + +Alternatively, you can disable the entire Elementary package for non-production targets: + +```yaml dbt_project.yml +models: + elementary: + +enabled: "{{ target.name == 'prod' }}" +``` + + +Disabling the entire package will prevent Elementary tests from running. We recommend using the hook disablement vars instead, which allows Elementary tests to run while skipping the artifact collection. + + diff --git a/docs/_snippets/have-question.mdx b/docs/snippets/have-question.mdx similarity index 100% rename from docs/_snippets/have-question.mdx rename to docs/snippets/have-question.mdx diff --git a/docs/_snippets/install-cli.mdx b/docs/snippets/install-cli.mdx similarity index 76% rename from docs/_snippets/install-cli.mdx rename to docs/snippets/install-cli.mdx index 9da9c2215..21ca6c256 100644 --- a/docs/_snippets/install-cli.mdx +++ b/docs/snippets/install-cli.mdx @@ -16,9 +16,13 @@ pip install 'elementary-data[databricks]' pip install 'elementary-data[athena]' pip install 'elementary-data[trino]' pip install 'elementary-data[clickhouse]' +pip install 'elementary-data[duckdb]' +pip install 'elementary-data[dremio]' +pip install 'elementary-data[spark]' +pip install 'elementary-data[vertica]' ## Postgres doesn't require this step ``` Run `edr --help` in order to ensure the installation was successful. 
-If you're receiving `command not found: edr` please check our [troubleshooting guide](/general/troubleshooting).
+If you're receiving `command not found: edr`, please check our [troubleshooting guide](/oss/general/troubleshooting).
diff --git a/docs/_snippets/install-dbt-package.mdx b/docs/snippets/install-dbt-package.mdx
similarity index 100%
rename from docs/_snippets/install-dbt-package.mdx
rename to docs/snippets/install-dbt-package.mdx
diff --git a/docs/snippets/integrations/dbt-fusion.mdx b/docs/snippets/integrations/dbt-fusion.mdx
new file mode 100644
index 000000000..dc40e2e46
--- /dev/null
+++ b/docs/snippets/integrations/dbt-fusion.mdx
@@ -0,0 +1,103 @@
+Note: dbt-fusion support in Elementary is still in beta, as is dbt-fusion itself. See the [list of features that are not yet implemented](#missing-features-in-dbt-fusion) below.
+
+Elementary's dbt package integrates with dbt-fusion, starting with version 0.20.
+Fusion is a complete rewrite of the dbt engine and provides many benefits, including enhanced performance and static analysis.
+
+For more details about dbt-fusion capabilities, please consult the [dbt-fusion docs](https://docs.getdbt.com/docs/fusion/about-fusion).
+
+## Upgrading to dbt-fusion
+
+As part of the migration to dbt-fusion, you must remove deprecated syntax from various areas of the dbt project, and from YAML files in particular.
+For tests specifically, the following changes are important:
+
+1. Test arguments must be encapsulated under an `arguments` field.
+2. Configuration fields such as `meta`, `tags` or `severity` must be encapsulated under a `config` field.
+3. Recommended (but not required): change the `tests` field to `data_tests` to conform with current dbt guidelines.
+ +Here's an example of an Elementary anomaly test with the old and new syntax: + + + +```yml Old syntax +models: + - name: login_events + config: + elementary: + timestamp_column: "loaded_at" + tests: + - elementary.volume_anomalies: + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: day + count: 1 + tags: ["elementary"] + severity: warn + meta: + owner: "@jessica.jones" + + - name: users + tests: + - elementary.volume_anomalies: + tags: ["elementary"] +``` + +```yml New syntax +models: + - name: login_events + config: + elementary: + timestamp_column: "loaded_at" + data_tests: + - elementary.volume_anomalies: + arguments: + where_expression: "event_type in ('event_1', 'event_2') and country_name != 'unwanted country'" + time_bucket: + period: day + count: 1 + config: + tags: ["elementary"] + severity: warn + meta: + owner: "@jessica.jones" + + - name: users + data_tests: + - elementary.volume_anomalies: + config: + tags: ["elementary"] +``` + + + +To facilitate the process of making these changes within your project, dbt introduced a tool called [dbt-autofix](https://github.com/dbt-labs/dbt-autofix) that +can be used to automatically migrate your project to the new syntax. + +Before running this tool, **please upgrade** to the most recent dbt-core version - as some of the syntax changes are not supported on older versions. + +For full information on upgrading from dbt-core to dbt-fusion, please check [dbt-fusion's official upgrade guide](https://docs.getdbt.com/docs/dbt-versions/core-upgrade/upgrading-to-fusion). + +## Supported Elementary capabilities + +Most of the main capabilities of the Elementary dbt package are supported in Fusion, including: +1. Anomaly detection tests. +2. Schema tests. +3. Artifacts collection. + +However, the following capabilities are not supported right now for dbt Fusion: +1. Python tests. +2. JSON schema tests. +3. The missing dbt-fusion features listed below. 
+
+## Missing features in dbt-fusion
+
+In addition to the above, there are some features that are currently missing from dbt-fusion, and therefore are not yet available in the Elementary package.
+
+For each issue below, we've included a link to the dbt-fusion GitHub repository - **please upvote these features if they are important to you!**
+
+1. [Tests with error status are not being reported](https://github.com/dbt-labs/dbt-fusion/issues/686) - If a test fails "normally" (e.g. not_null fails on rows with null values), Elementary will report it as expected with a "fail" status.
+However, if there is a compilation error or any other error that occurs before the actual test query runs ("error" status in dbt-core), it will currently be missing.
+2. [Source freshness results are not reported](https://github.com/dbt-labs/dbt-fusion/issues/720)
+3. [Exposure artifacts are not reported](https://github.com/dbt-labs/dbt-fusion/issues/859)
+4. [Group artifacts are not reported](https://github.com/dbt-labs/dbt-fusion/issues/25)
+5. [Compiled code is missing from dbt artifact tables](https://github.com/dbt-labs/dbt-fusion/issues/723)
+6. [Failed row count is missing from dbt artifact tables](https://github.com/dbt-labs/dbt-fusion/issues/724)
\ No newline at end of file
diff --git a/docs/snippets/kapa-support.jsx b/docs/snippets/kapa-support.jsx
new file mode 100644
index 000000000..0d5a06ced
--- /dev/null
+++ b/docs/snippets/kapa-support.jsx
@@ -0,0 +1,345 @@
+import { useState, useEffect, useRef } from 'react';
+
+const STORAGE_KEY = 'elementary_kapa_email';
+const HUBSPOT_PORTAL_ID = '142608385';
+const HUBSPOT_FORM_ID = '4734860b-68fb-4f7f-aada-afb14e61afe7';
+const HUBSPOT_SUBMIT_URL = `https://api.hsforms.com/submissions/v3/integration/submit/${HUBSPOT_PORTAL_ID}/${HUBSPOT_FORM_ID}`;
+const PRIMARY_COLOR = '#FF20B8';
+const PRIMARY_HOVER = '#E01A9F';
+
+/** Consumer domains — HubSpot "block free emails" often does not apply to the Forms API; mirror policy in-app.
*/ +const BLOCKED_CONSUMER_EMAIL_DOMAINS = { + '163.com': true, + '126.com': true, + 'aol.com': true, + 'duck.com': true, + 'fastmail.com': true, + 'gmail.com': true, + 'googlemail.com': true, + 'gmx.com': true, + 'gmx.de': true, + 'gmx.net': true, + 'hey.com': true, + 'hotmail.com': true, + 'hotmail.co.uk': true, + 'icloud.com': true, + 'live.com': true, + 'mac.com': true, + 'mail.com': true, + 'me.com': true, + 'msn.com': true, + 'outlook.com': true, + 'pm.me': true, + 'proton.me': true, + 'protonmail.com': true, + 'qq.com': true, + 'skiff.com': true, + 'tuta.io': true, + 'tutanota.com': true, + 'tutanota.de': true, + 'yahoo.com': true, + 'yahoo.co.uk': true, + 'yahoo.de': true, + 'yahoo.fr': true, + 'yandex.com': true, + 'yandex.ru': true, +}; + +const FREE_EMAIL_NOT_ACCEPTED_MSG = 'Please use your work email.'; + +function emailDomain(email) { + const i = email.lastIndexOf('@'); + if (i < 0) return ''; + return email + .slice(i + 1) + .toLowerCase() + .trim(); +} + +function isBlockedConsumerEmailDomain(email) { + return !!BLOCKED_CONSUMER_EMAIL_DOMAINS[emailDomain(email)]; +} + +function getStoredEmail() { + if (typeof window === 'undefined') return null; + try { + return localStorage.getItem(STORAGE_KEY); + } catch { + return null; + } +} + +function storeEmail(email) { + try { + localStorage.setItem(STORAGE_KEY, email); + } catch (_) {} +} + +function openKapa(email) { + if (typeof window === 'undefined') return; + window.kapaSettings = { + user: { + email: email, + uniqueClientId: email, + }, + }; + if (window.Kapa && typeof window.Kapa.open === 'function') { + window.Kapa.open(); + } else if (typeof window.Kapa === 'function') { + window.Kapa('open'); + } +} + +function hubspotErrorMessage(body) { + const fallback = + 'We could not accept this email. 
Please try again, or use a work email if your company requires it.'; + if (!body || typeof body !== 'object') return fallback; + if (Array.isArray(body.errors) && body.errors.length > 0) { + const first = body.errors[0]; + if (first?.message) return first.message; + } + if (typeof body.message === 'string' && body.message) return body.message; + if (typeof body.inlineMessage === 'string' && body.inlineMessage) return body.inlineMessage; + return fallback; +} + +async function submitToHubSpot(email) { + if (typeof window === 'undefined') return { ok: false, message: 'Something went wrong. Please try again.' }; + try { + const res = await fetch(HUBSPOT_SUBMIT_URL, { + method: 'POST', + headers: { 'Content-Type': 'application/json' }, + body: JSON.stringify({ + fields: [{ name: 'email', value: email }], + context: { + pageUri: window.location.href, + pageName: 'Elementary Docs - Ask Elementary AI', + }, + }), + }); + const text = await res.text(); + let body = null; + try { + body = text ? JSON.parse(text) : null; + } catch { + body = null; + } + if (res.ok) { + if (body && Array.isArray(body.errors) && body.errors.length > 0) { + return { ok: false, message: hubspotErrorMessage(body) }; + } + return { ok: true }; + } + return { ok: false, message: hubspotErrorMessage(body) }; + } catch { + return { ok: false, message: 'Something went wrong. Check your connection and try again.' 
}; + } +} + +const KapaSupport = () => { + const [mounted, setMounted] = useState(false); + const [popoverOpen, setPopoverOpen] = useState(false); + const [email, setEmail] = useState(''); + const [error, setError] = useState(''); + const [submitting, setSubmitting] = useState(false); + const popoverRef = useRef(null); + const buttonRef = useRef(null); + + useEffect(() => { + setMounted(true); + }, []); + + useEffect(() => { + if (!mounted || !popoverOpen) return; + const handleClickOutside = (e) => { + if ( + popoverRef.current && + !popoverRef.current.contains(e.target) && + buttonRef.current && + !buttonRef.current.contains(e.target) + ) { + setPopoverOpen(false); + } + }; + document.addEventListener('mousedown', handleClickOutside); + return () => document.removeEventListener('mousedown', handleClickOutside); + }, [mounted, popoverOpen]); + + const handleButtonClick = () => { + const stored = getStoredEmail(); + if (stored && stored.trim()) { + const s = stored.trim(); + const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; + if (!emailRegex.test(s)) { + try { + localStorage.removeItem(STORAGE_KEY); + } catch { + /* ignore */ + } + setError(''); + setPopoverOpen(true); + return; + } + if (isBlockedConsumerEmailDomain(s)) { + try { + localStorage.removeItem(STORAGE_KEY); + } catch { + /* ignore */ + } + setError(''); + setPopoverOpen(true); + return; + } + openKapa(s); + return; + } + setPopoverOpen((open) => !open); + setError(''); + }; + + const handleSubmit = async (e) => { + e.preventDefault(); + const trimmed = (email || '').trim(); + if (!trimmed) { + setError('Please enter your email.'); + return; + } + const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/; + if (!emailRegex.test(trimmed)) { + setError('Please enter a valid email address.'); + return; + } + if (isBlockedConsumerEmailDomain(trimmed)) { + setError(FREE_EMAIL_NOT_ACCEPTED_MSG); + return; + } + setError(''); + setSubmitting(true); + try { + const result = await submitToHubSpot(trimmed); + if 
(!result.ok) { + setError(result.message || 'Please try a different email.'); + return; + } + storeEmail(trimmed); + openKapa(trimmed); + setPopoverOpen(false); + setEmail(''); + } finally { + setSubmitting(false); + } + }; + + if (!mounted) return null; + + const buttonStyle = { + position: 'fixed', + bottom: 24, + right: 24, + zIndex: 2147483646, + display: 'flex', + alignItems: 'center', + gap: 8, + padding: '12px 20px', + backgroundColor: PRIMARY_COLOR, + color: '#ffffff', + border: 'none', + borderRadius: 50, + fontSize: 14, + fontWeight: 600, + cursor: 'pointer', + boxShadow: '0 4px 14px rgba(255, 32, 184, 0.4)', + fontFamily: 'system-ui, -apple-system, sans-serif', + }; + + const popoverStyle = { + position: 'fixed', + bottom: 80, + right: 24, + zIndex: 2147483647, + width: '100%', + maxWidth: 320, + backgroundColor: '#ffffff', + borderRadius: 12, + padding: 20, + boxShadow: '0 10px 40px rgba(0,0,0,0.15)', + fontFamily: 'system-ui, -apple-system, sans-serif', + }; + + const messageStyle = { fontSize: 14, color: '#374151', margin: '0 0 16px', lineHeight: 1.5 }; + const inputStyle = { + width: '100%', + boxSizing: 'border-box', + padding: '12px 14px', + fontSize: 14, + color: '#111827', + borderRadius: 8, + border: '1px solid #e5e7eb', + marginBottom: 12, + outline: 'none', + }; + const submitBtnStyle = { + width: '100%', + padding: '12px 16px', + fontSize: 14, + fontWeight: 600, + color: '#ffffff', + backgroundColor: PRIMARY_COLOR, + border: 'none', + borderRadius: 8, + cursor: submitting ? 'not-allowed' : 'pointer', + opacity: submitting ? 0.8 : 1, + }; + + return ( + <> + + + {popoverOpen && ( +
+

+ Ask any question about Elementary. +
+
+ Leave your email in case a follow-up is needed:
+

+
+ setEmail(e.target.value)} + disabled={submitting} + style={inputStyle} + autoFocus + /> + +
+ {error &&

{error}

} +
+ )} + + ); +}; + +export default KapaSupport; +export { KapaSupport }; diff --git a/docs/snippets/oss/adapters-cards.mdx b/docs/snippets/oss/adapters-cards.mdx new file mode 100644 index 000000000..7a1da151c --- /dev/null +++ b/docs/snippets/oss/adapters-cards.mdx @@ -0,0 +1,530 @@ + + + + + + + + + + + } + > + + + + + + + + + + + + + } + > + + + + + + + + + + } + > + + + + } + > + + + + + + + + + + + + } + > + + + + + + + + + + + + + + + + + + + + + + + } + > + + } + > + + + + + + + + + + + + + + + + + + + + + + + } + > + + + + + + } + > + + + + + + + + + + + + + + + + + + + + + } + > + + + + + + } + > + + + + + + + + + + + + + + + + + + + + + + + } + > + + + + } + > + + + + + + + + + + + } + > + + diff --git a/docs/_snippets/profiles/all-profiles.mdx b/docs/snippets/profiles/all-profiles.mdx similarity index 54% rename from docs/_snippets/profiles/all-profiles.mdx rename to docs/snippets/profiles/all-profiles.mdx index 830723b0d..1cdfef342 100644 --- a/docs/_snippets/profiles/all-profiles.mdx +++ b/docs/snippets/profiles/all-profiles.mdx @@ -11,17 +11,18 @@ elementary: default: type: snowflake account: [account id] - - ## User/password auth ## user: [username] - password: [password] - role: [user role] + + ## Keypair auth (recommended) ## + private_key_path: [path/to/private.key] + # or private_key instead of private_key_path + private_key_passphrase: [passphrase for the private key, if key is encrypted] + database: [database name] warehouse: [warehouse name] schema: [schema name]_elementary threads: 4 - ``` ```yml BigQuery @@ -137,6 +138,24 @@ elementary: threads: [number of threads like 8] ``` +```yml Clickhouse +## Clickhouse ## +## By default, edr expects the profile name 'elementary'. ## +## Configure the database and schema of elementary models. ## +## Check where 'elementary_test_results' is to find it. 
## + +elementary: + outputs: + default: + type: clickhouse + schema: [schema name] # elementary schema, usually [schema name]_elementary + host: [hostname] + port: [port] + user: [username] + password: [password] + threads: [number of threads like 8] +``` + ```yml Trino ## Trino ## ## By default, edr expects the profile name 'elementary'. ## @@ -158,8 +177,8 @@ elementary: # session_properties: [optional, sets Trino session properties used in the connection] ``` -```yml ClickHouse -## CLICKHOUSE ## +```yml DuckDB +## DUCKDB ## ## By default, edr expects the profile name 'elementary'. ## ## Configure the database and schema of elementary models. ## ## Check where 'elementary_test_results' is to find it. ## @@ -167,14 +186,133 @@ elementary: elementary: outputs: default: - type: clickhouse + type: duckdb + path: [path to your .duckdb file] + schema: [schema name] # elementary schema, usually [schema name]_elementary + threads: [1 or more] +``` + +```yml Dremio +## DREMIO ## +## By default, edr expects the profile name 'elementary'. ## +## Configure the database and schema of elementary models. ## +## Check where 'elementary_test_results' is to find it. 
## + +## -- Dremio Cloud -- ## +elementary: + outputs: + default: + type: dremio + cloud_host: api.dremio.cloud # or api.eu.dremio.cloud for EU + cloud_project_id: [project ID] + user: [email address] + pat: [personal access token] + use_ssl: true + object_storage_source: [name] # alias: datalake + object_storage_path: [path] # alias: root_path + dremio_space: [name] # alias: database + dremio_space_folder: [path] # alias: schema, usually [schema]_elementary + threads: [1 or more] + +## -- Dremio Software -- ## +elementary: + outputs: + default: + type: dremio + software_host: [hostname or IP address] + port: 9047 + user: [username] + password: [password] # or use pat: [personal access token] + use_ssl: [true or false] + object_storage_source: [name] # alias: datalake + object_storage_path: [path] # alias: root_path + dremio_space: [name] # alias: database + dremio_space_folder: [path] # alias: schema, usually [schema]_elementary + threads: [1 or more] +``` + +```yml Spark +## SPARK ## +## By default, edr expects the profile name 'elementary'. ## +## Configure the database and schema of elementary models. ## +## Check where 'elementary_test_results' is to find it. ## + +elementary: + outputs: + default: + type: spark + method: [thrift, http, or odbc] host: [hostname] port: [port] user: [username] + schema: [schema name] # elementary schema, usually [schema name]_elementary + threads: [1 or more] + # token: [optional, used with http method] +``` + +```yml Fabric +## FABRIC ## +## By default, edr expects the profile name 'elementary'. ## +## Configure the database and schema of elementary models. ## +## Check where 'elementary_test_results' is to find it. 
## + +elementary: + outputs: + default: + type: fabric + driver: ODBC Driver 18 for SQL Server + server: [hostname] + port: 1433 + database: [database name] + schema: [schema name] # elementary schema, usually [schema name]_elementary + authentication: ActiveDirectoryServicePrincipal + tenant_id: [tenant_id] + client_id: [client_id] + client_secret: [client_secret] + threads: [1 or more] +``` + +```yml SQL Server +## SQL SERVER ## +## By default, edr expects the profile name 'elementary'. ## +## Configure the database and schema of elementary models. ## +## Check where 'elementary_test_results' is to find it. ## + +elementary: + outputs: + default: + type: sqlserver + driver: ODBC Driver 18 for SQL Server + server: [hostname] + port: 1433 + database: [database name] + schema: [schema name] # elementary schema, usually [schema name]_elementary + user: [username] password: [password] - secure: [True/False] - verify: [True/False] + threads: [1 or more] + # encrypt: true # default true in dbt-sqlserver >= 1.2.0 + # trust_cert: false +``` + +```yml Vertica +## VERTICA ## +## By default, edr expects the profile name 'elementary'. ## +## Configure the database and schema of elementary models. ## +## Check where 'elementary_test_results' is to find it. 
## + +elementary: + outputs: + default: + type: vertica + host: [hostname] + port: 5433 + database: [database name] schema: [schema name] # elementary schema, usually [schema name]_elementary + username: [username] + password: [password] + threads: [1 or more] + # connection_load_balance: true + # backup_server_node: [comma separated list of backup hostnames or IPs] ``` diff --git a/docs/_snippets/profiles/bigquery-profile.mdx b/docs/snippets/profiles/bigquery-profile.mdx similarity index 100% rename from docs/_snippets/profiles/bigquery-profile.mdx rename to docs/snippets/profiles/bigquery-profile.mdx diff --git a/docs/_snippets/profiles/databricks-profile.mdx b/docs/snippets/profiles/databricks-profile.mdx similarity index 100% rename from docs/_snippets/profiles/databricks-profile.mdx rename to docs/snippets/profiles/databricks-profile.mdx diff --git a/docs/_snippets/profiles/redshift-profile.mdx b/docs/snippets/profiles/redshift-profile.mdx similarity index 100% rename from docs/_snippets/profiles/redshift-profile.mdx rename to docs/snippets/profiles/redshift-profile.mdx diff --git a/docs/_snippets/profiles/snowflake-profile.mdx b/docs/snippets/profiles/snowflake-profile.mdx similarity index 100% rename from docs/_snippets/profiles/snowflake-profile.mdx rename to docs/snippets/profiles/snowflake-profile.mdx diff --git a/docs/_snippets/quickstart-package-install.mdx b/docs/snippets/quickstart-package-install.mdx similarity index 78% rename from docs/_snippets/quickstart-package-install.mdx rename to docs/snippets/quickstart-package-install.mdx index 9c8d8b5e1..b07da4949 100644 --- a/docs/_snippets/quickstart-package-install.mdx +++ b/docs/snippets/quickstart-package-install.mdx @@ -1,5 +1,9 @@ To start using Elementary, you need to add our dbt package to your dbt project. + + **Note:** elementary dbt package has to be installed in all of the connected projects in your environment. 
+
+
 A dbt package is additional Jinja and SQL code that is added to your project for additional functionality. In fact, each package is a dbt project in its own right. By adding a package to your project, you make its code part of your own: you can reference its macros, execute its models, and so on.
@@ -8,7 +12,7 @@ Some packages we recommend you check out: [dbt_utils](https://github.com/dbt-lab
-## Video: How to install Elementary dbt package?
+## Step-by-step: How to install the Elementary dbt package?
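As a concrete sketch of the install step, the package is added to your project's `packages.yml` (the version number below is illustrative - check the dbt package hub for the current release):

```yml
# packages.yml - version shown is an example, not necessarily the latest
packages:
  - package: elementary-data/elementary
    version: 0.16.1
```

After editing `packages.yml`, run `dbt deps` to download the package.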
@@ -29,7 +33,7 @@ Some packages we recommend you check out: [dbt_utils](https://github.com/dbt-lab
-## Step-by-step: Install Elementary dbt package
+## Install Elementary dbt package
@@ -75,19 +79,30 @@ Some packages we recommend you check out: [dbt_utils](https://github.com/dbt-lab
 **If you change materialization settings, make sure to run `dbt run -s elementary --full-refresh`.**
-
-
+
-   In order for these features to work, it is required to add the following flag to your `dbt_project.yml` file:
-   ```yml dbt_project.yml
-   flags:
-     require_explicit_package_overrides_for_builtin_materializations: false
+   In order for these features to work, create a file named `elementary_materialization.sql` in your macros folder, at the path
+   `macros/elementary_materialization.sql`
+
+   Copy and paste the following code into the file:
+
+   If you use Snowflake:
+   ```
+   {% materialization test, adapter='snowflake' %}
+   {{ return(elementary.materialization_test_snowflake()) }}
+   {% endmaterialization %}
+   ```
+
+   If you use any other data warehouse:
+   ```
+   {% materialization test, default %}
+   {{ return(elementary.materialization_test_default()) }}
+   {% endmaterialization %}
   ```
-   Please note that after setting this flag you may see a deprecation warning from dbt.
-   This is a temporary measure and we are working with the dbt team on a [longer term solution](https://github.com/dbt-labs/dbt-core/issues/10090).
-
-
+
+
+
 ```yml dbt_project.yml
 models:
 ## see docs: https://docs.elementary-data.com/
@@ -96,11 +111,8 @@ Some packages we recommend you check out: [dbt_utils](https://github.com/dbt-lab
 +schema: "elementary"
 ## To disable elementary for dev, uncomment this:
 # enabled: "{{ target.name in ['prod','analytics'] }}"
-
-  # Required from dbt 1.8 and above for certain Elementary features (please see more details above)
-  flags:
-    require_explicit_package_overrides_for_builtin_materializations: false
 ```
+
@@ -126,6 +138,8 @@ Some packages we recommend you check out: [dbt_utils](https://github.com/dbt-lab
 ## What happens now?
-Once the elementary dbt package has been installed and configured, your test results, run results and [dbt artifacts](/dbt/dbt-artifacts) will be loaded to elementary schema tables.
+Once the elementary dbt package has been installed and configured, your test results, run results and [dbt artifacts](data-tests/dbt/dbt-artifacts) will be loaded to elementary schema tables.
 
 If you see data in these models, you have completed the package deployment (Congrats! 🎉).
+
+Updating environment settings can take up to a few minutes.
diff --git a/docs/_snippets/quickstart/quickstart-cards.mdx b/docs/snippets/quickstart/quickstart-cards.mdx
similarity index 97%
rename from docs/_snippets/quickstart/quickstart-cards.mdx
rename to docs/snippets/quickstart/quickstart-cards.mdx
index 871de0acd..bdcd6dace 100644
--- a/docs/_snippets/quickstart/quickstart-cards.mdx
+++ b/docs/snippets/quickstart/quickstart-cards.mdx
@@ -12,7 +12,6 @@
Integrations:
@@ -20,13 +19,14 @@ title="Elementary OSS Package" icon="square-terminal" iconType="solid" + href="/oss/oss-introduction" + >
For data and analytics engineers who require basic observability capabilities, or for evaluating features without vendor approval. Our community can provide great support if needed.




Integrations:
diff --git a/docs/snippets/quickstart/quickstart-elementary-prod.mdx b/docs/snippets/quickstart/quickstart-elementary-prod.mdx
new file mode 100644
index 000000000..682357f85
--- /dev/null
+++ b/docs/snippets/quickstart/quickstart-elementary-prod.mdx
@@ -0,0 +1,82 @@
+import QuestionSchema from '/snippets/faq/question-schema.mdx';
+
+
+
+You can choose any system that allows you to orchestrate a CLI execution, as long as it meets the following requirements:
+
+- A full installation of the Elementary Python package and its dependencies.
+- Network access to the data warehouse.
+- Access to a `profiles.yml` file with an `elementary` profile.
+- Read and write permissions to the Elementary schema.
+
+## Elementary `--env` flag
+
+When you run Elementary in production, use the `--env prod` flag (by default, it is set to `dev`).
+This will show the environment as `prod` in the report and alerts.
+
+## Elementary production deployment and jobs
+
+Your deployment of Elementary has two parts:
+
+**Part 1 - Elementary package in your production dbt project**
+
+In your dbt jobs, after you deploy the Elementary dbt package:
+
+1. **Run and test results** - Collected by default as part of your runs.
+2. **Elementary tests** - Make sure your dbt test runs include elementary tests.
+3. **Update dbt artifacts** - Elementary uses the dbt artifacts models to enrich the report, alerts and lineage. To update the artifacts data, run the elementary dbt artifacts models.
+
+**Part 2 - Elementary CLI**
+
+On an orchestration system of your choice, run the CLI to:
+
+1. **Send Slack alerts** - Use the `edr monitor --env prod` command and the Slack integration.
+2. **Generate a report** - Use `edr report --env prod` or `edr send-report --env prod`, which has built-in support for sending the report to Slack / GCS / S3.
+
+## What permissions does Elementary require?
+
+The CLI needs to have permissions to access the `profiles.yml` file with the relevant profile,
+and network access to the data warehouse.
+
+Also, in the `elementary` profile, the credentials should have permissions to:
+
+- Read all project schemas
+- Write to the elementary schema
+
+On your dbt project, make sure that the Elementary dbt package can:
+
+- Read all project schemas
+- Write to the elementary schema
+- Create a schema (alternatively, you can create the elementary schema in advance)
+
+
+
+## When to run Elementary?
+
+For sending alerts or generating a report, there are two options:
+
+1. **Run Elementary CLI after each relevant dbt job** (`dbt test` / `dbt build` / `dbt run`).
+2. **Run Elementary CLI periodically** at a frequency that fits your dbt job executions.
+
+## Ways to run Elementary in production
+
+If your organization uses dbt-core, it usually makes sense to orchestrate Elementary with the same system that orchestrates dbt.
+
+Any automation server or orchestration tool can run a CLI tool like Elementary.
+
+Some options include:
+
+1. [Elementary Cloud](/cloud/introduction)
+2. GitHub Actions - Check out the [Elementary GitHub Action](https://github.com/elementary-data/run-elementary-action) we created.
+3. [Docker](/oss/deployment-and-configuration/docker)
+4. [GitLab CI](/oss/deployment-and-configuration/gitlab-ci)
+5. Airflow
+6. Prefect
+7. Meltano - Check out the [custom Elementary integration](https://hub.meltano.com/utilities/elementary) developed by [Stéphane Burwash](https://github.com/SBurwash) - [Potloc](https://www.potloc.com/).
+8. Using cron
+
+## Need help?
+
+If you want to consult on production deployment, we would love to help!
+Please reach out to us on [Slack](https://elementary-data.com/community); we can talk there or schedule a deployment video call.
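The two-part deployment described above can be orchestrated as a single scheduled job. The following GitHub Actions workflow is a minimal sketch, not an official recipe - the schedule, Snowflake adapter extra, and profiles location are all assumptions you should adapt:

```yml
# Illustrative workflow: run the dbt job, then the Elementary CLI (all names are assumptions)
name: elementary-prod
on:
  schedule:
    - cron: "0 6 * * *" # align with your dbt job cadence
jobs:
  elementary:
    runs-on: ubuntu-latest
    env:
      DBT_PROFILES_DIR: ${{ github.workspace }} # profiles.yml with an 'elementary' profile
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Part 1: run dbt with the Elementary package installed
      - run: pip install dbt-snowflake "elementary-data[snowflake]" # pick the extra for your warehouse
      - run: dbt build # runs models and tests, loading results to the elementary schema
      # Part 2: run the Elementary CLI
      - run: edr monitor --env prod # send alerts
      - run: edr send-report --env prod # build and share the report
```

The same sequence (install, `dbt build`, `edr monitor`, `edr send-report`) applies equally on Airflow, GitLab CI, or cron.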
\ No newline at end of file
diff --git a/docs/_snippets/setup-slack-integration.mdx b/docs/snippets/setup-slack-integration.mdx
similarity index 100%
rename from docs/_snippets/setup-slack-integration.mdx
rename to docs/snippets/setup-slack-integration.mdx
diff --git a/docs/_snippets/setup-teams-integration.mdx b/docs/snippets/setup-teams-integration.mdx
similarity index 95%
rename from docs/_snippets/setup-teams-integration.mdx
rename to docs/snippets/setup-teams-integration.mdx
index 477a82ff8..dd9aa0a70 100644
--- a/docs/_snippets/setup-teams-integration.mdx
+++ b/docs/snippets/setup-teams-integration.mdx
@@ -36,11 +36,11 @@ Call it `Elementary` (or whatever you prefer) and connect it to the workspace of
 Now it's time to set up a webhook. You have two options for creating a webhook:
-
+
 ## Create a webhook using Connectors
-**Note:** Microsoft 365 Connectors are being deprecated. Consider using Power Automate Workflows (Option 2) for new integrations.
+**Note:** Microsoft 365 Connectors are set to be deprecated at the end of 2025. Consider using Power Automate Workflows (Option 2) for new integrations.
 Go to a channel in your Team and choose `Manage channel`
 Search for `Incoming webhook` and choose `Add`.
 alt="Teams add incoming webhook" />
+
 Choose `Add` again and name your webhook `ElementaryWebhook` (or whatever you prefer). Then `Create` the webhook.
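Once the webhook is created, you can sanity-check it before wiring it into Elementary by POSTing a minimal text payload. The URL below is a placeholder for the one Teams generated for you, and the `curl` call is left commented out so the sketch runs without network access:

```shell
# Placeholder - replace with the webhook URL Teams generated for your channel.
WEBHOOK_URL="https://example.webhook.office.com/webhookb2/your-webhook-id"

# Minimal payload accepted by Teams incoming webhooks.
PAYLOAD='{"text": "Elementary webhook test"}'

# Uncomment once WEBHOOK_URL is real:
# curl -X POST -H "Content-Type: application/json" -d "$PAYLOAD" "$WEBHOOK_URL"

echo "$PAYLOAD"
```

If the test message appears in the channel, the webhook is ready to use in your Elementary configuration.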
- + ## Create a webhook using Power Automate diff --git a/docs/_snippets/share-report.mdx b/docs/snippets/share-report.mdx similarity index 100% rename from docs/_snippets/share-report.mdx rename to docs/snippets/share-report.mdx diff --git a/docs/_snippets/support-contact.mdx b/docs/snippets/support-contact.mdx similarity index 100% rename from docs/_snippets/support-contact.mdx rename to docs/snippets/support-contact.mdx diff --git a/docs/styles.css b/docs/styles.css new file mode 100644 index 000000000..c3bdef397 --- /dev/null +++ b/docs/styles.css @@ -0,0 +1,1335 @@ +@import url('https://fonts.googleapis.com/css2?family=Poppins:ital,wght@0,400;0,500;0,600;0,700;1,400&display=swap'); + +/* ==================================== */ +/* DARK MODE LINKS & ORANGE OVERRIDES +/* ==================================== */ +.dark a[style*="text-decoration"], +.dark a[style*="underline"] { + text-decoration-color: #FF20B8 !important; +} + +.dark a[href] { + text-decoration-color: #FF20B8; +} + +/* Override any orange colors in dark mode */ +.dark * { + --mint-light: #FF20B8 !important; +} + +.dark [style*="#FE9356"], +.dark [style*="#fe9356"], +.dark [style*="rgb(254, 147, 86)"], +.dark [style*="rgb(254,147,86)"] { + color: #FF20B8 !important; + border-color: #FF20B8 !important; + background-color: rgba(255, 32, 184, 0.1) !important; +} + +.dark [style*="text-decoration-color"][style*="#FE9356"], +.dark [style*="text-decoration-color"][style*="#fe9356"] { + text-decoration-color: #FF20B8 !important; +} + +/* ==================================== */ +/* LAYOUT +/* ==================================== */ + +html { + scroll-padding-top: 150px; +} + +body{ + font-family: 'Poppins' !important; +} + +h1, h2, h3, h4, h5, h6 { + font-family: 'Poppins' !important; +} + +.home__container { + max-width: 1366px; + padding-inline: 4rem; + padding-block: 72px; + margin-inline: auto; + width: 100%; +} + +@media (max-width: 1024px) { + .home__container { + padding-inline: 3rem; + 
padding-block: 56px; + } +} + +@media (max-width: 768px) { + .home__container { + padding-inline: 2rem; + padding-block: 48px; + } +} + +@media (max-width: 480px) { + .home__container { + padding-inline: 1rem; + padding-block: 32px; + } +} + +.home__theme-gray { + background-color: #f5f2f3; + color: #321200; +} + +.dark .home__theme-gray { + background-color: #2e2e2e; + color: #e0e0e0; +} + +.home__theme-purple { + background-color: #000000; + color: white; +} + +.dark .home__theme-purple { + background-color: #EBE5E6; + color: #000000; +} + +.home__theme-purple-dark { + background-color: #091114; + color: #f0eff3; +} + +.home__theme-pink { + background-color: #FF20B8; + color: white; +} + +.home__heading-lg { + font-size: 36px; + line-height: 125%; + letter-spacing: -2%; + font-weight: 600; +} + +.home__heading-xl { + font-size: 48px; + line-height: 125%; + letter-spacing: -2%; + font-weight: 600; +} + +.home__paragraph-lg { + font-size: 18px; + line-height: 140%; +} + +.home__paragraph-xl { + font-size: 20px; + line-height: 140%; +} + +.home__2col { + display: grid; + grid-template-columns: 1fr 1fr; + gap: 40px 32px; + width: 100%; + align-items: start; +} + +.home__2col.is-uneven { + grid-template-columns: 1fr 2fr; +} + +.home__3col { + display: grid; + grid-template-columns: 1fr 1fr 1fr; + gap: 40px 32px; + width: 100%; + align-items: start; +} + +@media (max-width: 900px) { + .home__2col { + grid-template-columns: 1fr; + gap: 32px 0; + } + .home__2col.is-uneven { + grid-template-columns: 1fr; + } + .home__3col { + display: grid; + grid-template-columns: 1fr 1fr; + } +} + +@media (max-width: 520px) { + .home__3col { + display: grid; + grid-template-columns: 1fr; + } +} + +.padding-bottom-none { + padding-bottom: 0 !important; +} + +.padding-top-none { + padding-top: 0 !important; +} + + +/* ==================================== */ +/* BUTTONS +/* ==================================== */ + +.home__button { + display: inline-flex; + flex-direction: row; + 
align-items: center; + gap: 4px; + font-weight: 500; + font-size: 16px; + color: #fff; + background: #FF20B8; + border: none; + outline: none; + cursor: pointer; + text-decoration: none; + border-radius: 999px; + padding: 10px 20px; + transition: background 0.2s, color 0.2s, border-color 0.2s; +} + +.home__button--white { + color: #460064; + background: #EBE5E6; + border: 1px solid #FF20B8; +} + +.home__button---tertiary { + display: inline-flex; + flex-direction: row; + align-items: center; + gap: 4px; + font-weight: 500; + font-size: 16px; + color: #7e7e7e; + background: none; + border: none; + outline: none; + cursor: pointer; + text-decoration: none; + border-radius: 5px; + padding: 0; + transition: background 0.2s; + white-space: nowrap; +} + +.home__button---tertiary svg { + width: 24px; + height: 24px; + display: block; + background-color: #7e7e7e; +} + +.home__button---tertiary:hover, +.home__button---tertiary:focus { + background: #f3f3f3; +} + +.dark .home__button--white { + background: #1e1e1e; + color: #FF20B8; + border-color: #FF20B8; +} + +.dark .home__button---tertiary { + color: #b0b0b0; +} + +.dark .home__button---tertiary svg { + background-color: #b0b0b0; +} + +.dark .home__button---tertiary:hover, +.dark .home__button---tertiary:focus { + background: #333; +} + +/* ==================================== */ +/* FEATURE CARDS +/* ==================================== */ + +.home__feature-card { + position: relative; + border-radius: 12px; + width: 100%; + height: 100%; + background: #faf7f8; + padding: 20px; + max-width: 480px; + transition: box-shadow 0.2s, transform 0.2s, padding 0.2s; +} + +.home__feature-card:hover { + box-shadow: 0 8px 8px rgba(30, 41, 59, 0.15); + transform: translateY(-4px); +} + +.home__feature-card__container { + display: flex; + flex-direction: column; + gap: 16px; + align-items: flex-start; + justify-content: flex-start; + min-width: 0; + width: 100%; + height: 100%; + padding: 0; +} + +.home__feature-card__header { + 
width: 100%; + position: relative; + display: flex; + flex-direction: column; + gap: 6px; + align-items: flex-start; +} + +.home__feature-card__header-content { + width: 100%; + display: flex; + flex-direction: row; + gap: 10px; + align-items: flex-start; +} + +.home__feature-card__icon { + width: 24px; + height: 24px; + display: flex; + align-items: center; + justify-content: flex-start; + position: relative; +} + +.home__feature-card__svg { + width: 100%; + height: 100%; + display: block; +} + +.home__feature-card__heading { + font-weight: 500; + font-size: 18px; + color: #000; + text-align: left; + letter-spacing: -0.36px; + justify-content: center; + min-height: 0; + min-width: 0; + line-height: 1.19; + text-decoration: none; + margin: 0; + display: flex; + flex-direction: column; +} + +.home__feature-card__description-block { + width: 100%; + position: relative; + display: flex; + flex-direction: row; + align-items: center; + justify-content: center; +} + +.home__feature-card__description { + font-size: 14px; + color: #616161; + text-align: left; + line-height: 1.4; + min-height: 0; + min-width: 0; + flex-grow: 1; + padding-left: 33px; + margin: 0; + white-space: pre-wrap; +} + +.home__feature-card--coming-soon { + cursor: default; +} + +.home__feature-card--coming-soon:hover { + transform: none; + box-shadow: 0 8px 24px rgba(30, 41, 59, 0.12), 0 1.5px 6px rgba(30, 41, 59, 0.08); +} + +.home__feature-card__coming-soon-overlay { + position: absolute; + top: 0; + left: 0; + right: 0; + bottom: 0; + background: rgba(0, 0, 0, 0.6); + backdrop-filter: blur(4px); + -webkit-backdrop-filter: blur(4px); + display: flex; + align-items: center; + justify-content: center; + opacity: 0; + visibility: hidden; + transition: opacity 0.3s ease, visibility 0.3s ease; + border-radius: 12px; + z-index: 10; +} + +.home__feature-card--coming-soon:hover .home__feature-card__coming-soon-overlay { + opacity: 1; + visibility: visible; +} + +.home__feature-card__coming-soon-message { + 
background: #000; + color: #fff; + padding: 12px 20px; + border-radius: 8px; + font-size: 14px; + font-weight: 600; + letter-spacing: 0.5px; + box-shadow: 0 4px 12px rgba(0, 0, 0, 0.3); + transform: translateY(-2px); + transition: transform 0.2s ease; +} + +.home__feature-card__cloud-feature-tag { + position: absolute; + top: 5px; + right: 5px; + display: flex; + align-items: center; + gap: 0; + padding: 4px 8px; + border-radius: 8px; + background: rgba(255, 32, 184, 0.1); + color: #FF20B8; + font-size: 12px; + font-weight: 700; + letter-spacing: 0.5px; + cursor: pointer; + transition: width 0.3s, background 0.3s, color 0.3s, border-color 0.3s; + overflow: hidden; + z-index: 2; + min-width: 32px; + width: 35px; + height: 20px; + border: 1px solid rgba(255, 32, 184, 0.3); + box-sizing: border-box; + backdrop-filter: blur(4px); + -webkit-backdrop-filter: blur(4px); +} + +.home__feature-card__cloud-feature-tag .cloud-feature-text { + opacity: 0; + max-width: 0; + white-space: nowrap; + transition: opacity 0.3s, max-width 0.3s; + margin-left: 0.5em; +} + +.home__feature-card__cloud-feature-tag:hover, +.home__feature-card__cloud-feature-tag:focus { + background: rgba(255, 32, 184, 0.1); + color: #FF20B8; + border-color: rgba(255, 32, 184, 0.3); + width: 130px; +} + +.home__feature-card__cloud-feature-tag:hover .cloud-feature-text, +.home__feature-card__cloud-feature-tag:focus .cloud-feature-text { + opacity: 1; + max-width: 130px; +} + +.home__feature-card__cloud-feature-tag .cloud-feature-icon { + display: flex; + align-items: center; + justify-content: center; + width: 18px; + height: 18px; +} + +.dark .home__feature-card { + background: #1e1e1e; + box-shadow: 0 2px 8px rgba(0, 0, 0, 0.3); +} + +.dark .home__feature-card:hover { + box-shadow: 0 8px 24px rgba(0, 0, 0, 0.4), 0 1.5px 6px rgba(0, 0, 0, 0.3); +} + +.dark .home__feature-card__heading { + color: #fff; +} + +.dark .home__feature-card__description { + color: #b0b0b0; +} + +.dark .home__feature-card__icon svg { 
+ background-color: #b0b0b0; +} + +.dark .home__feature-card--coming-soon .home__feature-card__coming-soon-overlay { + background: rgba(0, 0, 0, 0.8); +} + +.dark .home__feature-card__cloud-feature-tag { + background: #2d1b24; + color: #FF20B8; + border-color: #4a2635; +} + +.dark .home__feature-card__cloud-feature-tag:hover, +.dark .home__feature-card__cloud-feature-tag:focus { + background: #2d1b24; + color: #FF20B8; + border-color: #4a2635; +} + +/* ==================================== */ +/* FEATURES GRID +/* ==================================== */ + +.home__features-grid { + display: grid; + grid-template-columns: repeat(3, 1fr); + gap: 32px 24px; + width: 100%; + margin: 0 auto 40px auto; + justify-items: stretch; + align-items: stretch; +} + +@media (max-width: 1024px) { + .home__features-grid { + grid-template-columns: repeat(2, 1fr); + } +} + +@media (max-width: 700px) { + .home__features-grid { + grid-template-columns: 1fr; + } +} + +/* ==================================== */ +/* FEATURE HEADER +/* ==================================== */ + +.home__feature-header__container { + position: relative; + width: 100%; + display: flex; + flex-direction: row; + align-items: center; + justify-content: space-between; + border-bottom: 1px solid #dcdcdc; + padding-bottom: 12px; + margin-bottom: 32px; + gap: 24px; +} + +.home__feature-header__heading-block { + display: flex; + flex-direction: column; + justify-content: center; +} + +.home__feature-header__heading { + font-weight: 500; + font-size: 22.6px; + color: #19071f; + margin: 0; + line-height: normal; + letter-spacing: -0.2px; + white-space: pre-line; +} + +.home__feature-header__heading span { + color: #FF20B8; +} + +.home__feature-header__button { + display: flex; + flex-direction: row; + align-items: center; + gap: 4px; + background: none; + border: none; + outline: none; + cursor: pointer; + text-decoration: none; + border-radius: 5px; + padding: 0; + transition: background 0.2s; +} + 
+.home__feature-header__button-label { + font-weight: 500; + font-size: 16px; + color: #7e7e7e; + white-space: nowrap; +} + +.home__feature-header__button-icon { + display: flex; + align-items: center; + justify-content: center; + width: 24px; + height: 24px; +} + +.home__feature-header__button-icon svg { + background-color: #7e7e7e; +} + +@media (max-width: 748px) { + .home__feature-header__button { + display: none; + } +} + +.dark .home__feature-header__container { + border-bottom-color: #444; +} + +.dark .home__feature-header__heading { + color: #fff; +} + +.dark .home__feature-header__button-label { + color: #b0b0b0; +} + +.dark .home__feature-header__button-icon svg { + background-color: #b0b0b0; +} + +/* ==================================== */ +/* HERO SECTION +/* ==================================== */ + +.home__hero-section { + width: 100%; + background: #f5f2f3; + color: #1f2129; +} + +.dark .home__hero-section { + background: linear-gradient(to right, #1c1f26, #321C35, #1c1f26); + color: white; +} + +.home__hero-layout { + display: flex; + justify-content: center; + align-items: center; +} + +.home__hero-content { + display: flex; + flex-direction: row; + align-items: center; + justify-content: center; + width: 100%; + gap: 64px; +} + +.home__hero-text-block { + display: flex; + flex-direction: column; + align-items: start; + text-align: left; + gap: 24px; + max-width: 512px; +} + +.home__hero-title { + font-weight: 700; + font-size: 36px; + line-height: 1.2; + margin: 0; +} + +.home__hero-subtitle { + font-size: 18px; + margin: 0; + line-height: 1.5; +} + +.home__hero-link { + font-size: 18px; + text-decoration: underline; + text-underline-position: from-font; + text-decoration-thickness: 2px; + text-decoration-color: #FF20B8; + margin-bottom: 8px; + display: inline-block; +} + +.home__hero-buttons { + display: flex; + flex-direction: row; + gap: 24px; + margin-top: 8px; + justify-content: center; + width: 100%; +} + +.home__hero-btn { + font-size: 14px; 
+ font-weight: 700; + border-radius: 999px; + padding: 14px 32px; + text-decoration: none; + display: inline-block; + transition: background 0.2s, color 0.2s, border 0.2s; + cursor: pointer; + line-height: 1; + display: flex; + align-items: center; + justify-content: center; +} + +.home__hero-btn--primary { + background: #FF20B8; + color: #fff; + border: none; +} + +.home__hero-btn--primary:hover { + background: #E01DA5; +} + +.home__hero-btn--secondary { + background: transparent; + color: #fff; + background-color: #3c4051; +} + +.home__hero-btn--secondary:hover { + background: #2d303d; + color: #fff; +} + +.dark .home__hero-btn--secondary { + background: transparent; + color: #fff; + border: 2px solid #fff; +} + +.dark .home__hero-btn--secondary:hover { + background: #EBE5E6; + color: #000000; +} + +.home__hero-image-block { + flex: 1 1 0; + display: flex; + align-items: center; + justify-content: center; + min-width: 0; +} + +.home__hero-image { + max-width: 100%; + height: auto; + max-height: 423px; + border-radius: 12px; +} + +@media (max-width: 1024px) { + .home__hero-content { + flex-direction: column; + gap: 32px; + align-items: center; + } + + .home__hero-image-block { + width: 100%; + justify-content: center; + } + + .home__hero-text-block { + justify-content: center; + align-items: center; + text-align: center; + } +} + +@media (max-width: 700px) { + .home__hero-section { + padding: 32px 0 16px 0; + } + + .home__hero-content { + padding: 0 1rem; + align-items: center; + justify-content: center; + } + + .home__hero-title { + font-size: 32px; + text-align: center; + } + + .home__hero-subtitle { + font-size: 15px; + text-align: center; + } + + .home__hero-btn { + padding: 8px 20px; + font-size: 13px; + } + + .home__hero-image-block { + display: none; + } + + .home__hero-text-block { + align-items: center; + text-align: center; + } +} + +/* ==================================== */ +/* INFO CARDS +/* ==================================== */ +.home__info-card { 
+ position: relative; + border-radius: 12px; + width: 100%; + height: 100%; + padding: 24px; + display: block; + text-decoration: none; + transition: box-shadow 0.2s, transform 0.2s; + background-color: white; +} + +.home__info-card:hover { + box-shadow: 0 8px 8px rgb(34, 31, 32, .1); + transform: translateY(-2px); +} + +.home__info-card__container { + display: flex; + flex-direction: column; + gap: 12px; + width: 100%; +} + +.home__info-card__header { + display: flex; + flex-direction: row; + align-items: center; + justify-content: space-between; + width: 100%; + margin-bottom: 4px; +} + +.home__info-card__label { + font-size: 14px; + font-weight: 600; + border-radius: 6px; + padding: 2px 12px; + display: inline-block; + letter-spacing: 0.5px; + white-space: nowrap; +} + +.home__info-card__label--pink { + background: #FF20B8; + color: #fff; +} + +.home__info-card__label--gray { + background: #f3f3f3; + color: #808080; +} + +.home__info-card__arrow { + display: flex; + align-items: center; + margin-left: 8px; +} + +.home__info-card__title { + font-size: 16px; + font-weight: 600; + color: #000; + margin-bottom: 2px; +} + +.home__info-card__description { + font-size: 14px; + color: #616161; + line-height: 1.4; +} + +.home__info-card__container--horizontal { + gap: 0px; +} + +.dark .home__info-card { + background: #1e1e1e; + border: none; +} + +.dark .home__info-card:hover { + box-shadow: 0 8px 8px rgba(0, 0, 0, 0.15); +} + +.dark .home__info-card__title { + color: #fff; +} + +.dark .home__info-card__description { + color: #b0b0b0; +} + +.dark .home__info-card__label--gray { + background: #333; + color: #b0b0b0; +} + +/* ============================== */ +/* INTEGRATIONS +/* ============================== */ + + + +.home__integration { + position: relative; + width: 60px; + height: 60px; + background: #EBE5E6; + border: 1px solid #E8E8E8; + border-radius: 10px; + display: flex; + align-items: center; + justify-content: center; + cursor: pointer; + transition: 
transform 0.2s ease, z-index 0s linear 0.2s; + z-index: 0; +} + +.home__integration:hover { + transform: translateY(-2px); + z-index: 100; + transition-delay: 0s; +} + +.home__integration__coming-soon { + position: absolute; + top: 0; + left: 0; + right: 0; + background-color: transparent; + color: rgb(119, 119, 119); + padding: 0px 4px; + border-radius: 4px; + font-size: 8px; + text-align: center; + font-weight: 500; + opacity: 1; + transition: opacity 0.2s ease-in-out; + pointer-events: none; + z-index: 10; + white-space: nowrap; +} + +.home__integration__image { + width: 32px; + height: 32px; + object-fit: contain; +} + +.home__integration__label { + position: absolute; + bottom: -35px; + left: 50%; + transform: translateX(-50%); + text-align: center; + background: #000; + color: #fff; + padding: 6px 10px; + border-radius: 6px; + font-size: 12px; + font-weight: 500; + white-space: nowrap; + opacity: 0; + visibility: hidden; + transition: opacity 0.2s ease, visibility 0.2s ease; + z-index: 30; + /* ensure tooltip is above all integrations */ +} + +.home__integration:hover .home__integration__label { + opacity: 1; + visibility: visible; +} + +/* Add a small arrow pointing up to the integration box */ +.home__integration__label::before { + content: ''; + position: absolute; + top: -4px; + left: 50%; + transform: translateX(-50%); + width: 0; + height: 0; + border-left: 4px solid transparent; + border-right: 4px solid transparent; + border-bottom: 4px solid #000; +} + +.home__integrations-grid { + display: grid; + grid-template-columns: repeat(9, 1fr); + gap: 16px; + justify-items: center; + align-items: center; + width: 100%; + max-width: 1200px; + margin: 0 auto; + position: relative; + z-index: 1; +} + +@media (max-width: 1024px) { + .home__integrations-grid { + display: flex; + flex-wrap: wrap; + } +} + + + +.home__integrations-grid .link { + z-index: auto!important; +} + +.home__integrations-heading { + font-weight: 500; + font-size: 16px; + color: #7e7e7e; + 
text-align: left; + white-space: nowrap; + margin: 0 0 8px 0; + padding-bottom: 12px; + border-bottom: 1px solid #dedede; + letter-spacing: 0; + line-height: normal; + margin-bottom: 24px; + margin-top: 32px; +} + +.dark .home__integration { + background: #1e1e1e; + border-color: #333; +} + +.dark .home__integration__label { + background: #333; + color: #fff; +} + +.dark .home__integration__label::before { + border-bottom-color: #333; +} + +/* Dark mode logo inversion */ +.dark .home__integration__image[data-invert-on-dark="true"] { + filter: invert(1); +} + +.dark .home__integrations-heading { + color: #b0b0b0; + border-bottom-color: #444; +} + +/* ================================= */ +/* HOME CTA +/* ================================= */ + +.home__cta { + display: flex; + flex-direction: column; + align-items: center; + justify-content: center; + gap: 24px; + text-align: center; + max-width: 600px; + margin: 0 auto; +} + +.home__guides-header { + margin-bottom: 24px; + display: flex; + flex-direction: column; + gap: 8px; + max-width: 600px; +} + +.home__2col-content { + display: flex; + flex-direction: column; + gap: 24px; + max-width: 500px; +} +.home__2col-content.is-centered { + align-self: center; +} + +.home__guides-layout{ + display: flex; + flex-direction: column; + gap: 4rem; +} + + +/* ==================================== */ +/* JUMP NAVIGATION +/* ==================================== */ +.home__jump-navigation { + width: 100%; + display: flex; + justify-content: center; + align-items: center; +} + +.home__jump-navigation-content { + display: flex; + flex-direction: column; + align-items: center; + justify-content: center; + max-width: 1472px; + width: 100%; + gap: 32px; + padding: 0 2rem; +} + +.home__jump-navigation-heading { + font-weight: 600; + font-size: 48px; + color: #1f2129; + text-align: center; + margin: 0; + line-height: 1.25; + letter-spacing: -0.02em; +} + +.home__jump-navigation-links { + display: flex; + flex-direction: row; + align-items: 
center; + justify-content: center; + gap: 24px; + flex-wrap: wrap; +} + +.home__jump-navigation-label { + font-weight: 500; + font-size: 16px; + color: #7e7e7e; + white-space: nowrap; +} + +.home__jump-navigation-link { + font-weight: 500; + font-size: 16px; + color: #1f2129; + text-decoration: none; + padding: 8px 16px; + border-radius: 8px; + transition: all 0.2s ease; + white-space: nowrap; + border: 1px solid transparent; +} + +.home__jump-navigation-link:hover { + background: #1f2129; + color: #fff; + transform: translateY(-1px); + box-shadow: 0 4px 12px rgba(70, 0, 100, 0.2); +} + +/* Responsive Design */ +@media (max-width: 1024px) { + .home__jump-navigation { + padding: 40px 0; + } + + .home__jump-navigation-content { + gap: 24px; + } + + .home__jump-navigation-heading { + font-size: 32px; + } + + .home__jump-navigation-links { + gap: 10px; + } + .home__jump-navigation-link{ + font-size: 14px; + } +} + +/* Dark Mode Support for Jump Navigation */ + +.dark .home__jump-navigation-heading { + color: #fff; +} + +.dark .home__jump-navigation-label { + color: #b0b0b0; +} + +.dark .home__jump-navigation-link { + color: #FF20B8; + border-color: transparent; +} + +.dark .home__jump-navigation-link:hover { + background: #FF20B8; + color: #000; + box-shadow: 0 4px 12px rgba(255, 32, 184, 0.3); +} + +/* ================================= */ +/* MCP */ +/* ================================= */ +.home__mcp-image { + border: 1px ; + border-radius: 0.75rem; + overflow: hidden; +} + +/* ================================= */ +/* HERO SECURITY & PRIVACY */ +/* ================================= */ + +.home__security { + background: #f2eeef; + display: flex; + padding: 1rem 0rem; + justify-content: center; + align-items: center; + gap: 2rem; +} + +.home__security-info-wrap{ + display: flex; + align-items: center; + gap: 1rem; +} + +.home__security-icon{ + display: flex; + width: 3rem; + height: 3rem; + padding: .5rem; + justify-content: center; + align-items: center; + 
border-radius: .625rem; + background: #FFF; +} + +.home__security-content-wrap{ + display: flex; + flex-direction: column; + align-items: flex-start; +} + +.home__security-heading{ + color: #1F2129; + font-size: 1rem; + font-style: normal; + font-weight: 600; + line-height: normal; +} + +.home__security-description{ + color: #575A74; + font-size: .7602rem; + font-style: normal; + font-weight: 500; + line-height: normal; +} + +.home__security-separator{ + width: .0625rem; + height: 2.5rem; + background: #EBE5E6; +} + +.home__security-link{ + display: flex; + align-items: center; + gap: .5rem; + color: #FF24B6; + font-size: .7602rem; + font-weight: 500; +} + +.dark .home__security { + background: #2e2e2e; +} + +.dark .home__security-icon { + background: #1e1e1e; +} + +.dark .home__security-heading { + color: #fff; +} + +.dark .home__security-description { + color: #b0b0b0; +} + +.dark .home__security-separator { + background: #444; +} + +.dark .home__security-link { + color: #FF20B8; +} + +/* Responsive Design for Security Section */ +@media (max-width: 620px) { + .home__security { + flex-direction: column; + gap: 1.5rem; + padding: 1.5rem 1rem; + } + + .home__security-info-wrap { + width: 100%; + justify-content: center; + } + + .home__security-separator { + width: 100%; + height: 0.0625rem; + } +} + diff --git a/docs/tutorial/adding-elementary-tests.mdx b/docs/tutorial/adding-elementary-tests.mdx index 9c55d9e74..98169074c 100644 --- a/docs/tutorial/adding-elementary-tests.mdx +++ b/docs/tutorial/adding-elementary-tests.mdx @@ -29,7 +29,7 @@ models: description: This table has basic information about a customer, as well as some derived facts based on a customer's orders config: tags: ["PII"] - tests: + data_tests: - elementary.volume_anomalies ``` @@ -43,7 +43,7 @@ models: tags: ["PII"] elementary: timestamp_column: "signup_date" - tests: + data_tests: - elementary.volume_anomalies ``` @@ -59,10 +59,12 @@ Similar to Test 1, we will use the **volume_anomalies** test 
to detect an anomal config: tags: ["finance"] - tests: + data_tests: - elementary.volume_anomalies: - tags: ["table_anomalies"] - timestamp_column: "order_date" + config: + tags: ["table_anomalies"] + arguments: + timestamp_column: "order_date" ```` @@ -75,10 +77,11 @@ The **elementary.dimension_anomalies** tests can be used to check for anomalies config: tags: ["finance"] - tests: + data_tests: - elementary.dimension_anomalies: - dimensions: - - status + arguments: + dimensions: + - status ```` Next, we will define a timestamp column for determining time buckets: @@ -91,10 +94,11 @@ Next, we will define a timestamp column for determining time buckets: elementary: timestamp_column: "order_date" - tests: + data_tests: - elementary.dimension_anomalies: - dimensions: - - status + arguments: + dimensions: + - status ``` This test will gather row count metrics for each value in the **status** column and will fail if the distribution deviates substantially from the mean. @@ -106,14 +110,15 @@ We will use the **elementary.column_anomalies** test to monitor the count of cus ```yaml - name: number_of_orders description: Count of the number of orders a customer has placed - tests: + data_tests: - elementary.column_anomalies: config: severity: warn - tags: ["column_anomalies"] - column_anomalies: - - zero_count - timestamp_column: "signup_date" + tags: ["column_anomalies"] + arguments: + column_anomalies: + - zero_count + timestamp_column: "signup_date" ``` Notice that we already defined the **timestamp_column** at the model level, so we don't have to define it in the test. [This page](/data-tests/elementary-tests-configuration) has in-depth details on test priorities. 
@@ -133,7 +138,7 @@ models: tags: ["PII"] elementary: timestamp_column: "signup_date" - tests: + data_tests: - elementary.volume_anomalies columns: @@ -154,14 +159,15 @@ models: - name: number_of_orders description: Count of the number of orders a customer has placed - tests: + data_tests: - elementary.column_anomalies: config: severity: warn - tags: ["column_anomalies"] - column_anomalies: - - zero_count - timestamp_column: "signup_date" + tags: ["column_anomalies"] + arguments: + column_anomalies: + - zero_count + timestamp_column: "signup_date" - name: customer_lifetime_value description: Total value (AUD) of a customer's orders @@ -179,10 +185,11 @@ models: elementary: timestamp_column: "order_date" - tests: + data_tests: - elementary.dimension_anomalies: - dimensions: - - status + arguments: + dimensions: + - status columns: - name: order_id description: This is a unique identifier for an order @@ -216,10 +223,12 @@ models: config: tags: ["finance"] - tests: + data_tests: - elementary.volume_anomalies: - tags: ["table_anomalies"] - timestamp_column: "order_date" + config: + tags: ["table_anomalies"] + arguments: + timestamp_column: "order_date" columns: - name: order_id diff --git a/docs/tutorial/installing-elementary.mdx b/docs/tutorial/installing-elementary.mdx index 9dd9cc06d..c6cedc839 100644 --- a/docs/tutorial/installing-elementary.mdx +++ b/docs/tutorial/installing-elementary.mdx @@ -3,6 +3,10 @@ title: "Installing Elementary dbt Package" sidebarTitle: "Install dbt package" --- - +import QuickstartPackageInstall from '/snippets/quickstart-package-install.mdx'; + + + + **Now, let's take a look at the artifacts that Elementary creates in our DWH.** diff --git a/docs/tutorial/running-elementary.mdx b/docs/tutorial/running-elementary.mdx index 709c6e74d..64f9ab853 100644 --- a/docs/tutorial/running-elementary.mdx +++ b/docs/tutorial/running-elementary.mdx @@ -3,6 +3,12 @@ title: "Running Elementary Tests and Generate Reports" sidebarTitle: "Generate report" 
--- +import InstallCli from '/snippets/install-cli.mdx'; +import AddConnectionProfile from '/snippets/add-connection-profile.mdx'; +import SupportContact from '/snippets/support-contact.mdx'; + + + ## Elementary CLI Setup The CLI tool allows users to generate reports and visualize results based on the tests we just configured. While we have already added the Elementary packages to our dbt project, the Elementary CLI requires a separate installation, as it is a Python tool. @@ -15,11 +21,11 @@ The CLI tool allows users to generate reports and visualize results based on th Run one of the following commands based on your platform: - +
- +
## Running Elementary @@ -68,4 +74,4 @@ edr report ## Congratulations! Congratulations, you successfully configured Elementary's tests and ran the report! - + diff --git a/docs/tutorial/viewing-artifacts.mdx b/docs/tutorial/viewing-artifacts.mdx index 666369c55..22fe26d5c 100644 --- a/docs/tutorial/viewing-artifacts.mdx +++ b/docs/tutorial/viewing-artifacts.mdx @@ -17,6 +17,6 @@ To see these artifacts, we can navigate to our tutorial’s database; here, we w These artifacts are used by Elementary tests and the Elementary CLI and will be updated when we run **dbt test**, **dbt run**, and **dbt build**. -For more information about Elementary's dbt artifacts [click here](/dbt/dbt-artifacts). +For more information about Elementary's dbt artifacts [click here](/data-tests/dbt/dbt-artifacts). **Next, let’s see how we add Elementary's tests and anomaly detection to our data.** diff --git a/docs/x_old/understand-elementary/elementary-overview.mdx b/docs/x_old/understand-elementary/elementary-overview.mdx index 9ab8608e5..9b0ff7ef3 100644 --- a/docs/x_old/understand-elementary/elementary-overview.mdx +++ b/docs/x_old/understand-elementary/elementary-overview.mdx @@ -10,19 +10,19 @@ Our goal is to provide data teams **immediate visibility**, **detection of data Set up monitoring for your warehouse in minutes, collect test results and data quality metrics, detect data issues before your users do. -Data monitoring includes a [dbt package](/guides/modules-overview/dbt-package) and the edr CLI for generating the report. +Data monitoring includes a [dbt package](/data-tests/dbt/dbt-package) and the edr CLI for generating the report. ## Data anomalies detection as dbt tests Continuous monitoring of data quality metrics, freshness, volume and schema changes, including anomaly detection, configured and executed as dbt tests. -Data anomalies tests are included in the [dbt package](/guides/modules-overview/dbt-package). 
+Data anomalies tests are included in the [dbt package](/data-tests/dbt/dbt-package). ## dbt artifacts uploader Monitor the operations of your dbt easily. Collect dbt artifacts, run and test results as part of your runs. -The dbt artifacts uploader is included in the [dbt package](/guides/modules-overview/dbt-package). +The dbt artifacts uploader is included in the [dbt package](/data-tests/dbt/dbt-package). ## [Failure alerts](/oss/guides/alerts/elementary-alerts) diff --git a/docs/x_old/understand-elementary/elementary-report-ui.mdx b/docs/x_old/understand-elementary/elementary-report-ui.mdx index 2498bb3a9..adfdbc4fe 100644 --- a/docs/x_old/understand-elementary/elementary-report-ui.mdx +++ b/docs/x_old/understand-elementary/elementary-report-ui.mdx @@ -3,12 +3,12 @@ title: "Data observability report" --- Elementary has a UI for visualization and exploration of data from -the [dbt-package](/guides/modules-overview/dbt-package) tables, which includes dbt +the [dbt-package](/data-tests/dbt/dbt-package) tables, which includes dbt test results, Elementary anomaly detection results, dbt artifacts, etc. In order to visualize the data from -the [dbt-package](/general/contributions#contributing-to-the-dbt-package) tables, use -the [CLI](/understand-elementary/cli-install) you can generate the Elementary UI. +the [dbt-package](/oss/general/contributions#contributing-to-the-dbt-package) tables, use +the [CLI](/oss/cli-install) you can generate the Elementary UI. After installing and configuring the CLI, execute the command: ```shell @@ -111,4 +111,4 @@ test failure, for example on an exposure or a dashboard in the data stack. The data tests report UI can be sent via Slack, Google Cloud Storage, or Amazon S3 when you run `edr send-report`. -Refer to [this guide](/quickstart/share-report-ui) for detailed instructions. +Refer to [this guide](/oss/guides/share-report-ui) for detailed instructions.
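The tutorial hunks above apply one consistent migration to the dbt test configuration: the legacy `tests:` key becomes `data_tests:`, test-level settings move under a nested `config:`, and test parameters move under `arguments:`. As a consolidated sketch of the post-migration shape (the `orders` model name and the schema-file path are assumptions inferred from the surrounding hunks; the test name, tags, and timestamp column come directly from the diff):

```yaml
# models/schema.yml (path assumed) — test configuration after the migration
models:
  - name: orders            # model name assumed from the surrounding hunks
    config:
      tags: ["finance"]
    data_tests:
      - elementary.volume_anomalies:
          config:
            tags: ["table_anomalies"]       # test-level settings under `config`
          arguments:
            timestamp_column: "order_date"  # test parameters under `arguments`
```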