| title | type | layout | markup |
|---|---|---|---|
| Profiling Guide | page | single | markdown |
Profiling is how hidden risks become visible. In Data Trust Engineering (DTE), profiling is not a one‑off audit—it’s a lightweight, continuous signal that feeds confidence indicators and trust dashboards.
```python
# profiling.py
from pathlib import Path

import pandas as pd
from ydata_profiling import ProfileReport

# Load dataset
df = pd.read_csv("user_data.csv")

# Generate profile
profile = ProfileReport(df, title="User Data Profiling")

# Create the output directory first so the write does not fail on a fresh checkout
output_path = Path("artifacts/profiling/user_data_profile.html")
output_path.parent.mkdir(parents=True, exist_ok=True)
profile.to_file(output_path)
print(f"Profile written to {output_path}")
```

Run:

```bash
python profiling.py
```

- Derive trust metrics from profiling: completeness %, duplication %, freshness.
- Publish indicators alongside dashboards.
- Run profiling as part of CI/CD (sampled data).
- Store artifacts (`.html`, `.json`) in evidence packs.
- Use profiles as reference points for drift detection.
- Compare current vs. baseline profiles.
- Track only a few critical metrics: null %, cardinality, skewness.
- Avoid full audits in every run.
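- Completeness %: share of non-null values per column.
- Duplication %: share of duplicate rows or duplicate keys.
- Freshness: age of the most recent timestamp.
- Skewness: asymmetry of a numeric column's distribution, a cheap signal for distribution anomalies.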
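
The metrics in the list above can be computed directly with pandas, without generating a full report on every run. The following is a minimal sketch, not part of the original guide: the function name `trust_metrics`, the column names `user_id` and `updated_at`, and the output keys are all assumptions chosen for illustration.

```python
import pandas as pd


def trust_metrics(df: pd.DataFrame, key: str, ts_col: str, now: pd.Timestamp) -> dict:
    """Compute a few lightweight trust metrics (hypothetical helper)."""
    # Completeness %: share of non-null values in each column
    completeness = df.notna().mean().mul(100).round(2).to_dict()
    # Duplication %: share of rows whose key repeats an earlier row
    duplication = float(round(df.duplicated(subset=[key]).mean() * 100, 2))
    # Freshness: hours between `now` and the most recent timestamp
    latest = pd.to_datetime(df[ts_col]).max()
    freshness_hours = (now - latest).total_seconds() / 3600
    # Skewness: distribution asymmetry of each numeric column
    skewness = df.select_dtypes("number").skew().round(3).to_dict()
    return {
        "completeness_pct": completeness,
        "duplication_pct": duplication,
        "freshness_hours": freshness_hours,
        "skewness": skewness,
    }


if __name__ == "__main__":
    # Tiny made-up sample; real runs would use sampled production data
    sample = pd.DataFrame(
        {
            "user_id": [1, 2, 2, 4],
            "age": [30, None, 41, 25],
            "updated_at": ["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03"],
        }
    )
    print(trust_metrics(sample, "user_id", "updated_at", pd.Timestamp("2024-01-04")))
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the freshness value reproducible in CI runs.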
Profiling feeds confidence indicators, enabling teams to see problems before they erode trust.
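
One way to surface problems early is the baseline comparison mentioned in the list above: keep a small set of metrics from a trusted run and flag any metric that moves further than an agreed tolerance. The sketch below illustrates that idea with plain dictionaries. The function name `detect_drift`, the metric names, and the tolerance values are assumptions for illustration, not values prescribed by DTE.

```python
def detect_drift(baseline: dict, current: dict, tolerance: dict) -> dict:
    """Return metrics whose absolute change from baseline exceeds its tolerance."""
    drifted = {}
    for name, allowed in tolerance.items():
        if name not in baseline or name not in current:
            continue  # skip metrics that are missing on either side
        delta = current[name] - baseline[name]
        if abs(delta) > allowed:
            drifted[name] = round(delta, 4)
    return drifted


if __name__ == "__main__":
    # Hypothetical values; a baseline could be loaded from a stored JSON artifact
    baseline = {"null_pct": 2.0, "cardinality": 1000, "skewness": 0.30}
    current = {"null_pct": 7.5, "cardinality": 1010, "skewness": 0.35}
    tolerance = {"null_pct": 2.0, "cardinality": 50, "skewness": 0.2}
    print(detect_drift(baseline, current, tolerance))
```

Only metrics named in `tolerance` are checked, which keeps the comparison limited to the few critical metrics being tracked.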