| title | Databricks Certification Study Guide |
|---|---|
| type | project |
| tags | |
A community-maintained study guide covering every active Databricks certification.
Refreshed against the official Databricks exam guides current as of May 21, 2026.
Note
These are the exam notes I built while preparing for the Databricks certification series: cheat sheets, practice questions, mock exams, and end-to-end worked examples, now open-sourced under MIT so the next person doesn't have to start from scratch. Every certification section maps to the current official Databricks exam guide for that credential. If it helped you, star the repo and pass it on.
Tip
▶ Try the practice quiz live: an adaptive multiple-choice quiz with 931 questions across 18 banks (6 practice + 12 mock exams, all 6 cert tracks). It runs in your browser and saves progress locally; no install, no signup. Features: exam timer with pause / resume, completion summary with per-module breakdown and weak-area highlight, attempt history, HTML / CSV export, keyboard-first navigation, light / dark / auto theme, WCAG AAA contrast.
[!info] Read in another language: ภาษาไทย (Thai, in progress) · For other languages and the translation policy, see `TRANSLATING.md` and `i18n/README.md`.
- Why this guide exists
- Who this is for
- The six certifications at a glance
- Per-certification exam details
- What changed in the 2025–2026 exam guides
- Getting started with Obsidian
- How to use this guide
- Study roadmaps: 4-week, 8-week, and 12-week plans per cert tier
- Certification paths
- Roadmap for the guide itself
- Repository layout
- Official Databricks resources
- Contributing
- License
Databricks' certification line-up grew fast: six active credentials across data engineering, analytics, ML, and generative AI, each with its own evolving exam guide. The official skills lists change quarterly. Public study resources are scattered across blog posts, training courses, and community forums.
This repo is the consolidated notes I wrote while preparing for the series. Now it's yours.
- Data engineers preparing for the Data Engineer Associate or Professional
- Analysts and BI developers preparing for the Data Analyst Associate
- ML engineers and data scientists preparing for the ML Associate or Professional
- AI/LLM application developers preparing for the Generative AI Engineer Associate
- Anyone building on the Databricks Data Intelligence Platform who wants a structured reference for Delta Lake, Unity Catalog, Lakeflow, MLflow, Feature Store, Mosaic AI, and Vector Search
You don't need to be taking an exam to get value; the guide doubles as a reference for the platform itself.
| Certification | Tier | Fee | Duration | Scored Qs | Exam Guide |
|---|---|---|---|---|---|
| Data Engineer Associate | Associate | $200 | 90 min | 45 | May 2026 |
| Data Engineer Professional | Professional | $200 | 120 min | 59 | Nov 30, 2025 |
| Data Analyst Associate | Associate | $200 | 90 min | 45 | Oct 2025 |
| ML Associate | Associate | $200 | 90 min | 48 | Mar 1, 2025 |
| ML Professional | Professional | $200 | 120 min | 59 | Sep 2025 |
| GenAI Engineer Associate | Associate | $200 | 90 min | 45 | Mar 2026 |
All six certifications:
- Pass/fail. Databricks does not publish a numeric passing threshold.
- Multiple choice. Online or on-site delivery.
- Validity: 2 years. Renew by retaking or sitting an updated version.
- No formal prerequisites. Hands-on experience and the Databricks Academy course are strongly recommended.
- Languages available: every exam is offered in English. Japanese, Portuguese (BR), and Korean are also available on the Data Engineer Associate, ML Associate, GenAI Engineer Associate, and Data Engineer Professional exams. The Data Analyst Associate and ML Professional exams are English-only.
Data Engineer Associate: May 2026 blueprint
| Detail | Value |
|---|---|
| Exam ID / page | Databricks Certified Data Engineer Associate |
| Exam guide PDF | May 2026 exam guide |
| Duration | 90 minutes |
| Scored questions | 45 multiple-choice |
| Languages | English, Japanese, Portuguese (BR), Korean |
| Recommended experience | 6+ months hands-on with Databricks |
The May 2026 refresh replaces legacy "Workflows" terminology with Lakeflow Jobs, and adds explicit coverage of ingestion patterns, modelling, and CI/CD. Existing topic folders in this guide preserve the prior structure; see the DE Associate README for the up-to-date domain list.
Data Engineer Professional: November 30, 2025 blueprint
| Detail | Value |
|---|---|
| Exam ID / page | Databricks Certified Data Engineer Professional |
| Duration | 120 minutes |
| Scored questions | 59 multiple-choice |
| Languages | English, Japanese, Portuguese (BR), Korean |
| Recommended experience | 1+ years building production data pipelines on Databricks |
Ten domains in the current exam guide:
| Domain | Weight |
|---|---|
| Developing Code for Data Processing | 22 % |
| Cost & Performance Optimization | 13 % |
| Data Transformation, Cleansing, and Quality | 10 % |
| Monitoring and Alerting | 10 % |
| Ensuring Data Security and Compliance | 10 % |
| Debugging and Deploying | 10 % |
| Data Ingestion & Acquisition | 7 % |
| Data Governance | 7 % |
| Data Modelling | 6 % |
| Data Sharing and Federation | 5 % |
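For planning purposes, the weights above can be turned into rough question counts. The sketch below assumes, purely for illustration, that the 59 scored questions are split in proportion to the published weights; the exam guide does not promise an exact split, and the function name `approximate_question_counts` is hypothetical. It uses largest-remainder rounding so the counts add up to the total.

```python
# Rough planning aid: convert domain weights into approximate question counts.
# Assumption (not stated in the exam guide): scored questions are distributed
# in proportion to the published domain weights.

DE_PRO_WEIGHTS = {
    "Developing Code for Data Processing": 22,
    "Cost & Performance Optimization": 13,
    "Data Transformation, Cleansing, and Quality": 10,
    "Monitoring and Alerting": 10,
    "Ensuring Data Security and Compliance": 10,
    "Debugging and Deploying": 10,
    "Data Ingestion & Acquisition": 7,
    "Data Governance": 7,
    "Data Modelling": 6,
    "Data Sharing and Federation": 5,
}


def approximate_question_counts(weights, total_questions):
    """Largest-remainder split of total_questions across integer-percent weights."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    counts = {}
    remainders = []
    for domain, weight in weights.items():
        # Integer arithmetic avoids floating-point surprises.
        whole, remainder = divmod(weight * total_questions, 100)
        counts[domain] = whole
        remainders.append((remainder, domain))
    leftover = total_questions - sum(counts.values())
    # Hand the leftover questions to the domains with the largest remainders.
    remainders.sort(key=lambda pair: pair[0], reverse=True)
    for _, domain in remainders[:leftover]:
        counts[domain] += 1
    return counts


if __name__ == "__main__":
    for domain, n in approximate_question_counts(DE_PRO_WEIGHTS, 59).items():
        print(f"{n:>2}  {domain}")
```

Under that proportional assumption, even the 5 % domain accounts for roughly three questions, which is one reason to be comfortable across every domain rather than only the high-weight ones.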
Data Analyst Associate: October 2025 blueprint
| Detail | Value |
|---|---|
| Exam ID / page | Databricks Certified Data Analyst Associate |
| Duration | 90 minutes |
| Scored questions | 45 multiple-choice |
| Languages | English |
| Recommended experience | 6+ months hands-on with Databricks SQL |
Nine domains in the current exam guide, including the new AI/BI Genie Spaces section:
| Domain | Weight |
|---|---|
| Executing Queries Using Databricks SQL and Warehouses | 20 % |
| Creating Dashboards and Visualizations | 16 % |
| Analyzing Queries | 15 % |
| Developing, Sharing, and Maintaining AI/BI Genie Spaces | 12 % |
| Understanding Databricks Data Intelligence Platform | 11 % |
| Managing Data | 8 % |
| Securing Data | 8 % |
| Importing Data | 5 % |
| Data Modeling with Databricks SQL | 5 % |
ML Associate: March 1, 2025 blueprint
| Detail | Value |
|---|---|
| Exam ID / page | Databricks Certified Machine Learning Associate |
| Duration | 90 minutes |
| Scored questions | 48 multiple-choice |
| Languages | English, Japanese, Portuguese (BR), Korean |
| Recommended experience | 6+ months hands-on ML on Databricks |
| Domain | Weight |
|---|---|
| Databricks Machine Learning | 38 % |
| Model Development | 31 % |
| ML Workflows | 19 % |
| Model Deployment | 12 % |
Code is Python; non-ML supporting tasks may include SQL.
ML Professional: September 2025 blueprint
| Detail | Value |
|---|---|
| Exam ID / page | Databricks Certified Machine Learning Professional |
| Duration | 120 minutes |
| Scored questions | 59 multiple-choice |
| Languages | English |
| Recommended experience | 1+ years building enterprise-scale ML on Databricks |
| Domain | Weight |
|---|---|
| Model Development | 44 % |
| ML Ops | 44 % |
| Model Deployment | 12 % |
The current exam guide consolidates the previous four domains into three; "Solution & Data Monitoring" responsibilities now sit inside ML Ops.
GenAI Engineer Associate: March 2026 blueprint
| Detail | Value |
|---|---|
| Exam ID / page | Databricks Certified Generative AI Engineer Associate |
| Duration | 90 minutes |
| Scored questions | 45 multiple-choice |
| Languages | English, Japanese, Portuguese (BR), Korean |
| Recommended experience | 6+ months hands-on building GenAI solutions on Databricks |
Six domains in the current exam guide:
| Domain | Weight |
|---|---|
| Application Development | 30 % |
| Assembling and Deploying Apps | 22 % |
| Design Applications | 14 % |
| Data Preparation | 14 % |
| Evaluation and Monitoring | 12 % |
| Governance | 8 % |
Important
Several Databricks exam guides were updated in 2025–2026. Highlights to know:
- Data Engineer Associate (May 2026): terminology shifted from "Workflows" to Lakeflow Jobs; the guide also clarifies expectations around Delta ingestion patterns and CI/CD with Databricks Asset Bundles. See the exam guide PDF.
- Data Engineer Professional (Nov 30, 2025): restructured to 10 domains with explicit weights; Data Sharing and Federation (Delta Sharing, Lakehouse Federation) and Data Modelling are now first-class domains.
- Data Analyst Associate (Oct 2025): added the AI/BI Genie Spaces domain (12 %); SQL warehouses + query analysis remain the highest-weight blocks.
- ML Associate (Mar 1, 2025): current blueprint emphasises Databricks ML (Unity Catalog, AutoML, Mosaic AI) at 38 % and dedicates 31 % to Model Development.
- ML Professional (Sep 2025): collapsed to three domains: Model Development, ML Ops, and Model Deployment (44 / 44 / 12).
- GenAI Engineer Associate (Mar 2026): refactored into six domains with Application Development (30 %) as the largest block; Governance is now an explicit (8 %) domain covering Unity Catalog for AI assets, PII handling, and content safety.
All passing-score references in older study materials should be ignored: Databricks no longer publishes a numeric passing threshold for any of these certifications. Treat the exam as pass/fail and aim to be comfortable across every domain, not just the high-weight ones.
This guide is written in Obsidian Flavored Markdown. It renders fine on GitHub, but in Obsidian you get callouts, foldable practice-question answers, Mermaid diagrams, backlinks, and a navigable Graph View of every cross-link, which makes studying meaningfully better.
- Install Obsidian (free; macOS, Windows, Linux).
- Clone this repo somewhere on your machine:

      git clone https://github.com/kengio/databricks-certification-study-guide.git
      cd databricks-certification-study-guide

- Open the vault: launch Obsidian → Open folder as vault → pick the cloned `databricks-certification-study-guide/` directory.
- Trust the author when Obsidian asks (the included `.obsidian/` config has pre-tuned settings: line numbers, tab width, no inline titles).
- Open the certification you're targeting, e.g. `certifications/data-engineer-associate/README.md`. Press `Cmd/Ctrl + O` to fuzzy-find any topic.
- Toggle Graph View (`Cmd/Ctrl + G`) to see how topics cross-link; it is surprisingly useful for spotting weak areas.
Of the recommended community plugins below, two are essential and the rest are quality-of-life. Install them via Settings → Community plugins → Browse.
- Obsidian Git: back up your notes and progress checkboxes to your own fork.
- Linter: keeps your edits consistent with the project's markdown conventions.
- Advanced Tables: auto-aligns Markdown tables as you type.
- Codeblock Customizer (or Better CodeBlock): line numbers, titles, copy buttons on code blocks.
- Copilot (by logancyang): chat with Claude / GPT-4o / Ollama inside Obsidian; Vault QA mode indexes the guide so the AI can quiz you using your actual notes.
See `OBSIDIAN-SETUP.md` for the full setup walkthrough: plugin configuration details, Copilot Vault QA setup, recommended study prompts, and tips for using AI to generate active-recall questions from your notes.
If you'd rather not use Obsidian, no problem. The guide also renders well in:
- GitHub: browse the files online; callouts and Mermaid diagrams render natively
- VS Code with the Markdown All in One extension
- Any Markdown reader that supports GFM: you'll lose callouts and Graph View, but the content is fully readable
- Pick a certification. Start from the six-certs table above and open that cert's `README.md`. Each one has a domain-weighted study plan and a progress tracker.
- Review shared fundamentals first. `shared/fundamentals/` covers the cross-cert concepts (platform architecture, Delta Lake basics, Spark fundamentals, Unity Catalog basics, medallion architecture, MLflow basics, RAG/vector-search basics).
- Work through the cert's numbered topic folders in order. Each topic file is 300–600 lines with examples, comparison tables, common-mistake callouts, and exam tips.
- Use the cheat sheets for fast review after each domain.
- Hit the practice questions for your cert. Aim for 70 %+ per domain before moving on.
- Sit the mock exams under timed conditions when you think you're close.
- Reference the interview prep if you're using the cert as a stepping stone to a role change; those questions explore system-design depth beyond the exam.
- Run the hands-on lab pack to exercise the core features end-to-end in a Databricks workspace. Labs cover medallion ingestion, UC governance, Lakeflow Declarative Pipelines, MLflow + Model Registry, and a Mosaic AI RAG demo.
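The 70 %+ per-domain target in the steps above is easy to check by hand, but a small script helps once you have logged a few dozen answers. The following is a minimal sketch: the `(domain, is_correct)` record format and the function name `weak_domains` are assumptions made for illustration, not the format used by the practice quiz in this repo.

```python
from collections import defaultdict


def weak_domains(results, threshold_pct=70):
    """Return {domain: percent_correct} for domains scoring below threshold_pct.

    `results` is an iterable of (domain, is_correct) pairs (hypothetical format).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for domain, is_correct in results:
        totals[domain] += 1
        if is_correct:
            correct[domain] += 1

    weak = {}
    for domain, total in totals.items():
        pct = 100 * correct[domain] / total
        # A domain exactly at the threshold counts as passing.
        if pct < threshold_pct:
            weak[domain] = round(pct, 1)
    return weak


if __name__ == "__main__":
    # Made-up sample answers, not real quiz data.
    sample = (
        [("Data Governance", True)] * 8
        + [("Data Governance", False)] * 2
        + [("Data Modelling", True)] * 3
        + [("Data Modelling", False)] * 2
    )
    print(weak_domains(sample))
```

Any domain the function returns is a candidate for another pass through its topic folder and cheat sheet before you move on.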
Pick the plan that matches the time you have. The hour estimates assume an experienced practitioner brushing up on Databricks specifics; double them if you're newer to Spark or the lakehouse pattern. Each plan ends with the same outcome: sitting the exam with confidence.
| Week | Focus | Hours |
|---|---|---|
| 1 | Shared fundamentals + first half of the cert's topic folders | 7 |
| 2 | Second half of the cert's topic folders | 7 |
| 3 | All cheat sheets + per-domain practice questions (target 70 %+ each) | 8 |
| 4 | Mock Exam 1 (timed) → review → Mock Exam 2 → interview-prep spot-checks → exam | 6 |
| Week | Focus | Hours |
|---|---|---|
| 1 | Read shared fundamentals + cert's first topic folder | 6 |
| 2 | Cert topic folders 2–3 | 6 |
| 3 | Cert topic folders 4–end | 6 |
| 4 | Checkpoint: cheat sheets + practice questions for domains 1–2 | 6 |
| 5 | Practice questions for remaining domains; revisit weak topics | 7 |
| 6 | Mock Exam 1 (timed) + per-question debrief | 6 |
| 7 | Gap-fill from Mock 1 debrief; second pass on weakest cheat sheet | 5 |
| 8 | Mock Exam 2 + final review → exam | 5 |
| Week | Focus | Hours |
|---|---|---|
| 1 | Shared fundamentals + first 25 % of cert's topic folders | 10 |
| 2 | Topic folders 25–50 % | 10 |
| 3 | Topic folders 50–75 % | 10 |
| 4 | Topic folders 75–100 % + appendix entries for trickiest topics | 10 |
| 5 | All cheat sheets + practice questions, domains 1–5 | 10 |
| 6 | Practice questions, remaining domains; revisit cross-cutting concerns (security, perf, deployment) | 10 |
| 7 | Mock Exam 1 (timed) → debrief → topic-file deep-dive on every miss | 10 |
| 8 | Mock Exam 2 → final review → interview-prep spot-checks → exam | 10 |
Best for: newcomers to the professional-tier scope, candidates targeting a role change, or anyone wanting deeper retention. Spread the 8-week plan above across 12 weeks with extra time on the highest-weight domains and add a hands-on lab week per quarter of the syllabus.
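To see what each plan costs in total, the weekly hours from the three tables above can simply be summed. This sketch copies the hour columns as written; the labels `table_1` to `table_3` are placeholders of mine rather than names used in the guide, and the `multiplier=2` call reflects the suggestion to double the estimates if you're newer to Spark or the lakehouse pattern.

```python
# Weekly hours copied from the three plan tables above.
# The keys are placeholder labels (assumption), in the order the tables appear.
PLAN_HOURS = {
    "table_1": [7, 7, 8, 6],
    "table_2": [6, 6, 6, 6, 7, 6, 5, 5],
    "table_3": [10] * 8,
}


def plan_totals(plan_hours, multiplier=1):
    """Sum the weekly hours of each plan, optionally scaled by a multiplier."""
    return {name: sum(hours) * multiplier for name, hours in plan_hours.items()}


if __name__ == "__main__":
    print("experienced:", plan_totals(PLAN_HOURS))
    print("newer to Spark:", plan_totals(PLAN_HOURS, multiplier=2))
```

Stretching a plan over 12 weeks keeps the total roughly the same and lowers the weekly load, before adding the extra lab weeks.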
Weekday (45–60 min): Read 1 topic sub-file + run the examples in a Databricks workspace
Weekend (90–120 min): Cheat sheet review + practice questions
Pre-exam (last 3 days): Stop new material. Re-read cheat sheets only.
Exam day: Read the cheat sheet stack once over coffee. Eat. Go pass it.
graph TD
DAA[Data Analyst Associate] --> DEA[Data Engineer Associate]
DEA --> DEP[Data Engineer Professional]
DEA --> MLA[ML Associate]
MLA --> MLP[ML Professional]
MLA --> GAIA[GenAI Engineer Associate]
DEP --> GAIA
See Learning Paths for role-specific progression guides (Data Engineer, Analytics Engineer, ML Engineer, AI Engineer).
This guide ships as a living resource. The roadmap below is what's planned for the next two quarters; issues and PRs against any of these are welcome.
- ✅ MIT license + open-source release: complete
- ✅ Refresh all 6 cert READMEs to current exam-guide versions: complete
- ✅ 2025–2026 update callouts in the top-level README and per-cert READMEs: complete
- ✅ Topic-folder reorganisation for DE Professional to match the November 2025 10-domain structure: complete
- ✅ Topic-folder reorganisation for Data Analyst Associate to surface AI/BI Genie Spaces as its own folder: complete
- ✅ Topic-folder reorganisation for GenAI Engineer Associate to match the March 2026 6-domain structure: complete
- ✅ Refreshed mock exams for the three certifications above: complete (terminology + new-domain questions for Data Sharing/Federation, AI/BI Genie Spaces, and GenAI Governance)
- ✅ Hands-on lab pack (runnable PySpark / SQL / dbutils scripts that walk through medallion ingestion, Unity Catalog setup, Lakeflow Pipelines, MLflow tracking, and a Mosaic AI Vector Search RAG demo): complete (see `labs/`)
- ✅ DE Associate topic-folder refresh for the May 2026 Lakeflow Jobs terminology: complete (added CI/CD and Monitoring folder, refreshed terminology)
- ✅ ML Associate / ML Professional folder reorg to match the Mar 2025 (4-domain) and Sep 2025 (3-domain) blueprints: complete
- ✅ Per-cert "final review" file (20-minute exam-morning scan) modelled on the dp-800 pattern: complete (6 files, one per cert)
- ✅ Per-cert mock-exam debrief files mapping every question to a topic file and cheat sheet: complete (12 files, one per mock)
- ✅ CONTRIBUTORS.md listing community contributors: complete (see `CONTRIBUTORS.md`)
- ✅ Renewal guide for candidates whose 2-year validity expires in 2027–2028: complete (see `shared/appendix/renewal-guide.md`)
- ✅ Translation scaffolding (Thai in-tree, other languages via fork model): complete (see `TRANSLATING.md` + `i18n/`)
- ✅ Spaced-repetition deck (Anki) with markdown-source decks + stdlib-only builder and two starter decks (Delta Lake 27 cards, Unity Catalog 22 cards): complete (see `anki/README.md`)
- ✅ Adaptive practice questions: static-only web quiz with localStorage progress tracking + adaptive selector. 18 banks / 931 questions (6 practice topic banks + 12 mock exams, all 6 cert tracks). Live at kengio.github.io/databricks-certification-study-guide. Complete (see `practice/README.md`)
Legend: ✅ done · 🔄 in progress / next up · ⏳ planned · 🌱 ideas being explored
databricks-certification-study-guide/
├── certifications/
│   ├── data-engineer-associate/      # 6 topic folders + resources/ (May 2026 blueprint refresh)
│   ├── data-engineer-professional/   # 10 topic folders + resources/ (Nov 2025 blueprint)
│   ├── data-analyst-associate/       # 9 topic folders + resources/ (Oct 2025 blueprint)
│   ├── ml-associate/                 # 4 topic folders + resources/ (Mar 2025 blueprint)
│   ├── ml-professional/              # 3 topic folders + resources/ (Sep 2025 blueprint)
│   └── genai-engineer-associate/     # 6 topic folders + resources/ (Mar 2026 blueprint)
├── shared/
│   ├── fundamentals/                 # cross-cert concepts (platform, Delta, Spark, UC, Medallion, MLflow, RAG)
│   ├── cheat-sheets/                 # quick-reference sheets (Delta, DLT, MLflow, PySpark, SQL, UC, performance)
│   ├── appendix/                     # glossary, comparison tables, error reference
│   ├── code-examples/                # python/ and sql/ (always .md files)
│   └── interview-prep/               # 15 topic files, 108 open-ended design questions
├── learning-paths/                   # role-based progression guides
├── labs/                             # runnable hands-on lab pack (5 labs)
├── i18n/                             # translations index + th/ (Thai in-tree translation)
├── anki/                             # spaced-repetition decks (markdown source + stdlib builder)
├── practice/                         # static adaptive practice quiz (HTML/CSS/JS + JSON banks)
├── images/databricks-ui/             # screenshots organised by feature area
├── CLAUDE.md                         # repo conventions for AI assistants
├── OBSIDIAN-SETUP.md                 # full Obsidian onboarding walkthrough
├── CONTRIBUTING.md                   # ground rules + 3-round review workflow
├── TRANSLATING.md                    # translation policy + Thai contribution flow
├── CHANGELOG.md                      # release history per exam-guide version
├── LICENSE                           # MIT
└── README.md                         # this file
Certifications and exam scheduling
- Databricks Certifications hub: landing page with links to every active exam
- Data Engineer Associate
- Data Engineer Professional
- Data Analyst Associate
- Machine Learning Associate
- Machine Learning Professional
- Generative AI Engineer Associate
- Databricks Academy: free + paid courses mapped to each certification
Platform documentation
- Databricks documentation
- Delta Lake documentation
- Apache Spark documentation
- PySpark API reference
- Unity Catalog
- Lakeflow Jobs (formerly Workflows)
- Lakeflow Declarative Pipelines (formerly DLT)
- Databricks SQL
- AI/BI Genie
- Mosaic AI Vector Search
- Mosaic AI Model Serving
- MLflow on Databricks
- Feature Engineering in Unity Catalog
- Databricks Asset Bundles
- Delta Sharing
- Lakehouse Federation
Community and learning
- Databricks Community: forums for exam policy, study strategy, and platform Q&A
- Databricks Blog: product launches and best practices
- Databricks YouTube channel
- Databricks Data + AI Summit recordings
- Awesome Databricks (community list)
Found an error, a stale link, or a topic that needs deeper coverage? PRs are welcome.
- Small fixes (typos, link rot, factual corrections): open a PR directly
- New practice questions or topic expansions: open an issue first to discuss scope
- Blueprint changes: Databricks updates the exam guides periodically; PRs that bring sections in line with the latest exam-guide PDF for a certification are especially appreciated
Every PR goes through a 3-round self-review (technical, factual, style) before squash-merge. The PR template walks you through the checklist. Full details in CONTRIBUTING.md.
Contributors are credited in CONTRIBUTORS.md; your PR description can ask to be listed (or to be skipped if you'd prefer to stay anonymous).
Conventions live in CLAUDE.md.
Released under the MIT License. Use, fork, remix, redistribute; just keep the copyright notice.
This guide is a community resource. It is not affiliated with, endorsed by, or sponsored by Databricks.
"Databricks", "Delta Lake", "Lakeflow", "MLflow", "Unity Catalog", and "Mosaic AI" are trademarks of Databricks Inc.
Always verify against the current official Databricks exam guides; they are the source of truth.
Good luck on your exam. You've got this. Star this repo if it helped you pass.