title: Databricks Certification Study Guide
type: project
tags: databricks, certification, study-guide, data-engineering, data-analytics, machine-learning, generative-ai, delta-lake, unity-catalog, mlflow

Databricks Certification Study Guide

Open-source community study guide for all six Databricks certifications

A community-maintained study guide covering every active Databricks certification.
Refreshed against the official Databricks exam guides current as of May 21, 2026.

Note

These are the exam notes I built while preparing for the Databricks certification series: cheat sheets, practice questions, mock exams, and end-to-end worked examples, now open-sourced under MIT so the next person doesn't have to start from scratch. Every certification section maps to the current official Databricks exam guide for that credential. If it helped you, ⭐ the repo and pass it on.

Tip

▶ Try the practice quiz live: an adaptive multiple-choice quiz with 931 questions across 18 banks (6 practice + 12 mock exams, all 6 cert tracks). It runs in your browser with progress saved locally; no install, no signup. It includes an exam timer with pause/resume, a completion summary with per-module breakdown and weak-area highlight, attempt history, HTML/CSV export, keyboard-first navigation, light/dark/auto themes, and WCAG AAA contrast.

[!info] Read in another language: 🇹🇭 ภาษาไทย (Thai, in progress) · For other languages and the translation policy, see TRANSLATING.md and i18n/README.md.



Why this guide exists

Databricks' certification line-up grew fast: six active credentials across data engineering, analytics, ML, and generative AI, each with its own evolving exam guide. The official skills lists change quarterly. Public study resources are scattered across blog posts, training courses, and community forums.

This repo is the consolidated notes I wrote while preparing for the series. Now it's yours.

Who this is for

  • Data engineers preparing for the Data Engineer Associate or Professional
  • Analysts and BI developers preparing for the Data Analyst Associate
  • ML engineers and data scientists preparing for the ML Associate or Professional
  • AI/LLM application developers preparing for the Generative AI Engineer Associate
  • Anyone building on the Databricks Data Intelligence Platform who wants a structured reference for Delta Lake, Unity Catalog, Lakeflow, MLflow, Feature Store, Mosaic AI, and Vector Search

You don't need to be taking an exam to get value: the guide doubles as a reference for the platform itself.

The six certifications at a glance

| Certification | Tier | Fee | Duration | Scored Qs | Exam Guide |
| --- | --- | --- | --- | --- | --- |
| Data Engineer Associate | Associate | $200 | 90 min | 45 | May 2026 |
| Data Engineer Professional | Professional | $200 | 120 min | 59 | Nov 30, 2025 |
| Data Analyst Associate | Associate | $200 | 90 min | 45 | Oct 2025 |
| ML Associate | Associate | $200 | 90 min | 48 | Mar 1, 2025 |
| ML Professional | Professional | $200 | 120 min | 59 | Sep 2025 |
| GenAI Engineer Associate | Associate | $200 | 90 min | 45 | Mar 2026 |
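
As a rough pacing aid, the durations and scored-question counts in the table above can be turned into an average time budget per question. This is a minimal sketch, not an official figure: the `EXAMS` dictionary and `seconds_per_question` function are illustrative names, and the calculation assumes the scored-question count is the whole exam, which would overstate the time per question if an exam also includes unscored items.

```python
# Durations (minutes) and scored-question counts copied from the table above.
EXAMS = {
    "Data Engineer Associate": (90, 45),
    "Data Engineer Professional": (120, 59),
    "Data Analyst Associate": (90, 45),
    "ML Associate": (90, 48),
    "ML Professional": (120, 59),
    "GenAI Engineer Associate": (90, 45),
}


def seconds_per_question(minutes: int, questions: int) -> float:
    """Average time budget per scored question, in seconds (one decimal place)."""
    return round(minutes * 60 / questions, 1)


if __name__ == "__main__":
    for name, (minutes, questions) in EXAMS.items():
        pace = seconds_per_question(minutes, questions)
        print(f"{name}: ~{pace} s per scored question")
```

Every exam works out to roughly two minutes per scored question, which is a useful pace to rehearse when sitting the mock exams under timed conditions.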

All six certifications:

  • Pass/fail. Databricks does not publish a numeric passing threshold.
  • Multiple choice. Online or on-site delivery.
  • Validity: 2 years. Renew by retaking or sitting an updated version.
  • No formal prerequisites. Hands-on experience and the Databricks Academy course are strongly recommended.
  • Languages available: English on every exam. Japanese, Portuguese (BR), and Korean are also available on the Data Engineer Associate, ML Associate, GenAI Engineer Associate, and Data Engineer Professional exams. The Data Analyst Associate and ML Professional exams are English-only.

Per-certification exam details

📊 Data Engineer Associate: May 2026 blueprint

| Detail | Value |
| --- | --- |
| Exam ID / page | Databricks Certified Data Engineer Associate |
| Exam guide PDF | May 2026 exam guide |
| Duration | 90 minutes |
| Scored questions | 45 multiple-choice |
| Languages | English, Japanese, Portuguese (BR), Korean |
| Recommended experience | 6+ months hands-on with Databricks |

The May 2026 refresh replaces legacy "Workflows" terminology with Lakeflow Jobs, and adds explicit coverage of ingestion patterns, modelling, and CI/CD. Existing topic folders in this guide preserve the prior structure; see the DE Associate README for the up-to-date domain list.

📊 Data Engineer Professional: November 30, 2025 blueprint

| Detail | Value |
| --- | --- |
| Exam ID / page | Databricks Certified Data Engineer Professional |
| Duration | 120 minutes |
| Scored questions | 59 multiple-choice |
| Languages | English, Japanese, Portuguese (BR), Korean |
| Recommended experience | 1+ years building production data pipelines on Databricks |

Ten domains in the current exam guide:

| Domain | Weight |
| --- | --- |
| Developing Code for Data Processing | 22 % |
| Cost & Performance Optimization | 13 % |
| Data Transformation, Cleansing, and Quality | 10 % |
| Monitoring and Alerting | 10 % |
| Ensuring Data Security and Compliance | 10 % |
| Debugging and Deploying | 10 % |
| Data Ingestion & Acquisition | 7 % |
| Data Governance | 7 % |
| Data Modelling | 6 % |
| Data Sharing and Federation | 5 % |

📊 Data Analyst Associate: October 2025 blueprint

| Detail | Value |
| --- | --- |
| Exam ID / page | Databricks Certified Data Analyst Associate |
| Duration | 90 minutes |
| Scored questions | 45 multiple-choice |
| Languages | English |
| Recommended experience | 6+ months hands-on with Databricks SQL |

Nine domains in the current exam guide, including the new AI/BI Genie Spaces section:

| Domain | Weight |
| --- | --- |
| Executing Queries Using Databricks SQL and Warehouses | 20 % |
| Creating Dashboards and Visualizations | 16 % |
| Analyzing Queries | 15 % |
| Developing, Sharing, and Maintaining AI/BI Genie Spaces | 12 % |
| Understanding Databricks Data Intelligence Platform | 11 % |
| Managing Data | 8 % |
| Securing Data | 8 % |
| Importing Data | 5 % |
| Data Modeling with Databricks SQL | 5 % |

📊 ML Associate: March 1, 2025 blueprint

| Detail | Value |
| --- | --- |
| Exam ID / page | Databricks Certified Machine Learning Associate |
| Duration | 90 minutes |
| Scored questions | 48 multiple-choice |
| Languages | English, Japanese, Portuguese (BR), Korean |
| Recommended experience | 6+ months hands-on ML on Databricks |

| Domain | Weight |
| --- | --- |
| Databricks Machine Learning | 38 % |
| Model Development | 31 % |
| ML Workflows | 19 % |
| Model Deployment | 12 % |

Code is Python; non-ML supporting tasks may include SQL.

📊 ML Professional: September 2025 blueprint

| Detail | Value |
| --- | --- |
| Exam ID / page | Databricks Certified Machine Learning Professional |
| Duration | 120 minutes |
| Scored questions | 59 multiple-choice |
| Languages | English |
| Recommended experience | 1+ years building enterprise-scale ML on Databricks |

| Domain | Weight |
| --- | --- |
| Model Development | 44 % |
| ML Ops | 44 % |
| Model Deployment | 12 % |

The current exam guide consolidates the previous four domains into three; "Solution & Data Monitoring" responsibilities now sit inside ML Ops.

📊 GenAI Engineer Associate: March 2026 blueprint

| Detail | Value |
| --- | --- |
| Exam ID / page | Databricks Certified Generative AI Engineer Associate |
| Duration | 90 minutes |
| Scored questions | 45 multiple-choice |
| Languages | English, Japanese, Portuguese (BR), Korean |
| Recommended experience | 6+ months hands-on building GenAI solutions on Databricks |

Six domains in the current exam guide:

| Domain | Weight |
| --- | --- |
| Application Development | 30 % |
| Assembling and Deploying Apps | 22 % |
| Design Applications | 14 % |
| Data Preparation | 14 % |
| Evaluation and Monitoring | 12 % |
| Governance | 8 % |
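
The domain weights in these tables can be converted into rough per-domain question counts for study planning. The sketch below uses largest-remainder rounding so the estimates add up to the exam's scored-question total, and it is shown for the GenAI Engineer Associate weights above. Databricks publishes weights, not per-domain counts, so treat the output as an estimate only; `estimate_question_counts` and `GENAI_WEIGHTS` are illustrative names.

```python
def estimate_question_counts(weights: dict[str, int], total_questions: int) -> dict[str, int]:
    """Split total_questions across domains in proportion to percentage weights.

    Uses largest-remainder rounding so the counts sum to total_questions.
    """
    if sum(weights.values()) != 100:
        raise ValueError("domain weights must sum to 100")

    # Work in integer hundredths of a question to avoid float rounding issues.
    quotas = {domain: pct * total_questions for domain, pct in weights.items()}
    counts = {domain: quota // 100 for domain, quota in quotas.items()}

    # Hand the leftover questions to the domains with the largest remainders.
    leftover = total_questions - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda d: quotas[d] % 100, reverse=True)
    for domain in by_remainder[:leftover]:
        counts[domain] += 1
    return counts


# Weights copied from the GenAI Engineer Associate table above.
GENAI_WEIGHTS = {
    "Application Development": 30,
    "Assembling and Deploying Apps": 22,
    "Design Applications": 14,
    "Data Preparation": 14,
    "Evaluation and Monitoring": 12,
    "Governance": 8,
}

if __name__ == "__main__":
    for domain, n in estimate_question_counts(GENAI_WEIGHTS, 45).items():
        print(f"{domain}: ~{n} questions")
```

The same function works for any of the other weight tables: pass that exam's weights and its scored-question count (for example, 59 for either Professional exam).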

What changed in the 2025–2026 exam guides

Important

Several Databricks exam guides were updated in 2025–2026. Highlights to know:

  • Data Engineer Associate (May 2026): terminology shifted from "Workflows" to Lakeflow Jobs; the guide also clarifies expectations around Delta ingestion patterns and CI/CD with Databricks Asset Bundles. See the exam guide PDF.
  • Data Engineer Professional (Nov 30, 2025): restructured to 10 domains with explicit weights; Data Sharing and Federation (Delta Sharing, Lakehouse Federation) and Data Modelling are now first-class domains.
  • Data Analyst Associate (Oct 2025): added the AI/BI Genie Spaces domain (12 %); SQL warehouses + query analysis remain the highest-weight blocks.
  • ML Associate (Mar 1, 2025): current blueprint emphasises Databricks ML (Unity Catalog, AutoML, Mosaic AI) at 38 % and dedicates 31 % to Model Development.
  • ML Professional (Sep 2025): collapsed to three domains: Model Development, ML Ops, and Model Deployment (44 / 44 / 12).
  • GenAI Engineer Associate (Mar 2026): refactored into six domains with Application Development (30 %) as the largest block; Governance is now an explicit domain (8 %) covering Unity Catalog for AI assets, PII handling, and content safety.

All passing-score references in older study materials should be ignored: Databricks no longer publishes a numeric passing threshold for any of these certifications. Treat the exam as pass/fail and aim to be comfortable across every domain, not just the high-weight ones.

Getting started with Obsidian (recommended)

This guide is written in Obsidian Flavored Markdown. It renders fine on GitHub, but in Obsidian you get callouts, foldable practice-question answers, Mermaid diagrams, backlinks, and a navigable Graph View of every cross-link, which makes studying meaningfully better.

5-minute onboarding

  1. Install Obsidian (free; macOS, Windows, Linux).

  2. Clone this repo somewhere on your machine:

    git clone https://github.com/kengio/databricks-certification-study-guide.git
    cd databricks-certification-study-guide
  3. Open the vault: launch Obsidian → Open folder as vault → pick the cloned databricks-certification-study-guide/ directory.

  4. Trust the author when Obsidian asks (the included .obsidian/ config has pre-tuned settings: line numbers, tab width, no inline titles).

  5. Open the certification you're targeting, e.g., certifications/data-engineer-associate/README.md. Press Cmd/Ctrl + O to fuzzy-find any topic.

  6. Toggle Graph View (Cmd/Ctrl + G) to see how topics cross-link, which is surprisingly useful for spotting weak areas.

Recommended plugins

Two are essential; the rest are quality-of-life. Install via Settings → Community plugins → Browse.

  • Obsidian Git: back up your notes and progress checkboxes to your own fork.
  • Linter: keeps your edits consistent with the project's markdown conventions.
  • Advanced Tables: auto-aligns Markdown tables as you type.
  • Codeblock Customizer (or Better CodeBlock): line numbers, titles, copy buttons on code blocks.
  • Copilot (by logancyang): chat with Claude / GPT-4o / Ollama inside Obsidian; Vault QA mode indexes the guide so the AI can quiz you using your actual notes.

📖 See OBSIDIAN-SETUP.md for the full setup walkthrough: plugin configuration details, Copilot Vault QA setup, recommended study prompts, and tips for using AI to generate active-recall questions from your notes.

Don't want Obsidian?

No problem. The guide also renders well in:

  • GitHub: browse the files online; callouts and Mermaid diagrams render natively
  • VS Code with the Markdown All in One extension
  • Any Markdown reader that supports GFM: you'll lose callouts and Graph View, but the content is fully readable

How to use this guide

  1. Pick a certification. Start from the six-certs table above, then open that cert's README.md. Each one has a domain-weighted study plan and a progress tracker.
  2. Review shared fundamentals first: shared/fundamentals/ covers the cross-cert concepts (platform architecture, Delta Lake basics, Spark fundamentals, Unity Catalog basics, medallion architecture, MLflow basics, RAG/vector-search basics).
  3. Work through the cert's numbered topic folders in order. Each topic file is 300–600 lines with examples, comparison tables, common-mistake callouts, and exam tips.
  4. Use the cheat sheets for fast review after each domain.
  5. Hit the practice questions for your cert. Aim for 70 %+ per domain before moving on.
  6. Sit the mock exams under timed conditions when you think you're close.
  7. Reference the interview prep if you're using the cert as a stepping stone to a role change; those questions explore system-design depth beyond the exam.
  8. Run the hands-on lab pack to exercise the core features end-to-end in a Databricks workspace. The labs cover medallion ingestion, UC governance, Lakeflow Declarative Pipelines, MLflow + Model Registry, and a Mosaic AI RAG demo.
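
The 70 % per-domain target in step 5 can be checked with a few lines of Python. This is a minimal sketch: the `(domain, is_correct)` input format and the `weak_domains` name are assumptions made for illustration and are not tied to the export format of the repo's practice quiz. The 70 % figure is this guide's study target, not an official passing score.

```python
from collections import defaultdict

TARGET_PCT = 70  # study target from this guide, not an official passing score


def weak_domains(results, target_pct=TARGET_PCT):
    """Return {domain: percent_correct} for domains scoring below target_pct.

    results is an iterable of (domain, is_correct) pairs, one per answered question.
    """
    answered = defaultdict(int)
    correct = defaultdict(int)
    for domain, is_correct in results:
        answered[domain] += 1
        if is_correct:
            correct[domain] += 1

    weak = {}
    for domain, total in answered.items():
        # Integer comparison avoids float edge cases right at the threshold.
        if correct[domain] * 100 < target_pct * total:
            weak[domain] = round(100 * correct[domain] / total, 1)
    return weak


if __name__ == "__main__":
    # Hypothetical practice session: 2/4 on Governance, 7/10 on Data Preparation.
    sample = (
        [("Governance", True)] * 2 + [("Governance", False)] * 2
        + [("Data Preparation", True)] * 7 + [("Data Preparation", False)] * 3
    )
    print(weak_domains(sample))
```

A domain that lands exactly on 70 % is treated as meeting the target, so only domains strictly below it are returned for another pass.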

Study roadmaps

Pick the plan that matches the time you have. The hour estimates assume an experienced practitioner brushing up on Databricks specifics; double them if you're newer to Spark or the lakehouse pattern. Each plan ends with the same outcome: sitting the exam with confidence.

Associate tier (DE Associate / DA Associate / ML Associate / GenAI Engineer Associate)

🏃 4-week sprint (~25–30 hours total)

| Week | Focus | Hours |
| --- | --- | --- |
| 1 | Shared fundamentals + first half of the cert's topic folders | 7 |
| 2 | Second half of the cert's topic folders | 7 |
| 3 | All cheat sheets + per-domain practice questions (target 70 %+ each) | 8 |
| 4 | Mock Exam 1 (timed) → review → Mock Exam 2 → interview-prep spot-checks → exam | 6 |

🚢 8-week balanced (~45–55 hours total)

| Week | Focus | Hours |
| --- | --- | --- |
| 1 | Read shared fundamentals + cert's first topic folder | 6 |
| 2 | Cert topic folders 2–3 | 6 |
| 3 | Cert topic folders 4–end | 6 |
| 4 | Checkpoint: cheat sheets + practice questions for domains 1–2 | 6 |
| 5 | Practice questions for remaining domains; revisit weak topics | 7 |
| 6 | Mock Exam 1 (timed) + per-question debrief | 6 |
| 7 | Gap-fill from Mock 1 debrief; second pass on weakest cheat sheet | 5 |
| 8 | Mock Exam 2 + final review → exam | 5 |

Professional tier (DE Professional / ML Professional)

🚢 8-week balanced (~70–80 hours total)

| Week | Focus | Hours |
| --- | --- | --- |
| 1 | Shared fundamentals + first 25 % of cert's topic folders | 10 |
| 2 | Topic folders 25–50 % | 10 |
| 3 | Topic folders 50–75 % | 10 |
| 4 | Topic folders 75–100 % + appendix entries for trickiest topics | 10 |
| 5 | All cheat sheets + practice questions, domains 1–5 | 10 |
| 6 | Practice questions, remaining domains; revisit cross-cutting concerns (security, perf, deployment) | 10 |
| 7 | Mock Exam 1 (timed) → debrief → topic-file deep-dive on every miss | 10 |
| 8 | Mock Exam 2 → final review → interview-prep spot-checks → exam | 10 |

🧘 12-week comprehensive (~100–120 hours total)

Best for: newcomers to the professional-tier scope, candidates targeting a role change, or anyone wanting deeper retention. Spread the 8-week plan above across 12 weeks with extra time on the highest-weight domains and add a hands-on lab week per quarter of the syllabus.

Suggested daily cadence

Weekday    (45–60 min):  Read 1 topic sub-file + run the examples in a Databricks workspace
Weekend    (90–120 min): Cheat sheet review + practice questions
Pre-exam   (last 3 days): Stop new material. Re-read cheat sheets only.
Exam day:                Read the cheat sheet stack once over coffee. Eat. Go pass it.

Certification paths

graph TD
    DAA[Data Analyst Associate] --> DEA[Data Engineer Associate]
    DEA --> DEP[Data Engineer Professional]
    DEA --> MLA[ML Associate]
    MLA --> MLP[ML Professional]
    MLA --> GAIA[GenAI Engineer Associate]
    DEP --> GAIA

See Learning Paths for role-specific progression guides (Data Engineer, Analytics Engineer, ML Engineer, AI Engineer).

Roadmap for the guide itself

This guide ships as a living resource. The roadmap below is what's planned for the next two quarters; issues and PRs against any of these are welcome.

Q3 2026 (next 3 months)

  • ✅ MIT license + open-source release: complete
  • ✅ Refresh all 6 cert READMEs to current exam-guide versions: complete
  • ✅ 2025–2026 update callouts in the top-level README and per-cert READMEs: complete
  • ✅ Topic-folder reorganisation for DE Professional to match the November 2025 10-domain structure: complete
  • ✅ Topic-folder reorganisation for Data Analyst Associate to surface AI/BI Genie Spaces as its own folder: complete
  • ✅ Topic-folder reorganisation for GenAI Engineer Associate to match the March 2026 6-domain structure: complete
  • ✅ Refreshed mock exams for the three certifications above: complete (terminology + new-domain questions for Data Sharing/Federation, AI/BI Genie Spaces, and GenAI Governance)
  • ✅ Hands-on lab pack (runnable PySpark / SQL / dbutils scripts that walk through medallion ingestion, Unity Catalog setup, Lakeflow Pipelines, MLflow tracking, and a Mosaic AI Vector Search RAG demo): complete (see labs/)

Q4 2026 (3–6 months out)

  • ✅ DE Associate topic-folder refresh for the May 2026 Lakeflow Jobs terminology: complete (added CI/CD and Monitoring folder, refreshed terminology)
  • ✅ ML Associate / ML Professional folder reorg to match the Mar 2025 (4-domain) and Sep 2025 (3-domain) blueprints: complete
  • ✅ Per-cert "final review" file (20-minute exam-morning scan) modelled on the dp-800 pattern: complete (6 files, one per cert)
  • ✅ Per-cert mock-exam debrief files mapping every question to a topic file and cheat sheet: complete (12 files, one per mock)
  • ✅ CONTRIBUTORS.md listing community contributors: complete (see CONTRIBUTORS.md)

Q1 2027 (6–12 months out)

  • ✅ Renewal guide for candidates whose 2-year validity expires in 2027–2028: complete (see shared/appendix/renewal-guide.md)
  • ✅ Translation scaffolding (Thai in-tree, other languages via fork model): complete (see TRANSLATING.md + i18n/)
  • ✅ Spaced-repetition deck (Anki): markdown-source decks + stdlib-only builder, two starter decks (Delta Lake 27 cards, Unity Catalog 22 cards); complete (see anki/README.md)
  • ✅ Adaptive practice questions: static-only web quiz with localStorage progress tracking + adaptive selector. 18 banks / 931 questions (6 practice topic banks + 12 mock exams, all 6 cert tracks). Live at kengio.github.io/databricks-certification-study-guide; complete (see practice/README.md)

Legend: ✅ done · 🔄 in progress / next up · ⏳ planned · 🌱 ideas being explored

Repository layout

databricks-certification-study-guide/
├── certifications/
│   ├── data-engineer-associate/         # 6 topic folders + resources/ (May 2026 blueprint refresh)
│   ├── data-engineer-professional/      # 10 topic folders + resources/ (Nov 2025 blueprint)
│   ├── data-analyst-associate/          # 9 topic folders + resources/ (Oct 2025 blueprint)
│   ├── ml-associate/                    # 4 topic folders + resources/ (Mar 2025 blueprint)
│   ├── ml-professional/                 # 3 topic folders + resources/ (Sep 2025 blueprint)
│   └── genai-engineer-associate/        # 6 topic folders + resources/ (Mar 2026 blueprint)
├── shared/
│   ├── fundamentals/                    # cross-cert concepts (platform, Delta, Spark, UC, Medallion, MLflow, RAG)
│   ├── cheat-sheets/                    # quick-reference sheets (Delta, DLT, MLflow, PySpark, SQL, UC, performance)
│   ├── appendix/                        # glossary, comparison tables, error reference
│   ├── code-examples/                   # python/ and sql/ (always .md files)
│   └── interview-prep/                  # 15 topic files, 108 open-ended design questions
├── learning-paths/                      # role-based progression guides
├── labs/                                # runnable hands-on lab pack (5 labs)
├── i18n/                                # translations index + th/ (Thai in-tree translation)
├── anki/                                # spaced-repetition decks (markdown source + stdlib builder)
├── practice/                            # static adaptive practice quiz (HTML/CSS/JS + JSON banks)
├── images/databricks-ui/                # screenshots organised by feature area
├── CLAUDE.md                            # repo conventions for AI assistants
├── OBSIDIAN-SETUP.md                    # full Obsidian onboarding walkthrough
├── CONTRIBUTING.md                      # ground rules + 3-round review workflow
├── TRANSLATING.md                       # translation policy + Thai contribution flow
├── CHANGELOG.md                         # release history per exam-guide version
├── LICENSE                              # MIT
└── README.md                            # this file

Official Databricks resources

📋 Certifications and exam scheduling
📚 Platform documentation
👥 Community and learning

Contributing

Found an error, a stale link, or a topic that needs deeper coverage? PRs are welcome.

  • Small fixes (typos, link rot, factual corrections): open a PR directly
  • New practice questions or topic expansions: open an issue first to discuss scope
  • Blueprint changes: Databricks updates the exam guides periodically; PRs that bring sections in line with the latest exam-guide PDF for a certification are especially appreciated

Every PR goes through a 3-round self-review (technical, factual, style) before squash-merge. The PR template walks you through the checklist. Full details in CONTRIBUTING.md.

Contributors are credited in CONTRIBUTORS.md; your PR description can ask to be listed (or to be skipped if you'd prefer to stay anonymous).

Conventions live in CLAUDE.md.

License

Released under the MIT License. Use, fork, remix, redistribute; just keep the copyright notice.


This guide is a community resource. It is not affiliated with, endorsed by, or sponsored by Databricks.
"Databricks", "Delta Lake", "Lakeflow", "MLflow", "Unity Catalog", and "Mosaic AI" are trademarks of Databricks Inc.
Always verify against the current official Databricks exam guides; they are the source of truth.

Good luck on your exam. You've got this. ⭐ this repo if it helped you pass.
