From 7d6e886bf1996a2ae3a5b2a87b4d5005fb76a9f3 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Sat, 13 Jun 2026 13:06:57 -0500
Subject: [PATCH 1/4] WIP tracker: trim "deliberate trade-off" prose from RWM concept page

Placeholder for follow-up work after #184 (expand RWM) and #185 (deeper
concept pages) merge. Tracker file outlines what to trim, why, and how to
pick the work up once both upstream PRs land.

No content changes to docs source in this PR. The tracker file is to be
deleted in the same commit that applies the trim.
---
 .../trim-rwm-deliberate-tradeoff.md | 56 +++++++++++++++++++
 1 file changed, 56 insertions(+)
 create mode 100644 .github/follow-ups/trim-rwm-deliberate-tradeoff.md

diff --git a/.github/follow-ups/trim-rwm-deliberate-tradeoff.md b/.github/follow-ups/trim-rwm-deliberate-tradeoff.md
new file mode 100644
index 00000000..230473d4
--- /dev/null
+++ b/.github/follow-ups/trim-rwm-deliberate-tradeoff.md
@@ -0,0 +1,56 @@
+# Follow-up: trim the "deliberate trade-off" prose from the RWM concept page
+
+**Status:** WIP, tracking only. Awaiting upstream merges.
+**Blocked by:** [#184](https://github.com/datajoint/datajoint-docs/pull/184), [#185](https://github.com/datajoint/datajoint-docs/pull/185)
+
+## What
+
+Remove the "deliberate trade-off" prose from
+`src/explanation/relational-workflow-model.md` and replace it with a short
+sentence and a link to `comparison-to-workflow-languages.md`.
+
+## Why
+
+After [#184](https://github.com/datajoint/datajoint-docs/pull/184) (expanded
+RWM intro) and [#185](https://github.com/datajoint/datajoint-docs/pull/185)
+(new deeper pages) both land, the same argument lives in two places:
+
+- RWM page: "The deliberate trade-off" subsection (introduced by #184).
+- Comparison page: a more developed treatment with concrete comparators
+  and the convertibility framing (introduced by #185).
+
+Keeping both creates drift risk and dilutes the overview page. The
+comparison page is the natural home for the developed argument; the RWM
+page should mention the trade-off briefly and link out.
+
+## How (rough outline)
+
+1. Open `src/explanation/relational-workflow-model.md` (post-#184 state).
+2. Locate the section beginning with the heading that introduces the
+   "deliberate trade-off" framing (a few paragraphs near the top half of
+   the page after the opening + four-shifts).
+3. Replace the subsection with a single tight paragraph along these lines:
+   - DataJoint accepts coupling deliberately, in exchange for one formal
+     system across data structure, computation, dependencies, and integrity.
+   - Link out: "see
+     [Comparison to Workflow Languages](comparison-to-workflow-languages.md)
+     for the structural treatment, what file-based workflows and task
+     orchestrators offer, what each omits, and when to use them together."
+4. Verify the surrounding flow still reads cleanly (the "Substrate
+   consequences" section should follow naturally).
+5. Run docs build locally (`mkdocs serve`) to check rendering.
+
+## Out of scope
+
+- No content moves in either direction other than the trim.
+- No nav changes.
+- No edits to other concept pages.
+
+## Pick-up checklist
+
+- [ ] #184 merged to `main`
+- [ ] #185 merged to `main`
+- [ ] Rebase this branch onto `main`
+- [ ] Apply the trim per outline above
+- [ ] Delete this tracker file as part of the same commit
+- [ ] Move the PR out of draft and request review

From cbb578f8ee0566a1b6f059e431a20d9b5f4c9651 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Sun, 14 Jun 2026 11:01:03 -0500
Subject: [PATCH 2/4] docs(rwm): trim "deliberate trade-off" prose; link to Comparison page

The developed argument lives on the Comparison to Workflow Languages page
(added in #185). The RWM page now mentions the trade-off in one paragraph
and links out, preventing drift between two homes for the same argument.

Removes the .github/follow-ups/ tracker that scheduled this work.
---
 .../trim-rwm-deliberate-tradeoff.md | 56 -------------------
 src/explanation/relational-workflow-model.md | 23 +++-----
 2 files changed, 7 insertions(+), 72 deletions(-)
 delete mode 100644 .github/follow-ups/trim-rwm-deliberate-tradeoff.md

diff --git a/.github/follow-ups/trim-rwm-deliberate-tradeoff.md b/.github/follow-ups/trim-rwm-deliberate-tradeoff.md
deleted file mode 100644
index 230473d4..00000000
--- a/.github/follow-ups/trim-rwm-deliberate-tradeoff.md
+++ /dev/null
@@ -1,56 +0,0 @@
-# Follow-up: trim the "deliberate trade-off" prose from the RWM concept page
-
-**Status:** WIP, tracking only. Awaiting upstream merges.
-**Blocked by:** [#184](https://github.com/datajoint/datajoint-docs/pull/184), [#185](https://github.com/datajoint/datajoint-docs/pull/185)
-
-## What
-
-Remove the "deliberate trade-off" prose from
-`src/explanation/relational-workflow-model.md` and replace it with a short
-sentence and a link to `comparison-to-workflow-languages.md`.
-
-## Why
-
-After [#184](https://github.com/datajoint/datajoint-docs/pull/184) (expanded
-RWM intro) and [#185](https://github.com/datajoint/datajoint-docs/pull/185)
-(new deeper pages) both land, the same argument lives in two places:
-
-- RWM page: "The deliberate trade-off" subsection (introduced by #184).
-- Comparison page: a more developed treatment with concrete comparators
-  and the convertibility framing (introduced by #185).
-
-Keeping both creates drift risk and dilutes the overview page. The
-comparison page is the natural home for the developed argument; the RWM
-page should mention the trade-off briefly and link out.
-
-## How (rough outline)
-
-1. Open `src/explanation/relational-workflow-model.md` (post-#184 state).
-2. Locate the section beginning with the heading that introduces the
-   "deliberate trade-off" framing (a few paragraphs near the top half of
-   the page after the opening + four-shifts).
-3. Replace the subsection with a single tight paragraph along these lines:
-   - DataJoint accepts coupling deliberately, in exchange for one formal
-     system across data structure, computation, dependencies, and integrity.
-   - Link out: "see
-     [Comparison to Workflow Languages](comparison-to-workflow-languages.md)
-     for the structural treatment, what file-based workflows and task
-     orchestrators offer, what each omits, and when to use them together."
-4. Verify the surrounding flow still reads cleanly (the "Substrate
-   consequences" section should follow naturally).
-5. Run docs build locally (`mkdocs serve`) to check rendering.
-
-## Out of scope
-
-- No content moves in either direction other than the trim.
-- No nav changes.
-- No edits to other concept pages.
-
-## Pick-up checklist
-
-- [ ] #184 merged to `main`
-- [ ] #185 merged to `main`
-- [ ] Rebase this branch onto `main`
-- [ ] Apply the trim per outline above
-- [ ] Delete this tracker file as part of the same commit
-- [ ] Move the PR out of draft and request review
diff --git a/src/explanation/relational-workflow-model.md b/src/explanation/relational-workflow-model.md
index a7b94da7..1c87c793 100644
--- a/src/explanation/relational-workflow-model.md
+++ b/src/explanation/relational-workflow-model.md
@@ -83,22 +83,13 @@ schema are the same object.
 
 ## The deliberate trade-off
 
-Decoupled architectures have legitimate advantages. File-based workflow
-systems optimize for portability — any tool that reads files works.
-Orchestrators evolve independently of the data model. Lakehouses give
-analytics teams a layer that doesn't bind them to upstream pipeline
-choices. These are the right trade-offs for many use cases.
-
-DataJoint accepts tighter coupling deliberately. The cost is framework
-commitment. The benefit is one system that knows the data structure, the
-data, the computation that produced it, the dependencies between
-computations, and the integrity constraints that govern all of it.
-Everything an analyst, an engineer, or an AI agent might ask about the
-work — *what is this, where did it come from, what depends on it, what
-must hold for it to be valid, what would change if I touched the input* —
-is answerable by query against a single formal model. For scientific
-workflows where the data and the computation cannot be cleanly separated
-without losing the science, this is the right trade-off.
+DataJoint accepts tighter coupling deliberately, in exchange for one
+formal system that spans data structure, computation, dependencies, and
+integrity. See
+[Comparison to Workflow Languages](comparison-to-workflow-languages.md)
+for the structural treatment — what file-based workflows and task
+orchestrators each offer, what each omits, and when to use them
+alongside DataJoint.
 
 ## Substrate consequences
 

From fc17564323ef0649104d76e2a520ca0719c93f50 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Sun, 14 Jun 2026 11:04:44 -0500
Subject: [PATCH 3/4] docs(rwm): align worked-example diagram with dj.Diagram notation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Match the conventions from datajoint-python's dj.Diagram
(diagram.py:1017-1082):

- Manual: green rectangle (unchanged)
- Lookup: plaintext — no border/fill (was a filled rectangle)
- Imported: blue stadium-shaped node — closest Mermaid approximation
  to dj.Diagram's ellipse
- Computed: red stadium-shaped node — same

Drop the inline tier-name and make() annotations on each node; tier is
now conveyed by shape and color alone, as in the real diagrams. A new
lead paragraph spells out the convention so the reader can decode the
diagram without a separate legend.
---
 src/explanation/relational-workflow-model.md | 22 +++++++++++++-------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/src/explanation/relational-workflow-model.md b/src/explanation/relational-workflow-model.md
index 1c87c793..07a30f02 100644
--- a/src/explanation/relational-workflow-model.md
+++ b/src/explanation/relational-workflow-model.md
@@ -50,21 +50,27 @@ specification of the work.
 
 ## A worked example
 
+Diagrams in this documentation use the same notation as `dj.Diagram` in
+`datajoint-python`: **Manual** tables are green rectangles, **Lookup**
+tables are plain text, **Imported** tables are blue ovals, and **Computed**
+tables are red ovals. Tier is conveyed by shape and color — the node
+itself carries only the table name.
+
 ```mermaid
 graph TD
-    Mouse["Mouse<br/>Manual"]:::manual
-    Session["Session<br/>Manual"]:::manual
-    Scan["Scan<br/>Manual"]:::manual
-    SegParam["SegmentationParam<br/>Lookup"]:::lookup
-    AvgFrame["AverageFrame<br/>Imported — make()"]:::imported
-    Segmentation["Segmentation<br/>Computed — make()"]:::computed
-    Fluorescence["Fluorescence<br/>Imported — make()"]:::imported
+    Mouse["Mouse"]:::manual
+    Session["Session"]:::manual
+    Scan["Scan"]:::manual
+    SegParam["SegmentationParam"]:::lookup
+    AvgFrame(["AverageFrame"]):::imported
+    Segmentation(["Segmentation"]):::computed
+    Fluorescence(["Fluorescence"]):::imported
 
     Mouse --> Session --> Scan --> AvgFrame --> Segmentation --> Fluorescence
     SegParam --> Segmentation
 
     classDef manual fill:#c8e6c9,stroke:#2e7d32,color:#1b5e20;
-    classDef lookup fill:#e0e0e0,stroke:#616161,color:#212121;
+    classDef lookup fill:none,stroke:none,color:#212121;
     classDef imported fill:#bbdefb,stroke:#1565c0,color:#0d47a1;
     classDef computed fill:#ffcdd2,stroke:#c62828,color:#b71c1c;
 ```

From 001e91df1b51fa3e32d2bee67d6798928bb5a6e2 Mon Sep 17 00:00:00 2001
From: Dimitri Yatsenko
Date: Thu, 18 Jun 2026 12:49:01 -0500
Subject: [PATCH 4/4] docs(rwm): restructure concrete-first; reframe as added interpretation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two structural cleanups on relational-workflow-model.md:

Concrete-first ordering. Open with a tight paragraph naming the model,
then lead with the worked example (diagram + walkthrough). The historical
lineage (Codd/Chen/RWM three interpretations) now follows the example,
placing DataJoint's contribution in context once the reader has a
concrete pipeline to anchor on. The closing side-by-side reading table
moves to the end of the page.

Reframe as interpretation, not departure. Classical relational concepts
(tables, rows, foreign keys, normalization, the query algebra) apply
unchanged; RWM adds a semantic interpretation on top. Renamed and rewrote
two sections to reflect this:

- "Four shifts from the classical relational model" →
  "A semantic interpretation, not a departure"
  Bullets now read additively ("tables also represent workflow steps")
  rather than contrastively ("not merely categories").
- "From transactions to transformations" →
  "Two readings of the same schema"
  Lead-in clarifies both readings hold simultaneously. Column header
  changes from "Traditional view" to "Classical reading."
---
 src/explanation/relational-workflow-model.md | 103 ++++++++++---------
 1 file changed, 53 insertions(+), 50 deletions(-)

diff --git a/src/explanation/relational-workflow-model.md b/src/explanation/relational-workflow-model.md
index 07a30f02..d82501e7 100644
--- a/src/explanation/relational-workflow-model.md
+++ b/src/explanation/relational-workflow-model.md
@@ -1,52 +1,14 @@
 # The Relational Workflow Model
 
-The relational model has historically admitted two interpretations. Codd's
-mathematical foundation (1970) views tables as logical predicates and rows
-as true propositions — rigorous but abstract. Chen's Entity-Relationship
-Model (1976) views tables as entity types or relationships — intuitive for
-domain modeling, but silent on how entities come into being. The
-**Relational Workflow Model** introduces a third interpretation: tables
-represent workflow steps, rows represent workflow artifacts, and foreign
-keys prescribe execution order. The schema specifies not only *what* data
-exists but *how* it is derived — a single formal system in which data
-structure, computational dependencies, and integrity constraints are all
-queryable, enforceable, and machine-readable.
-
-This unification is what makes DataJoint a *computational substrate* rather
-than a database in the conventional sense. Each surrounding category of
-tools is good at part of the problem and silent on the rest. File-based
-workflow systems (CWL, Snakemake, Nextflow) offer flexibility but fragment
-provenance across the filesystem and configuration. Task-centric
-orchestrators (Airflow, Argo, Prefect) manage execution but remain agnostic
-to data structure. Data catalogs (DataHub, Atlan, Marquez) describe data
-after it lands. Lakehouses (Delta, Iceberg, Hudi) optimize analytical
-queries but treat computation as external. The Relational Workflow Model
-is the deliberate trade-off: framework commitment in exchange for one
-formal system that addresses all four concerns at once.
-
-## Three interpretations of the relational model
-
-| Aspect | Mathematical (Codd) | Entity-Relationship (Chen) | **Relational Workflow (DataJoint)** |
-|--------|---------------------|----------------------------|-------------------------------------|
-| **Core question** | What functional dependencies exist? | What entity types exist? | **When and how are entities created?** |
-| **Table semantics** | Logical predicate | Entity or relationship | **Workflow step** |
-| **Row semantics** | True proposition | Entity instance | **Workflow artifact** |
-| **Foreign keys** | Referential integrity | Relationship | **Execution order** |
-| **Computation** | Not addressed | Not addressed | **Declared in schema** |
-| **Provenance** | Not addressed | Not addressed | **Structural** |
-| **Implementation gap** | High | High | **None** |
-
-## Four shifts from the classical relational model
-
-- **Tables represent workflow steps**, not merely categories of records.
-- **Rows represent workflow artifacts**, each with provenance to its inputs.
-- **Foreign keys prescribe execution order**, not only referential integrity — the dependency graph *is* the pipeline DAG, enforced by the database.
-- **Computed and Imported tables carry their own `make()` methods**, declaring derivation logic in the schema itself, not in an external workflow file.
-
-The schema is therefore *active*, not passive. A row exists in a Computed
-table if and only if its upstream key exists, its `make()` has run, and its
-result satisfies the declared constraints. The schema is the executable
-specification of the work.
+The **Relational Workflow Model** interprets tables as workflow steps,
+rows as workflow artifacts, and foreign keys as execution order. The
+schema specifies not only *what* data exists but *how* it is derived —
+a single formal system in which data structure, computational
+dependencies, and integrity constraints are all queryable, enforceable,
+and machine-readable. This unification is what makes DataJoint a
+*computational substrate* rather than a database in the conventional
+sense. The worked example below shows the model in action; its place in
+the lineage of relational modeling follows.
 
 ## A worked example
 
@@ -87,6 +49,43 @@ scheduler is consulted: the foreign-key graph dictates what may run, what
 must run first, and what already exists. The pipeline DAG and the database
 schema are the same object.
 
+## Three interpretations of the relational model
+
+The relational model has historically admitted two interpretations. Codd's
+mathematical foundation (1970) views tables as logical predicates and rows
+as true propositions — rigorous but abstract. Chen's Entity-Relationship
+Model (1976) views tables as entity types or relationships — intuitive
+for domain modeling, but silent on how entities come into being. The
+Relational Workflow Model adds a third, the one the worked example
+above illustrates.
+
+| Aspect | Mathematical (Codd) | Entity-Relationship (Chen) | **Relational Workflow (DataJoint)** |
+|--------|---------------------|----------------------------|-------------------------------------|
+| **Core question** | What functional dependencies exist? | What entity types exist? | **When and how are entities created?** |
+| **Table semantics** | Logical predicate | Entity or relationship | **Workflow step** |
+| **Row semantics** | True proposition | Entity instance | **Workflow artifact** |
+| **Foreign keys** | Referential integrity | Relationship | **Execution order** |
+| **Computation** | Not addressed | Not addressed | **Declared in schema** |
+| **Provenance** | Not addressed | Not addressed | **Structural** |
+| **Implementation gap** | High | High | **None** |
+
+## A semantic interpretation, not a departure
+
+The Relational Workflow Model layers a semantic interpretation on the
+classical relational model; it does not replace any of it. Tables, rows,
+primary and foreign keys, normalization, and the query algebra keep
+their classical meaning. The model adds four readings on top:
+
+- Tables also represent **workflow steps**.
+- Rows also represent **workflow artifacts**, carrying provenance to their inputs.
+- Foreign keys also prescribe **execution order** — the dependency graph *is* the pipeline DAG, enforced by the database.
+- **Computed and Imported tables carry their own `make()` methods**, declaring derivation logic in the schema itself rather than in an external workflow file.
+
+Under this interpretation the schema becomes *active*. A row exists in a
+Computed table if and only if its upstream key exists, its `make()` has
+run, and its result satisfies the declared constraints. The schema is the
+executable specification of the work.
+
 ## The deliberate trade-off
 
 DataJoint accepts tighter coupling deliberately, in exchange for one
@@ -197,10 +196,14 @@ proper entity set with clear identity — distinguishes DataJoint's algebra
 from SQL, where query results lack both a well-defined primary key and a
 clear entity type.
 
-## From transactions to transformations
+## Two readings of the same schema
+
+The classical relational reading and the workflow reading hold
+simultaneously — they are interpretive lenses on the same schema, not
+incompatible designs.
 
-| Traditional view | Workflow view |
-|------------------|---------------|
+| Classical reading | Workflow reading |
+|-------------------|------------------|
 | Tables store data | Tables represent workflow steps |
 | Rows are records | Rows are workflow artifacts |
 | Foreign keys enforce consistency | Foreign keys prescribe execution order |