datajoint · dimitri-yatsenko · Jun 13, 2026 · Jun 14, 2026 · Jun 14, 2026 · Jun 18, 2026
diff --git a/src/explanation/relational-workflow-model.md b/src/explanation/relational-workflow-model.md
@@ -1,70 +1,38 @@
 # The Relational Workflow Model
 
-The relational model has historically admitted two interpretations. Codd's
-mathematical foundation (1970) views tables as logical predicates and rows
-as true propositions — rigorous but abstract. Chen's Entity-Relationship
-Model (1976) views tables as entity types or relationships — intuitive for
-domain modeling, but silent on how entities come into being. The
-**Relational Workflow Model** introduces a third interpretation: tables
-represent workflow steps, rows represent workflow artifacts, and foreign
-keys prescribe execution order. The schema specifies not only *what* data
-exists but *how* it is derived — a single formal system in which data
-structure, computational dependencies, and integrity constraints are all
-queryable, enforceable, and machine-readable.
-
-This unification is what makes DataJoint a *computational substrate* rather
-than a database in the conventional sense. Each surrounding category of
-tools is good at part of the problem and silent on the rest. File-based
-workflow systems (CWL, Snakemake, Nextflow) offer flexibility but fragment
-provenance across the filesystem and configuration. Task-centric
-orchestrators (Airflow, Argo, Prefect) manage execution but remain agnostic
-to data structure. Data catalogs (DataHub, Atlan, Marquez) describe data
-after it lands. Lakehouses (Delta, Iceberg, Hudi) optimize analytical
-queries but treat computation as external. The Relational Workflow Model
-is the deliberate trade-off: framework commitment in exchange for one
-formal system that addresses all four concerns at once.
-
-## Three interpretations of the relational model
-
-| Aspect | Mathematical (Codd) | Entity-Relationship (Chen) | **Relational Workflow (DataJoint)** |
-|--------|---------------------|----------------------------|-------------------------------------|
-| **Core question** | What functional dependencies exist? | What entity types exist? | **When and how are entities created?** |
-| **Table semantics** | Logical predicate | Entity or relationship | **Workflow step** |
-| **Row semantics** | True proposition | Entity instance | **Workflow artifact** |
-| **Foreign keys** | Referential integrity | Relationship | **Execution order** |
-| **Computation** | Not addressed | Not addressed | **Declared in schema** |
-| **Provenance** | Not addressed | Not addressed | **Structural** |
-| **Implementation gap** | High | High | **None** |
-
-## Four shifts from the classical relational model
-
-- **Tables represent workflow steps**, not merely categories of records.
-- **Rows represent workflow artifacts**, each with provenance to its inputs.
-- **Foreign keys prescribe execution order**, not only referential integrity — the dependency graph *is* the pipeline DAG, enforced by the database.
-- **Computed and Imported tables carry their own `make()` methods**, declaring derivation logic in the schema itself, not in an external workflow file.
-
-The schema is therefore *active*, not passive. A row exists in a Computed
-table if and only if its upstream key exists, its `make()` has run, and its
-result satisfies the declared constraints. The schema is the executable
-specification of the work.
+The **Relational Workflow Model** interprets tables as workflow steps,
+rows as workflow artifacts, and foreign keys as execution order. The
+schema specifies not only *what* data exists but *how* it is derived —
+a single formal system in which data structure, computational
+dependencies, and integrity constraints are all queryable, enforceable,
+and machine-readable. This unification is what makes DataJoint a
+*computational substrate* rather than a database in the conventional
+sense. The worked example below shows the model in action; its place in
+the lineage of relational modeling follows.
 
 ## A worked example
 
+Diagrams in this documentation use the same notation as `dj.Diagram` in
+`datajoint-python`: **Manual** tables are green rectangles, **Lookup**
+tables are plain text, **Imported** tables are blue ovals, and **Computed**
+tables are red ovals. Tier is conveyed by shape and color — the node
+itself carries only the table name.
+
 ```mermaid
 graph TD
-    Mouse["Mouse<br/><i>Manual</i>"]:::manual
-    Session["Session<br/><i>Manual</i>"]:::manual
-    Scan["Scan<br/><i>Manual</i>"]:::manual
-    SegParam["SegmentationParam<br/><i>Lookup</i>"]:::lookup
-    AvgFrame["AverageFrame<br/><i>Imported</i> &mdash; make()"]:::imported
-    Segmentation["Segmentation<br/><i>Computed</i> &mdash; make()"]:::computed
-    Fluorescence["Fluorescence<br/><i>Imported</i> &mdash; make()"]:::imported
+    Mouse["Mouse"]:::manual
+    Session["Session"]:::manual
+    Scan["Scan"]:::manual
+    SegParam["SegmentationParam"]:::lookup
+    AvgFrame(["AverageFrame"]):::imported
+    Segmentation(["Segmentation"]):::computed
+    Fluorescence(["Fluorescence"]):::imported
 
     Mouse --> Session --> Scan --> AvgFrame --> Segmentation --> Fluorescence
     SegParam --> Segmentation
 
     classDef manual    fill:#c8e6c9,stroke:#2e7d32,color:#1b5e20;
-    classDef lookup    fill:#e0e0e0,stroke:#616161,color:#212121;
+    classDef lookup    fill:none,stroke:none,color:#212121;
     classDef imported  fill:#bbdefb,stroke:#1565c0,color:#0d47a1;
     classDef computed  fill:#ffcdd2,stroke:#c62828,color:#b71c1c;
 ```
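Below is a minimal sketch, in plain Python rather than DataJoint, of how an execution order falls out of the foreign-key graph in the diagram above. Each table is a node, and each foreign key is an edge from parent to child. A topological sort then yields an order in which every table follows its parents.

Some details are assumptions:

- The edge list is transcribed by hand from the diagram.
- The names `FOREIGN_KEYS`, `execution_order`, and `ready_to_run` are hypothetical.
- Readiness is tracked per table here, whereas DataJoint tracks it per key (row).

```python
from graphlib import TopologicalSorter

# Foreign keys from the diagram, written as child -> set of parent tables.
# Transcribed by hand; not read from a live DataJoint schema.
FOREIGN_KEYS = {
    "Mouse": set(),
    "SegmentationParam": set(),
    "Session": {"Mouse"},
    "Scan": {"Session"},
    "AverageFrame": {"Scan"},
    "Segmentation": {"AverageFrame", "SegmentationParam"},
    "Fluorescence": {"Segmentation"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which every table follows all of its parents."""
    # TopologicalSorter takes node -> predecessors, which is exactly
    # child -> referenced parents; it raises CycleError on a cyclic graph.
    return list(TopologicalSorter(graph).static_order())


def ready_to_run(graph: dict[str, set[str]], done: set[str]) -> list[str]:
    """Tables not yet populated whose parents are all populated."""
    return sorted(
        table for table, parents in graph.items()
        if table not in done and parents <= done
    )


if __name__ == "__main__":
    print(execution_order(FOREIGN_KEYS))
    print(ready_to_run(FOREIGN_KEYS, done={"Mouse", "Session", "SegmentationParam"}))
    # -> ['Scan']
```

`static_order()` returns one valid order among several; here `SegmentationParam` may appear anywhere before `Segmentation`. It raises `graphlib.CycleError` on a cyclic graph, so a dependency graph that doubles as a pipeline must be acyclic.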
@@ -81,24 +49,52 @@ scheduler is consulted: the foreign-key graph dictates what may run, what
 must run first, and what already exists. The pipeline DAG and the database
 schema are the same object.
 
+## Three interpretations of the relational model
+
+The relational model has historically admitted two interpretations. Codd's
+mathematical foundation (1970) views tables as logical predicates and rows
+as true propositions — rigorous but abstract. Chen's Entity-Relationship
+Model (1976) views tables as entity types or relationships — intuitive
+for domain modeling, but silent on how entities come into being. The
+Relational Workflow Model adds a third interpretation, which the worked
+example above illustrates.
+
+| Aspect | Mathematical (Codd) | Entity-Relationship (Chen) | **Relational Workflow (DataJoint)** |
+|--------|---------------------|----------------------------|-------------------------------------|
+| **Core question** | What functional dependencies exist? | What entity types exist? | **When and how are entities created?** |
+| **Table semantics** | Logical predicate | Entity or relationship | **Workflow step** |
+| **Row semantics** | True proposition | Entity instance | **Workflow artifact** |
+| **Foreign keys** | Referential integrity | Relationship | **Execution order** |
+| **Computation** | Not addressed | Not addressed | **Declared in schema** |
+| **Provenance** | Not addressed | Not addressed | **Structural** |
+| **Implementation gap** | High | High | **None** |
+
+## A semantic interpretation, not a departure
+
+The Relational Workflow Model layers a semantic interpretation on the
+classical relational model; it does not replace any of it. Tables, rows,
+primary and foreign keys, normalization, and the query algebra keep
+their classical meaning. The model adds four readings on top:
+
+- Tables also represent **workflow steps**.
+- Rows also represent **workflow artifacts**, carrying provenance to their inputs.
+- Foreign keys also prescribe **execution order** — the dependency graph *is* the pipeline DAG, enforced by the database.
+- Computed and Imported tables also carry **derivation logic** in their own `make()` methods, declared in the schema itself rather than in an external workflow file.
+
+Under this interpretation the schema becomes *active*. A row exists in a
+Computed table if and only if its upstream key exists, its `make()` has
+run, and its result satisfies the declared constraints. The schema is the
+executable specification of the work.
+
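The "if and only if" invariant can be illustrated with a toy, in-memory Computed table in plain Python. It is loosely modeled on DataJoint's `populate()`, which calls `make()` for each key in the table's key source that is not yet in the table.

Everything else here is hypothetical, including:

- the names `ToyComputed`, `pending`, and `orphans`;
- the dict-based storage and the sample keys;
- the reduction of declared constraints to a single required field.

The real library works through SQL queries and transactions rather than Python sets.

```python
from itertools import chain, product


class ToyComputed:
    """In-memory stand-in for a Computed table; not the DataJoint API."""

    def __init__(self, parents, make):
        self.parents = parents  # list of sets of primary-key tuples
        self.make = make        # derivation logic: key -> dict of attributes
        self.rows = {}          # primary key -> computed attributes

    def key_source(self):
        # The parents share no attributes here, so their join is a cross product.
        return {tuple(chain.from_iterable(c)) for c in product(*self.parents)}

    def pending(self):
        # Upstream key exists, but make() has not yet produced a row.
        return self.key_source() - set(self.rows)

    def orphans(self):
        # Rows whose upstream key is gone; a real foreign key forbids these
        # (DataJoint's delete cascades to dependent rows instead).
        return set(self.rows) - self.key_source()

    def populate(self):
        for key in sorted(self.pending()):
            result = self.make(key)
            # Stand-in for declared constraints: reject incomplete results.
            if "n_cells" not in result:
                raise ValueError(f"make() returned an invalid row for {key}")
            self.rows[key] = result


# Hypothetical upstream contents: (mouse, session, scan) and (param_set,).
average_frame = {("m1", 1, 1), ("m1", 1, 2)}
segmentation_param = {("fast",), ("accurate",)}

segmentation = ToyComputed(
    parents=[average_frame, segmentation_param],
    make=lambda key: {"n_cells": 0},  # placeholder for real segmentation
)
print(len(segmentation.pending()))  # 4: two scans x two parameter sets
segmentation.populate()
print(len(segmentation.pending()))  # 0: every upstream key now has a row
```

Calling `populate()` again is a no-op while `pending()` is empty. The rows themselves record which work has already been done.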
 ## The deliberate trade-off
 
-Decoupled architectures have legitimate advantages. File-based workflow
-systems optimize for portability — any tool that reads files works.
-Orchestrators evolve independently of the data model. Lakehouses give
-analytics teams a layer that doesn't bind them to upstream pipeline
-choices. These are the right trade-offs for many use cases.
-
-DataJoint accepts tighter coupling deliberately. The cost is framework
-commitment. The benefit is one system that knows the data structure, the
-data, the computation that produced it, the dependencies between
-computations, and the integrity constraints that govern all of it.
-Everything an analyst, an engineer, or an AI agent might ask about the
-work — *what is this, where did it come from, what depends on it, what
-must hold for it to be valid, what would change if I touched the input* —
-is answerable by query against a single formal model. For scientific
-workflows where the data and the computation cannot be cleanly separated
-without losing the science, this is the right trade-off.
+DataJoint deliberately accepts tighter coupling between data and
+computation than decoupled designs do, trading framework commitment for
+one formal system that spans data structure, computation, dependencies,
+and integrity. See
+[Comparison to Workflow Languages](comparison-to-workflow-languages.md)
+for the structural treatment: what file-based workflows and task
+orchestrators each offer, what each omits, and when to use them alongside DataJoint.
 
 ## Substrate consequences
 
@@ -200,10 +196,14 @@ proper entity set with clear identity — distinguishes DataJoint's algebra
 from SQL, where query results lack both a well-defined primary key and a
 clear entity type.
 
-## From transactions to transformations
+## Two readings of the same schema
+
+The classical relational reading and the workflow reading hold
+simultaneously — they are interpretive lenses on the same schema, not
+incompatible designs.
 
-| Traditional view | Workflow view |
-|------------------|---------------|
+| Classical reading | Workflow reading |
+|-------------------|------------------|
 | Tables store data | Tables represent workflow steps |
 | Rows are records | Rows are workflow artifacts |
 | Foreign keys enforce consistency | Foreign keys prescribe execution order |