170 changes: 166 additions & 4 deletions .deepreview
@@ -181,8 +181,8 @@ requirements_traceability:
# THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-004.5).
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES
```
-Both lines are required. They must appear inside the method body
-(after `def`, before the docstring). If tests have REQ references
+Both lines are required. They must appear immediately before the
+`def` line (not inside the method body). If tests have REQ references
only in docstrings but are missing the formal comment block, flag
them. If the second line is missing, flag that too.
5. For rule-validated requirements, verify the `.deepreview` rule's
@@ -238,6 +238,11 @@ update_documents_relating_to_src_deepwork:
Flag any sections that are now outdated or inaccurate due to the changes.
If the documentation file itself was changed, verify the updates are correct
and consistent with the source files.

Output Format:
- PASS: All documentation is current and accurate.
- FAIL: Issues found. List each with the doc file path, section name, and
what is outdated or inaccurate.
additional_context:
unchanged_matching_files: true

@@ -264,6 +269,11 @@ update_learning_agents_architecture:

Flag any sections that are now outdated or inaccurate due to the changes.
If the doc itself was changed, verify the updates are correct.

Output Format:
- PASS: All documentation is current and accurate.
- FAIL: Issues found. List each with the doc section name and what is
outdated or inaccurate.
additional_context:
unchanged_matching_files: true

@@ -300,8 +310,13 @@ shell_code_review:
section markers) still accurate after the changes? Flag any comments
that describe behavior that no longer matches the code.

Output Format:
- PASS: No issues found.
- FAIL: Issues found. List each with file, line, severity (high/medium/low),
and a concise description.

deepreview_config_quality:
-description: "Review .deepreview configs for rule consolidation, overly broad rules, description accuracy, and correct directory placement."
+description: "Review .deepreview configs for rule consolidation, description accuracy, overly broad rules, directory placement, valid configuration, and effective instructions."
match:
include:
- "**/.deepreview"
@@ -377,9 +392,156 @@ deepreview_config_quality:
- Rules that reference `additional_context.unchanged_matching_files`
with patterns spanning multiple directories.

## 5. Valid Configuration

Each rule must be syntactically valid YAML and structurally correct:
- Required fields present: `description`, `match.include`, `review.strategy`,
`review.instructions` (either inline string or `file:` reference).
- Field types correct: `match.include` and `match.exclude` are lists of
strings, `review.strategy` is one of `individual`, `matches_together`,
or `all_changed_files`, `description` is a string.
- No unknown top-level fields inside a rule (valid fields: `description`,
`match`, `review`).
- If `review.instructions.file` is used, the referenced file path exists.
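For reference, the structural checks above can be approximated in Python once a rule has been parsed from YAML into a dict. This is a minimal sketch, not DeepWork's validator: the name `validate_rule`, its message strings, and resolving `file:` paths relative to the `.deepreview` file's directory (`base_dir`) are all assumptions.

```python
from pathlib import Path

VALID_RULE_FIELDS = {"description", "match", "review"}
VALID_STRATEGIES = {"individual", "matches_together", "all_changed_files"}


def _is_str_list(value: object) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def validate_rule(name: str, rule: dict, base_dir: Path) -> list[str]:
    """Return the problems found in one parsed rule; an empty list means valid."""
    problems: list[str] = []

    unknown = set(rule) - VALID_RULE_FIELDS
    if unknown:
        problems.append(f"{name}: unknown fields {sorted(unknown)}")

    if not isinstance(rule.get("description"), str):
        problems.append(f"{name}: 'description' must be a string")

    match = rule.get("match")
    match = match if isinstance(match, dict) else {}
    if not _is_str_list(match.get("include")):
        problems.append(f"{name}: 'match.include' must be a list of strings")
    if "exclude" in match and not _is_str_list(match["exclude"]):
        problems.append(f"{name}: 'match.exclude' must be a list of strings")

    review = rule.get("review")
    review = review if isinstance(review, dict) else {}
    strategy = review.get("strategy")
    if not isinstance(strategy, str) or strategy not in VALID_STRATEGIES:
        problems.append(f"{name}: 'review.strategy' must be one of {sorted(VALID_STRATEGIES)}")

    instructions = review.get("instructions")
    if isinstance(instructions, dict) and isinstance(instructions.get("file"), str):
        # Assumption: file references resolve relative to the .deepreview's directory.
        if not (base_dir / instructions["file"]).exists():
            problems.append(f"{name}: instructions file not found: {instructions['file']}")
    elif not isinstance(instructions, str):
        problems.append(f"{name}: 'review.instructions' must be a string or a 'file:' reference")

    return problems
```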

## 6. Effective Instructions

Review instructions must give the reviewer enough context to do useful work:
- Instructions include concrete checks — not just "review this file" but
specific things to look for (e.g., "check that X matches Y", "verify Z
is present").
- If the rule protects a documentation file, the doc path is explicitly
referenced in the instructions so the reviewer knows what to read.
- If the rule uses `additional_context.unchanged_matching_files: true`,
the instructions explain why unchanged files are needed (e.g., "compare
the doc against source changes").
- Instructions specify the output format (PASS/FAIL with details).

## Output Format

- PASS: No issues found.
- FAIL: Issues found. List each with the .deepreview file path,
-rule name, which check failed (consolidation / description / overly-broad / placement),
+rule name, which check failed (consolidation / description / overly-broad /
+placement / valid-configuration / effective-instructions),
and a specific recommendation.

job_yml_quality:
description: "Review job.yml files for structural quality: logical step decomposition, review coverage, no orphaned steps, and no promise tags."
match:
include:
- ".deepwork/jobs/**/job.yml"
- "src/deepwork/standard_jobs/**/job.yml"
- "library/jobs/**/job.yml"
exclude:
- "tests/fixtures/**"
review:
strategy: individual
instructions: |
Review this job.yml file for intrinsic structural quality.

Check all of the following:

## 1. Intermediate Deliverables

The job breaks work into logical steps with reviewable intermediate
deliverables. Each step should produce a tangible output that can be
reviewed before proceeding. Flag jobs where steps are too coarse
(doing too much in one step) or too fine (splitting trivially).

## 2. Review Coverage

Reviews are defined for each step that produces outputs. Particularly
critical documents should have their own dedicated reviews. Note that
reviewers do NOT have transcript access — if criteria need to evaluate
conversation behavior, the step must include an output file (e.g.,
`.deepwork/tmp/[summary].md`) as a communication channel to the reviewer.

## 3. No Orphaned Steps

Every step defined in the `steps` list must appear in at least one
workflow. Flag any step that is defined but not referenced by any
workflow's `steps` list.
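A minimal sketch of the orphaned-step check, assuming job.yml is parsed into a dict with a top-level `workflows` list whose items each carry a `steps` list. The function name and the nested-list encoding of concurrent entries are assumptions for illustration.

```python
def find_orphaned_steps(job: dict) -> list[str]:
    """Return IDs of steps defined under `steps` that no workflow references."""
    defined = [step["id"] for step in job.get("steps", [])]
    referenced: set[str] = set()
    for workflow in job.get("workflows", []):
        for entry in workflow.get("steps", []):
            # Assumption: a concurrent entry is written as a nested list of step IDs.
            if isinstance(entry, list):
                referenced.update(entry)
            else:
                referenced.add(entry)
    return [step_id for step_id in defined if step_id not in referenced]
```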

## 4. No Promise Tags

The job.yml file must not contain `<promise>` tags anywhere in its
content (these are a deprecated pattern).
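A regular expression is enough to flag these tags. This sketch (the helper name is hypothetical) reports the line numbers that contain an opening or closing `<promise>` tag, matched case-insensitively.

```python
import re

# Matches "<promise>", "</promise>", or "<promise ...>" in any letter case.
PROMISE_TAG = re.compile(r"</?promise\b[^>]*>", re.IGNORECASE)


def find_promise_tags(text: str) -> list[int]:
    """Return 1-based line numbers that contain a <promise> tag."""
    return [n for n, line in enumerate(text.splitlines(), start=1) if PROMISE_TAG.search(line)]
```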

## Output Format
You should return one of the following:

- PASS: No issues found.
- FAIL: Issues found. List each with the check name
(deliverables / review-coverage / orphaned-steps / promise-tags)
and a specific description.

step_instruction_quality:
description: "Review step instruction files for completeness, specificity, output examples, quality criteria, structured questions, conciseness, and no promise tags."
match:
include:
- ".deepwork/jobs/**/steps/*.md"
- "src/deepwork/standard_jobs/**/steps/*.md"
- "library/jobs/**/steps/*.md"
exclude:
- "tests/fixtures/**"
review:
strategy: individual
instructions: |
Review this step instruction file for intrinsic quality.

IMPORTANT: Read the job.yml file in the same job directory for context on
how this instruction file fits into the larger workflow — particularly
which outputs the step declares and whether it has user inputs.

Check all of the following:

## 1. Complete

The instruction file is complete — no stubs, placeholders, TODO markers,
or sections saying "to be written". Every section has substantive content.

## 2. Specific & Actionable

Instructions are tailored to this step's specific purpose, not generic
boilerplate that could apply to any step. The agent should know exactly
what to do after reading the instructions.

## 3. Output Examples

If the step declares outputs (check the job.yml), the instruction file
shows what good output looks like. This can be template examples, worked
examples, or negative examples of what NOT to do. If the step has no
outputs, this check passes automatically.

## 4. Quality Criteria Defined

The instruction file defines quality criteria or acceptance criteria for
its outputs, either explicitly in a dedicated section or woven into the
instructions. The agent should understand what "done well" looks like.

## 5. Ask Structured Questions

If this step gathers user input (check the job.yml for `inputs` with
`name` fields rather than `file` references), the instructions explicitly
tell the agent to ask structured questions. If the step has no user
inputs, this check passes automatically.
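The user-input test above can be written as a small predicate over a parsed step, assuming `inputs` is a list of mappings in which user inputs carry `name` and file inputs carry `file` (plus `from_step`). `has_user_inputs` is a hypothetical name.

```python
def has_user_inputs(step: dict) -> bool:
    """True if the step declares at least one user-provided input (a `name`, not a `file`)."""
    return any("name" in inp and "file" not in inp for inp in step.get("inputs", []))
```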

## 6. Concise, No Redundancy

Instructions are free of unnecessary verbosity and do not duplicate
content that belongs in the job.yml's `common_job_info_provided_to_all_steps_at_runtime`
section. Shared context (project background, terminology, conventions)
should be in common_job_info, not repeated in each step file.

## 7. No Promise Lines

The instruction file must not contain `<promise>Quality Criteria Met</promise>`
or similar promise tag patterns (these are a deprecated pattern).

## Output Format

- PASS: No issues found.
- FAIL: Issues found. List each with the check name
(complete / specific / output-examples / quality-criteria /
structured-questions / conciseness / promise-lines)
and a specific description.
5 changes: 5 additions & 0 deletions .github/workflows/.deepreview
@@ -18,5 +18,10 @@ update_github_workflows_readme:

Flag any sections that are now outdated or inaccurate due to the changes.
If the README itself was changed, verify the updates are correct.

Output Format:
- PASS: README accurately reflects all workflows.
- FAIL: Issues found. List each with the README section name and what is
outdated or inaccurate.
additional_context:
unchanged_matching_files: true
91 changes: 65 additions & 26 deletions doc/architecture.md
@@ -244,7 +244,10 @@ steps:
- name: product_category
description: "Product category"
outputs:
-- competitors.md
+competitors.md:
+type: file
+description: "List of competitors with descriptions"
+required: true
dependencies: []

- id: primary_research
@@ -255,8 +258,10 @@ steps:
- file: competitors.md
from_step: identify_competitors
outputs:
-- primary_research.md
-- competitor_profiles/
+primary_research.md:
+type: file
+description: "Primary research findings"
+required: true
dependencies:
- identify_competitors

@@ -619,6 +624,7 @@ Users invoke workflows through the `/deepwork` skill, which uses MCP tools:
2. `start_workflow` — begins a workflow session, creates a git branch, returns first step instructions
3. `finished_step` — submits step outputs for quality review, returns next step or completion
4. `abort_workflow` — cancels the current workflow if it cannot be completed
5. `go_to_step` — navigates back to a prior step, clearing progress from that step onward

**Example: Creating a New Job**
```
@@ -932,7 +938,8 @@ DeepWork includes an MCP (Model Context Protocol) server that provides an altern
┌─────────────────────────────────────────────────────────────┐
│ DeepWork MCP Server │
-│ Tools: get_workflows | start_workflow | finished_step │
+│ Tools: get_workflows | start_workflow | finished_step | │
+│ abort_workflow | go_to_step | review tools │
│ State: session tracking, step progress, outputs │
│ Quality Gate: invokes review agent for validation │
└─────────────────────────────────────────────────────────────┘
@@ -980,8 +987,10 @@ Begins a new workflow session.
Reports step completion and gets next instructions.

**Parameters**:
-- `outputs: list[str]` - List of output file paths created
+- `outputs: dict[str, str | list[str]]` - Map of output names to file path(s)
- `notes: str | None` - Optional notes about work done
- `quality_review_override_reason: str | None` - If provided, skips quality review
- `session_id: str | None` - Target a specific workflow session

**Returns**:
- `status: "needs_work" | "next_step" | "workflow_complete"`
@@ -998,7 +1007,23 @@ Aborts the current workflow and returns to the parent (if nested).

**Returns**: Aborted workflow info, resumed parent info (if any), current stack

-#### 5. `get_review_instructions`
+#### 5. `go_to_step`
Navigates back to a prior step, clearing progress from that step onward.

**Parameters**:
- `step_id: str` - ID of the step to navigate back to
- `session_id: str | None` - Target a specific workflow session

**Returns**: `begin_step` (step info for the target step), `invalidated_steps` (step IDs whose progress was cleared), `stack` (current workflow stack)

**Behavior**:
- Validates the target step exists in the workflow
- Rejects forward navigation (target entry index > current entry index)
- Clears session tracking state for all steps from target onward (files on disk are not deleted)
- For concurrent entries, navigates to the first step in the entry
- Marks the target step as started
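The behavior rules above can be sketched against a simplified in-memory session. The `Session` dataclass, its fields, and the list-of-lists encoding of concurrent entries are assumptions for illustration; the real state lives in `StateManager`, and this sketch omits the returned `stack`.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    # Hypothetical simplified session; a concurrent entry holds more than one step ID.
    entries: list[list[str]]
    current_entry_index: int
    progress: dict[str, str] = field(default_factory=dict)  # step_id -> "started" / "completed"


def go_to_step(session: Session, step_id: str) -> dict:
    # Validate that the target step exists somewhere in the workflow.
    target = next((i for i, entry in enumerate(session.entries) if step_id in entry), None)
    if target is None:
        raise ValueError(f"unknown step: {step_id}")
    # Reject forward navigation.
    if target > session.current_entry_index:
        raise ValueError(f"cannot go forward to {step_id}")
    # Clear tracking for every step from the target entry onward; files on disk are left alone.
    invalidated = [s for entry in session.entries[target:] for s in entry if s in session.progress]
    for s in invalidated:
        del session.progress[s]
    # For a concurrent entry, resume at its first step, then mark that step as started.
    begin_step = session.entries[target][0]
    session.current_entry_index = target
    session.progress[begin_step] = "started"
    return {"begin_step": begin_step, "invalidated_steps": invalidated}
```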

#### 6. `get_review_instructions`
Runs the `.deepreview`-based code review pipeline. Registered directly in `jobs/mcp/server.py` (not in `tools.py`) since it operates outside the workflow lifecycle.

**Parameters**:
@@ -1008,15 +1033,15 @@ Runs the `.deepreview`-based code review pipeline. Registered directly in `jobs/

The `--platform` CLI option on `serve` controls which formatter is used (defaults to `"claude"`).

-#### 6. `get_configured_reviews`
+#### 7. `get_configured_reviews`
Lists configured review rules from `.deepreview` files without running the pipeline.

**Parameters**:
- `only_rules_matching_files: list[str] | None` - Filter to rules matching these files.

**Returns**: List of rule summaries (name, description, defining_file).

-#### 7. `mark_review_as_passed`
+#### 8. `mark_review_as_passed`
Marks a review as passed so it is skipped on subsequent runs while the reviewed files remain unchanged. Part of the **review pass caching** mechanism.
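How the cache decides that reviewed files "remain unchanged" is not specified here. One plausible approach, sketched purely as an assumption, fingerprints the rule name together with the current bytes of each reviewed file. `review_cache_key`, `read_files`, and `PassedReviews` are hypothetical names, and the in-memory set stands in for wherever passed reviews are actually recorded.

```python
import hashlib
from pathlib import Path


def read_files(paths: list[str], root: Path) -> dict[str, bytes]:
    """Read the current bytes of each reviewed file (paths relative to the project root)."""
    return {p: (root / p).read_bytes() for p in paths}


def review_cache_key(rule_name: str, contents: dict[str, bytes]) -> str:
    """Fingerprint a review from the rule name and the exact bytes of every reviewed file."""
    parts = [rule_name.encode()]
    for path in sorted(contents):
        parts += [path.encode(), contents[path]]
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))  # length prefix keeps part boundaries unambiguous
        h.update(part)
    return h.hexdigest()


class PassedReviews:
    """Hypothetical in-memory record of reviews marked as passed."""

    def __init__(self) -> None:
        self._keys: set[str] = set()

    def mark_passed(self, rule_name: str, contents: dict[str, bytes]) -> None:
        self._keys.add(review_cache_key(rule_name, contents))

    def is_cached(self, rule_name: str, contents: dict[str, bytes]) -> bool:
        # Any edit to a reviewed file changes the key, so the review runs again.
        return review_cache_key(rule_name, contents) in self._keys
```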

**Parameters**:
@@ -1038,12 +1063,17 @@ Manages workflow session state persisted to `.deepwork/tmp/session_[id].json`:

```python
class StateManager:
-def create_session(...) -> WorkflowSession
-def load_session(session_id) -> WorkflowSession
-def start_step(step_id) -> None
-def complete_step(step_id, outputs, notes) -> None
-def advance_to_step(step_id, entry_index) -> None
-def complete_workflow() -> None
+async def create_session(...) -> WorkflowSession
+def resolve_session(session_id=None) -> WorkflowSession
+async def start_step(step_id, session_id=None) -> None
+async def complete_step(step_id, outputs, notes, session_id=None) -> None
+async def advance_to_step(step_id, entry_index, session_id=None) -> None
+async def go_to_step(step_id, entry_index, invalidate_step_ids, session_id=None) -> None
+async def complete_workflow(session_id=None) -> None
+async def abort_workflow(explanation, session_id=None) -> tuple
+async def record_quality_attempt(step_id, session_id=None) -> int
+def get_all_outputs(session_id=None) -> dict
+def get_stack() -> list[StackEntry]
```

Session state includes:
@@ -1058,25 +1088,34 @@ Evaluates step outputs against quality criteria:

```python
class QualityGate:
-def evaluate(
-quality_criteria: list[str],
-outputs: list[str],
+async def evaluate_reviews(
+reviews: list[dict],
+outputs: dict[str, str | list[str]],
+output_specs: dict[str, str],
+project_root: Path,
+notes: str | None = None,
+) -> list[ReviewResult]
+
+async def build_review_instructions_file(
+reviews: list[dict],
+outputs: dict[str, str | list[str]],
+output_specs: dict[str, str],
project_root: Path,
-) -> QualityGateResult
+notes: str | None = None,
+) -> str
```

-The quality gate:
-1. Builds a review prompt with criteria and output file contents
-2. Invokes Claude Code via subprocess with proper flag ordering (see `doc/reference/calling_claude_in_print_mode.md`)
-3. Uses `--json-schema` for structured output conformance
-4. Parses the `structured_output` field from the JSON response
-5. Returns pass/fail with per-criterion feedback
+The quality gate supports two modes:
+- **External runner** (`evaluate_reviews`): Invokes Claude Code via subprocess to evaluate each review and returns a list of the failed `ReviewResult` objects
+- **Self-review** (`build_review_instructions_file`): Generates a review instructions file that the agent uses to spawn a subagent for self-review
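The external-runner contract ("evaluate each review, return the failures") can be sketched as follows. The `ReviewResult` fields and the injected `run_review` coroutine are simplifications assumed for illustration: the real method also takes `outputs`, `output_specs`, `project_root`, and `notes`, and shells out to Claude Code instead of calling a Python function. The sketch runs reviews concurrently with `asyncio.gather`; whether DeepWork does so is not stated here.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class ReviewResult:
    # Hypothetical subset of the real ReviewResult schema.
    review_name: str
    passed: bool
    feedback: str = ""


# Hypothetical stand-in for one external reviewer call (e.g. a Claude Code subprocess).
Runner = Callable[[dict], Awaitable[ReviewResult]]


async def evaluate_reviews(reviews: list[dict], run_review: Runner) -> list[ReviewResult]:
    """Run every review concurrently and return only the ones that failed."""
    results = await asyncio.gather(*(run_review(review) for review in reviews))
    return [result for result in results if not result.passed]
```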

### Schemas (`jobs/mcp/schemas.py`)

Pydantic models for all tool inputs and outputs:
-- `StartWorkflowInput`, `FinishedStepInput`
-- `GetWorkflowsResponse`, `StartWorkflowResponse`, `FinishedStepResponse`
+- `StartWorkflowInput`, `FinishedStepInput`, `AbortWorkflowInput`, `GoToStepInput`
+- `GetWorkflowsResponse`, `StartWorkflowResponse`, `FinishedStepResponse`, `AbortWorkflowResponse`, `GoToStepResponse`
- `ActiveStepInfo`, `ExpectedOutput`, `ReviewInfo`, `ReviewResult`, `StackEntry`
- `JobInfo`, `WorkflowInfo`, `JobLoadErrorInfo`
- `WorkflowSession`, `StepProgress`
- `QualityGateResult`, `QualityCriteriaResult`
