diff --git a/.deepreview b/.deepreview index 058eef15..9435ba52 100644 --- a/.deepreview +++ b/.deepreview @@ -181,8 +181,8 @@ requirements_traceability: # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-004.5). # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES ``` - Both lines are required. They must appear inside the method body - (after `def`, before the docstring). If tests have REQ references + Both lines are required. They must appear immediately before the + `def` line (not inside the method body). If tests have REQ references only in docstrings but are missing the formal comment block, flag them. If the second line is missing, flag that too. 5. For rule-validated requirements, verify the `.deepreview` rule's @@ -238,6 +238,11 @@ update_documents_relating_to_src_deepwork: Flag any sections that are now outdated or inaccurate due to the changes. If the documentation file itself was changed, verify the updates are correct and consistent with the source files. + + Output Format: + - PASS: All documentation is current and accurate. + - FAIL: Issues found. List each with the doc file path, section name, and + what is outdated or inaccurate. additional_context: unchanged_matching_files: true @@ -264,6 +269,11 @@ update_learning_agents_architecture: Flag any sections that are now outdated or inaccurate due to the changes. If the doc itself was changed, verify the updates are correct. + + Output Format: + - PASS: All documentation is current and accurate. + - FAIL: Issues found. List each with the doc section name and what is + outdated or inaccurate. additional_context: unchanged_matching_files: true @@ -300,8 +310,13 @@ shell_code_review: section markers) still accurate after the changes? Flag any comments that describe behavior that no longer matches the code. + Output Format: + - PASS: No issues found. + - FAIL: Issues found. List each with file, line, severity (high/medium/low), + and a concise description. 
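The requirements-traceability rule updated above (the two comment lines must sit immediately before the test's `def` line) can be sketched as a small check. This is a hypothetical helper for illustration; the function name and the prefix-matching approach are assumptions, not code from the repo.

```python
# Hypothetical sketch: verify the two requirement-traceability comment lines
# appear immediately before the test's `def` line, not inside the method body.
REQ_LINE = "# THIS TEST VALIDATES A HARD REQUIREMENT"
GUARD_LINE = "# YOU MUST NOT MODIFY THIS TEST"


def has_requirement_block(lines: list[str], def_index: int) -> bool:
    """Return True if both comment lines sit directly above lines[def_index]."""
    if def_index < 2:
        return False
    first = lines[def_index - 2].strip()
    second = lines[def_index - 1].strip()
    return first.startswith(REQ_LINE) and second.startswith(GUARD_LINE)


source = [
    "# THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-004.5).",
    "# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES",
    "def test_outputs_validated():",
    "    ...",
]
print(has_requirement_block(source, 2))  # → True
```

A reviewer following the rule would flag a file where the block appears after `def` (inside the body) or where the second guard line is missing.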
+ deepreview_config_quality: - description: "Review .deepreview configs for rule consolidation, overly broad rules, description accuracy, and correct directory placement." + description: "Review .deepreview configs for rule consolidation, description accuracy, overly broad rules, directory placement, valid configuration, and effective instructions." match: include: - "**/.deepreview" @@ -377,9 +392,156 @@ deepreview_config_quality: - Rules that reference `additional_context.unchanged_matching_files` with patterns spanning multiple directories. + ## 5. Valid Configuration + + Each rule must be syntactically valid YAML and structurally correct: + - Required fields present: `description`, `match.include`, `review.strategy`, + `review.instructions` (either inline string or `file:` reference). + - Field types correct: `match.include` and `match.exclude` are lists of + strings, `review.strategy` is one of `individual`, `matches_together`, + or `all_changed_files`, `description` is a string. + - No unknown top-level fields inside a rule (valid fields: `description`, + `match`, `review`). + - If `review.instructions.file` is used, the referenced file path exists. + + ## 6. Effective Instructions + + Review instructions must give the reviewer enough context to do useful work: + - Instructions include concrete checks — not just "review this file" but + specific things to look for (e.g., "check that X matches Y", "verify Z + is present"). + - If the rule protects a documentation file, the doc path is explicitly + referenced in the instructions so the reviewer knows what to read. + - If the rule uses `additional_context.unchanged_matching_files: true`, + the instructions explain why unchanged files are needed (e.g., "compare + the doc against source changes"). + - Instructions specify the output format (PASS/FAIL with details). + ## Output Format - PASS: No issues found. - FAIL: Issues found. 
List each with the .deepreview file path, - rule name, which check failed (consolidation / description / overly-broad / placement), + rule name, which check failed (consolidation / description / overly-broad / + placement / valid-configuration / effective-instructions), and a specific recommendation. + +job_yml_quality: + description: "Review job.yml files for structural quality: logical step decomposition, review coverage, no orphaned steps, and no promise tags." + match: + include: + - ".deepwork/jobs/**/job.yml" + - "src/deepwork/standard_jobs/**/job.yml" + - "library/jobs/**/job.yml" + exclude: + - "tests/fixtures/**" + review: + strategy: individual + instructions: | + Review this job.yml file for intrinsic structural quality. + + Check all of the following: + + ## 1. Intermediate Deliverables + + The job breaks work into logical steps with reviewable intermediate + deliverables. Each step should produce a tangible output that can be + reviewed before proceeding. Flag jobs where steps are too coarse + (doing too much in one step) or too fine (splitting trivially). + + ## 2. Review Coverage + + Reviews are defined for each step that produces outputs. Particularly + critical documents should have their own dedicated reviews. Note that + reviewers do NOT have transcript access — if criteria need to evaluate + conversation behavior, the step must include an output file (e.g., + `.deepwork/tmp/[summary].md`) as a communication channel to the reviewer. + + ## 3. No Orphaned Steps + + Every step defined in the `steps` list must appear in at least one + workflow. Flag any step that is defined but not referenced by any + workflow's `steps` list. + + ## 4. No Promise Tags + + The job.yml file must not contain promise tags anywhere in its + content (these are a deprecated pattern). + + ## Output Format + + - PASS: No issues found. + - FAIL: Issues found. 
List each with the check name + (deliverables / review-coverage / orphaned-steps / promise-tags) + and a specific description. + +step_instruction_quality: + description: "Review step instruction files for completeness, specificity, output examples, quality criteria, structured questions, conciseness, and no promise tags." + match: + include: + - ".deepwork/jobs/**/steps/*.md" + - "src/deepwork/standard_jobs/**/steps/*.md" + - "library/jobs/**/steps/*.md" + exclude: + - "tests/fixtures/**" + review: + strategy: individual + instructions: | + Review this step instruction file for intrinsic quality. + + IMPORTANT: Read the job.yml file in the same job directory for context on + how this instruction file fits into the larger workflow — particularly + which outputs the step declares and whether it has user inputs. + + Check all of the following: + + ## 1. Complete + + The instruction file is complete — no stubs, placeholders, TODO markers, + or sections saying "to be written". Every section has substantive content. + + ## 2. Specific & Actionable + + Instructions are tailored to this step's specific purpose, not generic + boilerplate that could apply to any step. The agent should know exactly + what to do after reading the instructions. + + ## 3. Output Examples + + If the step declares outputs (check the job.yml), the instruction file + shows what good output looks like. This can be template examples, worked + examples, or negative examples of what NOT to do. If the step has no + outputs, this check passes automatically. + + ## 4. Quality Criteria Defined + + The instruction file defines quality criteria or acceptance criteria for + its outputs, either explicitly in a dedicated section or woven into the + instructions. The agent should understand what "done well" looks like. + + ## 5. 
Ask Structured Questions + + If this step gathers user input (check the job.yml for `inputs` with + `name` fields rather than `file` references), the instructions explicitly + tell the agent to ask structured questions. If the step has no user + inputs, this check passes automatically. + + ## 6. Concise, No Redundancy + + Instructions are free of unnecessary verbosity and do not duplicate + content that belongs in the job.yml's `common_job_info_provided_to_all_steps_at_runtime` + section. Shared context (project background, terminology, conventions) + should be in common_job_info, not repeated in each step file. + + ## 7. No Promise Lines + + The instruction file must not contain a `Quality Criteria Met` promise line + or similar promise tag patterns (these are a deprecated pattern). + + ## Output Format + + - PASS: No issues found. + - FAIL: Issues found. List each with the check name + (complete / specific / output-examples / quality-criteria / + structured-questions / conciseness / promise-lines) + and a specific description. diff --git a/.github/workflows/.deepreview b/.github/workflows/.deepreview index 24848c71..cc727663 100644 --- a/.github/workflows/.deepreview +++ b/.github/workflows/.deepreview @@ -18,5 +18,10 @@ update_github_workflows_readme: Flag any sections that are now outdated or inaccurate due to the changes. If the README itself was changed, verify the updates are correct. + + Output Format: + - PASS: README accurately reflects all workflows. + - FAIL: Issues found. List each with the README section name and what is + outdated or inaccurate. 
additional_context: unchanged_matching_files: true diff --git a/doc/architecture.md b/doc/architecture.md index 927df6d3..0a20b4b9 100644 --- a/doc/architecture.md +++ b/doc/architecture.md @@ -244,7 +244,10 @@ steps: - name: product_category description: "Product category" outputs: - - competitors.md + competitors.md: + type: file + description: "List of competitors with descriptions" + required: true dependencies: [] - id: primary_research @@ -255,8 +258,10 @@ steps: - file: competitors.md from_step: identify_competitors outputs: - - primary_research.md - - competitor_profiles/ + primary_research.md: + type: file + description: "Primary research findings" + required: true dependencies: - identify_competitors @@ -619,6 +624,7 @@ Users invoke workflows through the `/deepwork` skill, which uses MCP tools: 2. `start_workflow` — begins a workflow session, creates a git branch, returns first step instructions 3. `finished_step` — submits step outputs for quality review, returns next step or completion 4. `abort_workflow` — cancels the current workflow if it cannot be completed +5. `go_to_step` — navigates back to a prior step, clearing progress from that step onward **Example: Creating a New Job** ``` @@ -932,7 +938,8 @@ DeepWork includes an MCP (Model Context Protocol) server that provides an altern ▼ ┌─────────────────────────────────────────────────────────────┐ │ DeepWork MCP Server │ -│ Tools: get_workflows | start_workflow | finished_step │ +│ Tools: get_workflows | start_workflow | finished_step | │ +│ abort_workflow | go_to_step | review tools │ │ State: session tracking, step progress, outputs │ │ Quality Gate: invokes review agent for validation │ └─────────────────────────────────────────────────────────────┘ @@ -980,8 +987,10 @@ Begins a new workflow session. Reports step completion and gets next instructions. 
**Parameters**: -- `outputs: list[str]` - List of output file paths created +- `outputs: dict[str, str | list[str]]` - Map of output names to file path(s) - `notes: str | None` - Optional notes about work done +- `quality_review_override_reason: str | None` - If provided, skips quality review +- `session_id: str | None` - Target a specific workflow session **Returns**: - `status: "needs_work" | "next_step" | "workflow_complete"` @@ -998,7 +1007,23 @@ Aborts the current workflow and returns to the parent (if nested). **Returns**: Aborted workflow info, resumed parent info (if any), current stack -#### 5. `get_review_instructions` +#### 5. `go_to_step` +Navigates back to a prior step, clearing progress from that step onward. + +**Parameters**: +- `step_id: str` - ID of the step to navigate back to +- `session_id: str | None` - Target a specific workflow session + +**Returns**: `begin_step` (step info for the target step), `invalidated_steps` (step IDs whose progress was cleared), `stack` (current workflow stack) + +**Behavior**: +- Validates the target step exists in the workflow +- Rejects forward navigation (target entry index > current entry index) +- Clears session tracking state for all steps from target onward (files on disk are not deleted) +- For concurrent entries, navigates to the first step in the entry +- Marks the target step as started + +#### 6. `get_review_instructions` Runs the `.deepreview`-based code review pipeline. Registered directly in `jobs/mcp/server.py` (not in `tools.py`) since it operates outside the workflow lifecycle. **Parameters**: @@ -1008,7 +1033,7 @@ Runs the `.deepreview`-based code review pipeline. Registered directly in `jobs/ The `--platform` CLI option on `serve` controls which formatter is used (defaults to `"claude"`). -#### 6. `get_configured_reviews` +#### 7. `get_configured_reviews` Lists configured review rules from `.deepreview` files without running the pipeline. 
**Parameters**: @@ -1016,7 +1041,7 @@ Lists configured review rules from `.deepreview` files without running the pipel **Returns**: List of rule summaries (name, description, defining_file). -#### 7. `mark_review_as_passed` +#### 8. `mark_review_as_passed` Marks a review as passed so it is skipped on subsequent runs while the reviewed files remain unchanged. Part of the **review pass caching** mechanism. **Parameters**: @@ -1038,12 +1063,17 @@ Manages workflow session state persisted to `.deepwork/tmp/session_[id].json`: ```python class StateManager: - def create_session(...) -> WorkflowSession - def load_session(session_id) -> WorkflowSession - def start_step(step_id) -> None - def complete_step(step_id, outputs, notes) -> None - def advance_to_step(step_id, entry_index) -> None - def complete_workflow() -> None + async def create_session(...) -> WorkflowSession + def resolve_session(session_id=None) -> WorkflowSession + async def start_step(step_id, session_id=None) -> None + async def complete_step(step_id, outputs, notes, session_id=None) -> None + async def advance_to_step(step_id, entry_index, session_id=None) -> None + async def go_to_step(step_id, entry_index, invalidate_step_ids, session_id=None) -> None + async def complete_workflow(session_id=None) -> None + async def abort_workflow(explanation, session_id=None) -> tuple + async def record_quality_attempt(step_id, session_id=None) -> int + def get_all_outputs(session_id=None) -> dict + def get_stack() -> list[StackEntry] ``` Session state includes: @@ -1058,25 +1088,34 @@ Evaluates step outputs against quality criteria: ```python class QualityGate: - def evaluate( - quality_criteria: list[str], - outputs: list[str], + async def evaluate_reviews( + reviews: list[dict], + outputs: dict[str, str | list[str]], + output_specs: dict[str, str], + project_root: Path, + notes: str | None = None, + ) -> list[ReviewResult] + + async def build_review_instructions_file( + reviews: list[dict], + outputs: dict[str, str 
| list[str]], + output_specs: dict[str, str], project_root: Path, - ) -> QualityGateResult + notes: str | None = None, + ) -> str ``` -The quality gate: -1. Builds a review prompt with criteria and output file contents -2. Invokes Claude Code via subprocess with proper flag ordering (see `doc/reference/calling_claude_in_print_mode.md`) -3. Uses `--json-schema` for structured output conformance -4. Parses the `structured_output` field from the JSON response -5. Returns pass/fail with per-criterion feedback +The quality gate supports two modes: +- **External runner** (`evaluate_reviews`): Invokes Claude Code via subprocess to evaluate each review, returns list of failed `ReviewResult` objects +- **Self-review** (`build_review_instructions_file`): Generates a review instructions file for the agent to spawn a subagent for self-review ### Schemas (`jobs/mcp/schemas.py`) Pydantic models for all tool inputs and outputs: -- `StartWorkflowInput`, `FinishedStepInput` -- `GetWorkflowsResponse`, `StartWorkflowResponse`, `FinishedStepResponse` +- `StartWorkflowInput`, `FinishedStepInput`, `AbortWorkflowInput`, `GoToStepInput` +- `GetWorkflowsResponse`, `StartWorkflowResponse`, `FinishedStepResponse`, `AbortWorkflowResponse`, `GoToStepResponse` +- `ActiveStepInfo`, `ExpectedOutput`, `ReviewInfo`, `ReviewResult`, `StackEntry` +- `JobInfo`, `WorkflowInfo`, `JobLoadErrorInfo` - `WorkflowSession`, `StepProgress` - `QualityGateResult`, `QualityCriteriaResult` diff --git a/doc/mcp_interface.md b/doc/mcp_interface.md index b234d261..5e1623c7 100644 --- a/doc/mcp_interface.md +++ b/doc/mcp_interface.md @@ -10,7 +10,7 @@ This document describes the Model Context Protocol (MCP) tools exposed by the De ## Tools -DeepWork exposes seven MCP tools: +DeepWork exposes eight MCP tools: ### 1. `get_workflows` @@ -24,7 +24,8 @@ None. 
```typescript { - jobs: JobInfo[] + jobs: JobInfo[]; + errors: JobLoadErrorInfo[]; // Jobs that failed to parse } ``` @@ -34,10 +35,15 @@ Where `JobInfo` is: interface JobInfo { name: string; // Job identifier summary: string; // Short summary of the job - description: string | null; // Full description (optional) workflows: WorkflowInfo[]; // Named workflows in the job } +interface JobLoadErrorInfo { + job_name: string; // Name of the job that failed + job_dir: string; // Path to the job directory + error: string; // Error message +} + interface WorkflowInfo { name: string; // Workflow identifier summary: string; // Short description @@ -136,7 +142,37 @@ Abort the current workflow and return to the parent workflow (if nested). Use th --- -### 5. `get_review_instructions` +### 5. `go_to_step` + +Navigate back to a prior step in the current workflow. Clears all progress from the target step onward, forcing re-execution of subsequent steps to ensure consistency. Use this when earlier outputs need revision or quality issues are discovered in later steps. + +#### Parameters + +| Parameter | Type | Required | Description | +|-----------|------|----------|-------------| +| `step_id` | `string` | Yes | ID of the step to navigate back to. Must exist in the current workflow. | +| `session_id` | `string \| null` | No | Target a specific workflow session by ID. Use when multiple workflows are active concurrently. If omitted, operates on the top-of-stack session. | + +#### Returns + +```typescript +{ + begin_step: ActiveStepInfo; // Information about the step to begin working on + invalidated_steps: string[]; // Step IDs whose progress was cleared (from target onward) + stack: StackEntry[]; // Current workflow stack after navigation +} +``` + +#### Behavior + +- **Backward/current only**: The target step's entry index must be <= the current entry index. To go forward, use `finished_step`. 
+- **Clears subsequent progress**: All `step_progress` entries from the target step onward are deleted (outputs, timestamps, quality attempts). The agent must re-execute all affected steps. +- **Files preserved**: Only session tracking state is cleared. Files on disk are not deleted — Git handles file versioning. +- **Concurrent entries**: When targeting a step in a concurrent entry, navigation goes to the first step in that entry, and all steps in the group are invalidated. + +--- + +### 6. `get_review_instructions` Run a review of changed files based on `.deepreview` configuration files. Returns a list of review tasks to invoke in parallel. Each task has `name`, `description`, `subagent_type`, and `prompt` fields for the Task tool. @@ -156,7 +192,7 @@ A plain string with one of: --- -### 6. `get_configured_reviews` +### 7. `get_configured_reviews` List all configured review rules from `.deepreview` files. Returns each rule's name, description, and defining file location. Optionally filters to rules matching specific files. @@ -180,7 +216,7 @@ Array<{ --- -### 7. `mark_review_as_passed` +### 8. `mark_review_as_passed` Mark a review as passed so it won't be re-run while reviewed files remain unchanged. The `review_id` is provided in the instruction file's "After Review" section. 
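The `go_to_step` navigation rules above can be sketched as a pure function over a flattened entry list. This is a simplified illustrative model, not the real session schema; the step names, the list-of-lists entry representation, and the helper name are assumptions.

```python
# Hypothetical sketch of go_to_step navigation: backward-only, invalidates
# everything from the target entry onward, lands on the first step of a
# concurrent entry.
def plan_go_to_step(entries: list[list[str]], current_index: int, step_id: str):
    """Return (target_index, first_step, invalidated_step_ids) or raise ValueError."""
    target_index = next(
        (i for i, entry in enumerate(entries) if step_id in entry), None
    )
    if target_index is None:
        available = [s for entry in entries for s in entry]
        raise ValueError(f"Unknown step {step_id!r}; available: {available}")
    if target_index > current_index:
        raise ValueError("Forward navigation rejected; use finished_step to advance.")
    # All steps from the target entry through the end are invalidated; for a
    # concurrent entry, navigation goes to the first step in that entry.
    invalidated = [s for entry in entries[target_index:] for s in entry]
    return target_index, entries[target_index][0], invalidated


entries = [["plan"], ["research_a", "research_b"], ["write"]]
print(plan_go_to_step(entries, 2, "research_b"))
# → (1, 'research_a', ['research_a', 'research_b', 'write'])
```

Note that only tracking state is cleared; files on disk stay put, since Git handles file versioning.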
@@ -220,11 +256,13 @@ interface ActiveStepInfo { step_expected_outputs: ExpectedOutput[]; // Expected outputs with type and format hints step_reviews: ReviewInfo[]; // Reviews to run when step completes step_instructions: string; // Instructions for the step + common_job_info: string; // Common context shared across all steps in this job } interface ReviewInfo { run_each: string; // 'step' or output name to review quality_criteria: Record; // Map of criterion name to question + additional_review_guidance: string | null; // Optional guidance for the reviewer } interface ReviewResult { @@ -345,22 +383,6 @@ Workflows can be nested — starting a new workflow while one is active pushes o --- -## Configuration - -The MCP server is configured via `.deepwork/config.yml`: - -```yaml -version: "1.0" -platforms: - - claude -``` - -Quality gate is enabled by default and uses Claude Code to evaluate step outputs -against quality criteria. See `doc/reference/calling_claude_in_print_mode.md` for -details on how Claude CLI is invoked. - ---- - ## Server CLI Options ```bash @@ -398,6 +420,7 @@ Add to your `.mcp.json`: | Version | Changes | |---------|---------| +| 1.9.0 | Added `go_to_step` tool for navigating back to prior steps. Clears all step progress from the target step onward, forcing re-execution of subsequent steps. Supports `session_id` for concurrent workflow safety. | | 1.8.0 | Added `how_to_invoke` field to `WorkflowInfo` in `get_workflows` response. Always populated with invocation instructions: when a workflow's `agent` field is set, directs callers to delegate via the Task tool; otherwise, directs callers to use the `start_workflow` MCP tool directly. Also added optional `agent` field to workflow definitions in job.yml. | | 1.7.0 | Added `mark_review_as_passed` tool for review pass caching. Instruction files now include an "After Review" section with the review ID. Reviews with a `.passed` marker are automatically skipped by `get_review_instructions`. 
| | 1.6.0 | Added `get_configured_reviews` tool for listing configured review rules without running the full pipeline. Supports optional file-based filtering. | diff --git a/flake.lock b/flake.lock index 9dce1421..96df5e81 100644 --- a/flake.lock +++ b/flake.lock @@ -69,59 +69,10 @@ "type": "github" } }, - "pyproject-build-systems": { - "inputs": { - "nixpkgs": [ - "nixpkgs" - ], - "pyproject-nix": [ - "pyproject-nix" - ], - "uv2nix": [ - "uv2nix" - ] - }, - "locked": { - "lastModified": 1763662255, - "narHash": "sha256-4bocaOyLa3AfiS8KrWjZQYu+IAta05u3gYZzZ6zXbT0=", - "owner": "pyproject-nix", - "repo": "build-system-pkgs", - "rev": "042904167604c681a090c07eb6967b4dd4dae88c", - "type": "github" - }, - "original": { - "owner": "pyproject-nix", - "repo": "build-system-pkgs", - "type": "github" - } - }, - "pyproject-nix": { - "inputs": { - "nixpkgs": [ - "nixpkgs" - ] - }, - "locked": { - "lastModified": 1769936401, - "narHash": "sha256-kwCOegKLZJM9v/e/7cqwg1p/YjjTAukKPqmxKnAZRgA=", - "owner": "pyproject-nix", - "repo": "pyproject.nix", - "rev": "b0d513eeeebed6d45b4f2e874f9afba2021f7812", - "type": "github" - }, - "original": { - "owner": "pyproject-nix", - "repo": "pyproject.nix", - "type": "github" - } - }, "root": { "inputs": { "claude-code-nix": "claude-code-nix", - "nixpkgs": "nixpkgs_2", - "pyproject-build-systems": "pyproject-build-systems", - "pyproject-nix": "pyproject-nix", - "uv2nix": "uv2nix" + "nixpkgs": "nixpkgs_2" } }, "systems": { @@ -138,29 +89,6 @@ "repo": "default", "type": "github" } - }, - "uv2nix": { - "inputs": { - "nixpkgs": [ - "nixpkgs" - ], - "pyproject-nix": [ - "pyproject-nix" - ] - }, - "locked": { - "lastModified": 1770605395, - "narHash": "sha256-jXydlwsCseO9F0YKPBMpPTWTS4JM7QDVIXaHvjfI1hg=", - "owner": "pyproject-nix", - "repo": "uv2nix", - "rev": "661dadc1e3ff53142e1554172ab60c667de2c1d5", - "type": "github" - }, - "original": { - "owner": "pyproject-nix", - "repo": "uv2nix", - "type": "github" - } } }, "root": "root", diff --git 
a/plugins/claude/skills/review/SKILL.md b/plugins/claude/skills/review/SKILL.md index 02318e24..3b60e982 100644 --- a/plugins/claude/skills/review/SKILL.md +++ b/plugins/claude/skills/review/SKILL.md @@ -19,7 +19,10 @@ Only proceed past this section if the user wants to **run** reviews. ## How to Run 1. First, call `mcp__deepwork__get_configured_reviews` to see what review rules are configured. This returns each rule's name, description, and which `.deepreview` file defines it. If reviewing specific files, pass `only_rules_matching_files` to see only the rules that apply. Share a brief summary of the active rules with the user before proceeding. - - **If no rules are configured**: Use AskUserQuestion to tell the user there are no `.deepreview` rules set up yet, and ask if they'd like the agent to auto-discover and suggest rules for this project. If they say yes, invoke the `/deepwork` skill with the `deepwork_reviews` job's `discover_rules` workflow (which sets up native reviews, skill migration, documentation rules, and language-specific code review). Stop here — do not proceed with running reviews if there are no rules. + - **If no rules are configured**: + 1. Use AskUserQuestion to tell the user there are no `.deepreview` rules set up yet and ask if they'd like the agent to auto-discover and suggest rules for this project. + 2. If yes, invoke the `/deepwork` skill with the `deepwork_reviews` job's `discover_rules` workflow (which sets up native reviews, skill migration, documentation rules, and language-specific code review). + 3. Stop here — do not proceed with running reviews if there are no rules. 2. Call the `mcp__deepwork__get_review_instructions` tool: - **No arguments** to review the current branch's changes (auto-detects via git diff against the main branch). - **With `files`** to review only specific files: `mcp__deepwork__get_review_instructions(files=["src/app.py", "src/lib.py"])`. 
When provided, only reviews whose include/exclude patterns match the given files will be returned. Use this when the user asks to review a particular file or set of files rather than the whole branch. @@ -31,10 +34,18 @@ Only proceed past this section if the user wants to **run** reviews. For each finding from the review agents: - **Obviously good changes with no downsides** (e.g., fixing a typo, removing an unused import, adding a missing null check): make the change immediately without asking. When in doubt, ask. -- **Everything else** (refactors, style changes, architectural suggestions, anything with trade-offs): use AskUserQuestion to ask the user about each finding **individually** — one question per finding. Do NOT batch unrelated findings into a single question. This lets the user make separate decisions on each item. For each question, provide several concrete fix approaches as options when there are reasonable alternatives (e.g., "Update the spec to match the code" vs "Update the code to match the spec" vs "Skip"). Be concise — quote the key finding, don't dump the full review. +- **Everything else** (refactors, style changes, architectural suggestions, anything with trade-offs): + 1. Use AskUserQuestion to ask the user about each finding **individually**. Do not group issues together unless they are the same issue occurring in multiple files; otherwise, use AskUserQuestion to ask about each issue separately. This lets the user make separate decisions on each item. + 2. For each question, provide several concrete fix approaches as options when there are reasonable alternatives (e.g., "Update the spec to match the code" vs "Update the code to match the spec" vs "Skip"). + 3. If a finding seems minor or debatable, include an option to suppress that error in the future — such as a clarification to the rule if it is too narrow, or a comment on the offending content if comments work in that context. + 4. 
Be concise — quote the key finding, don't dump the full review. ## Iterate -After making any changes, run the review again by calling `mcp__deepwork__get_review_instructions` (pass the same `files` argument if the original review was file-scoped, otherwise no arguments). +After making any changes: -Repeat the full cycle (run → act on results → run again) until a clean run produces no further actionable findings. A clean run is one where all review agents return no findings, or all remaining findings have been explicitly skipped by the user. Note that you don't have to run EVERY task the above outputs after the first time - you can just run the ones that match up to ones that had feedback last time. If you made very large changes, it may be a good idea to run the full reviews set. Subsequent runs automatically skip reviews that already passed, as long as the reviewed files remain unchanged. +1. Call `mcp__deepwork__get_review_instructions` again (with the same `files` argument if the original review was file-scoped, otherwise no arguments). +2. Repeat the cycle (run → act on results → run again) until a clean run — one where all review agents return no findings, or all remaining findings have been explicitly skipped by the user. +3. On subsequent runs, you only need to re-run tasks that had findings last time — skip tasks that were clean. +4. If you made very large changes, consider re-running the full review set. +5. Reviews that already passed are automatically skipped as long as reviewed files remain unchanged. 
diff --git a/specs/deepwork/jobs/JOBS-REQ-001-mcp-workflow-tools.md b/specs/deepwork/jobs/JOBS-REQ-001-mcp-workflow-tools.md index 061df713..fe40700e 100644 --- a/specs/deepwork/jobs/JOBS-REQ-001-mcp-workflow-tools.md +++ b/specs/deepwork/jobs/JOBS-REQ-001-mcp-workflow-tools.md @@ -2,7 +2,7 @@ ## Overview -The DeepWork MCP server exposes four tools to AI agents via the Model Context Protocol (MCP): `get_workflows`, `start_workflow`, `finished_step`, and `abort_workflow`. These tools constitute the primary runtime interface through which agents discover, execute, and manage multi-step workflows. The server is built on FastMCP and all tool responses are serialized as dictionaries via Pydantic `model_dump()`. +The DeepWork MCP server exposes five workflow tools to AI agents via the Model Context Protocol (MCP): `get_workflows`, `start_workflow`, `finished_step`, `abort_workflow`, and `go_to_step`. These tools constitute the primary runtime interface through which agents discover, execute, and manage multi-step workflows. The server is built on FastMCP and all tool responses are serialized as dictionaries via Pydantic `model_dump()`. ## Requirements @@ -18,7 +18,7 @@ The DeepWork MCP server exposes four tools to AI agents via the Model Context Pr 8. When `enable_quality_gate` is `True` and `external_runner` is `None`, the system MUST create a `QualityGate` with `cli=None` and `max_inline_files=0`. 9. When `enable_quality_gate` is `False`, the system MUST NOT create a `QualityGate` instance. 10. The server MUST be named `"deepwork"`. -11. The server MUST include instructions text describing the workflow lifecycle (Discover, Start, Execute, Checkpoint, Iterate, Continue, Complete). +11. The server MUST include instructions text describing the workflow lifecycle (Discover, Start, Execute, Checkpoint, Iterate, Continue, Complete, Going Back). 12. Every tool call MUST be logged with the tool name and current stack state. 
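The quality-gate creation rules above (JOBS-REQ-001.1, items 8-9) can be sketched as a factory. The `QualityGate` stand-in here is illustrative, not the repo's class; the external-runner branch is an assumption, since the spec only pins the `external_runner is None` case.

```python
# Minimal sketch of the QualityGate creation rule (JOBS-REQ-001.1, items 8-9).
from dataclasses import dataclass
from typing import Optional


@dataclass
class QualityGate:  # illustrative stand-in for the real class
    cli: Optional[object]
    max_inline_files: int


def make_quality_gate(enable_quality_gate: bool, external_runner: Optional[object]):
    if not enable_quality_gate:
        return None  # item 9: no QualityGate instance at all
    if external_runner is None:
        return QualityGate(cli=None, max_inline_files=0)  # item 8
    # Assumption: the spec above does not pin constructor arguments for a
    # configured external runner; this branch is for illustration only.
    return QualityGate(cli=external_runner, max_inline_files=0)


print(make_quality_gate(True, None))
# → QualityGate(cli=None, max_inline_files=0)
```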
### JOBS-REQ-001.2: get_workflows Tool @@ -57,7 +57,7 @@ The DeepWork MCP server exposes four tools to AI agents via the Model Context Pr 1. The `finished_step` tool MUST be registered as an asynchronous MCP tool. 2. The tool MUST require an `outputs` parameter: a dict mapping output names to file path(s). 3. The tool MUST accept optional parameters: `notes` (str), `quality_review_override_reason` (str), `session_id` (str). -4. The tool MUST raise `StateError` if no active workflow session exists and no `session_id` is provided. +4. The tool MUST raise `ToolError` if no active workflow session exists and no `session_id` is provided. The error message MUST explain what the tool does and provide guidance on how to resume a workflow. 5. When `session_id` is provided, the tool MUST target the session with that ID rather than the top-of-stack session. 6. The tool MUST validate submitted outputs against the current step's declared output specifications (see JOBS-REQ-001.5). 7. The tool MUST return a response with a `status` field that is one of: `"needs_work"`, `"next_step"`, or `"workflow_complete"`. @@ -102,7 +102,25 @@ The DeepWork MCP server exposes four tools to AI agents via the Model Context Pr 7. The response MUST contain: `aborted_workflow` (formatted as `"job_name/workflow_name"`), `aborted_step`, `explanation`, `stack`, `resumed_workflow` (or None), and `resumed_step` (or None). 8. If a parent workflow exists on the stack after abortion, `resumed_workflow` and `resumed_step` MUST reflect that parent's state. -### JOBS-REQ-001.7: Tool Response Serialization +### JOBS-REQ-001.7: go_to_step Tool + +1. The `go_to_step` tool MUST be registered as an asynchronous MCP tool. +2. The tool MUST require a `step_id` parameter (str). +3. The tool MUST accept an optional `session_id` parameter (str, default: None). +4. The tool MUST raise `StateError` if no active workflow session exists and no `session_id` is provided. +5. 
When `session_id` is provided, the tool MUST target the session with that ID rather than the top-of-stack session. +6. The tool MUST raise `ToolError` if the specified `step_id` does not exist in the workflow. The error message MUST list the available step names. +7. The tool MUST raise `ToolError` if the target step's entry index is greater than the current entry index (forward navigation). The error message MUST direct the agent to use `finished_step` to advance forward. +8. The tool MUST allow navigating to the current step (target entry index == current entry index) to restart it. +9. The tool MUST collect all step IDs from the target entry index through the end of the workflow as invalidated steps. +10. The tool MUST clear session tracking state (step progress) for all invalidated step IDs via the StateManager. Files on disk MUST NOT be deleted. +11. For concurrent step entries, the tool MUST navigate to the first step ID in the entry. +12. The tool MUST mark the target step as started after clearing invalidated progress. +13. The response MUST contain a `begin_step` object with: `session_id`, `step_id`, `job_dir`, `step_expected_outputs`, `step_reviews`, `step_instructions`, and `common_job_info`. +14. The response MUST contain an `invalidated_steps` field listing all step IDs whose progress was cleared. +15. The response MUST contain a `stack` field reflecting the current workflow stack after navigation. + +### JOBS-REQ-001.8: Tool Response Serialization 1. All tool responses MUST be serialized via Pydantic's `model_dump()` method, returning plain dictionaries. 2. The `StepStatus` enum values MUST be: `"needs_work"`, `"next_step"`, `"workflow_complete"`. 
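The navigation rules above (reject forward moves, invalidate everything from the target entry onward, land on the first step of a concurrent entry) can be sketched as a small pure function. All names here are hypothetical; the real implementation lives in `tools.py` and `state.py`:

```python
def plan_go_to_step(step_entries, current_index, target_index):
    """Sketch of JOBS-REQ-001.7 navigation: step_entries is a list of
    entries, each a list of step IDs (length > 1 means concurrent)."""
    if target_index > current_index:
        # Item 7: forward navigation is rejected; finished_step advances
        raise ValueError("Cannot go forward; use finished_step to advance.")
    # Item 9: every step from the target entry through the end is invalidated
    invalidated = [sid for entry in step_entries[target_index:] for sid in entry]
    # Item 11: a concurrent entry navigates to its first step ID
    return step_entries[target_index][0], invalidated


entries = [["gather"], ["draft"], ["review_a", "review_b"], ["finalize"]]
nav_step, invalidated = plan_go_to_step(entries, current_index=3, target_index=1)
assert nav_step == "draft"
assert invalidated == ["draft", "review_a", "review_b", "finalize"]
```

Note that `target_index == current_index` passes the guard, which is item 8: navigating to the current step restarts it.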
diff --git a/specs/deepwork/jobs/JOBS-REQ-003-workflow-session-management.md b/specs/deepwork/jobs/JOBS-REQ-003-workflow-session-management.md index f198f08d..0d4cdad5 100644 --- a/specs/deepwork/jobs/JOBS-REQ-003-workflow-session-management.md +++ b/specs/deepwork/jobs/JOBS-REQ-003-workflow-session-management.md @@ -56,10 +56,10 @@ The StateManager manages workflow session state with support for stack-based nes ### JOBS-REQ-003.7: Session ID Routing -1. `_resolve_session(session_id)` MUST search the entire stack for a session matching the provided `session_id`. -2. If `session_id` is provided but not found in the stack, `_resolve_session()` MUST raise `StateError`. -3. If `session_id` is `None`, `_resolve_session()` MUST fall back to `require_active_session()` (top-of-stack). -4. All state-modifying methods that accept `session_id` (start_step, complete_step, record_quality_attempt, advance_to_step, complete_workflow, abort_workflow) MUST use `_resolve_session()` for session lookup. +1. `resolve_session(session_id)` MUST search the entire stack for a session matching the provided `session_id`. +2. If `session_id` is provided but not found in the stack, `resolve_session()` MUST raise `StateError`. +3. If `session_id` is `None`, `resolve_session()` MUST fall back to `require_active_session()` (top-of-stack). +4. All state-modifying methods that accept `session_id` (start_step, complete_step, record_quality_attempt, advance_to_step, complete_workflow, abort_workflow, go_to_step) MUST use `resolve_session()` for session lookup. ### JOBS-REQ-003.8: Step Progress Tracking @@ -142,3 +142,16 @@ The StateManager manages workflow session state with support for stack-based nes 3. The `status` field MUST be one of: `"active"`, `"completed"`, `"aborted"`. 4. The `step_progress` field MUST be a dict mapping step IDs to `StepProgress` objects. 5. The `StepProgress` model MUST track: `step_id`, `started_at`, `completed_at`, `outputs`, `notes`, `quality_attempts` (default 0). 
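The routing behavior in JOBS-REQ-003.7 (search the whole stack by ID, fall back to top-of-stack when no ID is given) is roughly this, sketched with plain dicts rather than the real `WorkflowSession` model:

```python
class StateError(Exception):
    """Stand-in for the StateManager's error type."""


def resolve_session(stack, session_id=None):
    # No ID provided: fall back to the top-of-stack session
    if session_id is None:
        if not stack:
            raise StateError("No active workflow session.")
        return stack[-1]
    # ID provided: search the entire stack, not just the top
    for session in stack:
        if session["session_id"] == session_id:
            return session
    raise StateError(f"Session not found: {session_id}")


stack = [{"session_id": "parent"}, {"session_id": "child"}]
assert resolve_session(stack)["session_id"] == "child"  # top-of-stack default
assert resolve_session(stack, "parent")["session_id"] == "parent"  # full-stack search
```

The full-stack search is what lets `go_to_step` and the other state-modifying methods target a parent workflow while a nested child is on top.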
+ +### JOBS-REQ-003.19: Step Navigation (go_to_step) + +1. `go_to_step()` MUST be an async method. +2. `go_to_step()` MUST acquire the async lock before modifying state. +3. `go_to_step()` MUST accept a `step_id` parameter (str), an `entry_index` parameter (int), and an `invalidate_step_ids` parameter (list of str). +4. `go_to_step()` MUST accept an optional `session_id` parameter (str or None, default: None). +5. `go_to_step()` MUST use `resolve_session()` for session lookup when `session_id` is provided. +6. `go_to_step()` MUST delete `step_progress` entries for all step IDs in `invalidate_step_ids`. +7. `go_to_step()` MUST preserve `step_progress` entries for steps not in `invalidate_step_ids`. +8. `go_to_step()` MUST update `current_step_id` to the provided `step_id`. +9. `go_to_step()` MUST update `current_entry_index` to the provided `entry_index`. +10. `go_to_step()` MUST persist the session after modification. diff --git a/src/deepwork/jobs/mcp/schemas.py b/src/deepwork/jobs/mcp/schemas.py index f436af4b..9ba0a930 100644 --- a/src/deepwork/jobs/mcp/schemas.py +++ b/src/deepwork/jobs/mcp/schemas.py @@ -132,6 +132,20 @@ class AbortWorkflowInput(BaseModel): ) +class GoToStepInput(BaseModel): + """Input for go_to_step tool.""" + + step_id: str = Field(description="ID of the step to navigate back to") + session_id: str | None = Field( + default=None, + description=( + "Optional session ID to target a specific workflow session. " + "Use this when multiple workflows are active concurrently to ensure " + "the correct session is updated. If omitted, operates on the top-of-stack session." 
+ ), + ) + + # ============================================================================= # Quality Gate Models # ============================================================================= @@ -303,6 +317,18 @@ class AbortWorkflowResponse(BaseModel): resumed_step: str | None = Field(default=None, description="The step now active (if any)") +class GoToStepResponse(BaseModel): + """Response from go_to_step tool.""" + + begin_step: ActiveStepInfo = Field(description="Information about the step to begin working on") + invalidated_steps: list[str] = Field( + description="Step IDs whose progress was cleared (from target step onward)" + ) + stack: list[StackEntry] = Field( + default_factory=list, description="Current workflow stack after navigation" + ) + + # ============================================================================= # Session State Models # ============================================================================= diff --git a/src/deepwork/jobs/mcp/server.py b/src/deepwork/jobs/mcp/server.py index 62d12de4..a36fef75 100644 --- a/src/deepwork/jobs/mcp/server.py +++ b/src/deepwork/jobs/mcp/server.py @@ -24,6 +24,7 @@ from deepwork.jobs.mcp.schemas import ( AbortWorkflowInput, FinishedStepInput, + GoToStepInput, StartWorkflowInput, ) from deepwork.jobs.mcp.state import StateManager @@ -221,6 +222,31 @@ async def abort_workflow( response = await tools.abort_workflow(input_data) return response.model_dump() + @mcp.tool( + description=( + "Navigate back to a prior step in the current workflow. " + "Clears all progress from the target step onward, forcing re-execution " + "of subsequent steps to ensure consistency. " + "Use this when earlier outputs need revision or quality issues are discovered. " + "Files on disk are NOT deleted — only session tracking state is cleared. " + "Required: step_id (the step to go back to). " + "Optional: session_id to target a specific workflow session " + "(use when multiple workflows are active concurrently)." 
+ ) + ) + async def go_to_step( + step_id: str, + session_id: str | None = None, + ) -> dict[str, Any]: + """Navigate back to a prior step, clearing subsequent progress.""" + _log_tool_call( + "go_to_step", + {"step_id": step_id, "session_id": session_id}, + ) + input_data = GoToStepInput(step_id=step_id, session_id=session_id) + response = await tools.go_to_step(input_data) + return response.model_dump() + # ---- Review tool (outside the workflow lifecycle) ---- from deepwork.review.mcp import ReviewToolError, run_review @@ -327,6 +353,14 @@ def _get_server_instructions() -> str: - If there was a parent workflow, it becomes active again - The explanation is saved for debugging and audit purposes +## Going Back + +Use `go_to_step` to navigate back to a prior step when earlier outputs need revision: +- All progress from the target step onward is cleared (outputs, timestamps, quality attempts) +- The agent must re-execute all steps from the target onward +- Files on disk are NOT deleted — only session tracking state is cleared +- Cannot go forward — use `finished_step` to advance + ## Best Practices - Always call `get_workflows` first to understand available options diff --git a/src/deepwork/jobs/mcp/state.py b/src/deepwork/jobs/mcp/state.py index e0d88a18..ce98444c 100644 --- a/src/deepwork/jobs/mcp/state.py +++ b/src/deepwork/jobs/mcp/state.py @@ -172,12 +172,12 @@ def require_active_session(self) -> WorkflowSession: raise StateError("No active workflow session. Use start_workflow to begin a workflow.") return self._session_stack[-1] - def _resolve_session(self, session_id: str | None = None) -> WorkflowSession: + def resolve_session(self, session_id: str | None = None) -> WorkflowSession: """Resolve a session by ID or fall back to top-of-stack. - This is used internally (called inside locked blocks or sync methods) - to find a specific session when session_id is provided, or fall back - to the default top-of-stack behavior. 
+ Looks up a specific session when session_id is provided, or falls back + to the default top-of-stack behavior. This is a synchronous method that + reads the session stack without acquiring the async lock. Args: session_id: Optional session ID to look up. If None, returns top-of-stack. @@ -206,7 +206,7 @@ async def start_step(self, step_id: str, session_id: str | None = None) -> None: StateError: If no active session or session_id not found """ async with self._lock: - session = self._resolve_session(session_id) + session = self.resolve_session(session_id) now = datetime.now(UTC).isoformat() if step_id not in session.step_progress: @@ -239,7 +239,7 @@ async def complete_step( StateError: If no active session or session_id not found """ async with self._lock: - session = self._resolve_session(session_id) + session = self.resolve_session(session_id) now = datetime.now(UTC).isoformat() if step_id not in session.step_progress: @@ -269,7 +269,7 @@ async def record_quality_attempt(self, step_id: str, session_id: str | None = No StateError: If no active session or session_id not found """ async with self._lock: - session = self._resolve_session(session_id) + session = self.resolve_session(session_id) if step_id not in session.step_progress: session.step_progress[step_id] = StepProgress(step_id=step_id) @@ -293,11 +293,43 @@ async def advance_to_step( StateError: If no active session or session_id not found """ async with self._lock: - session = self._resolve_session(session_id) + session = self.resolve_session(session_id) session.current_step_id = step_id session.current_entry_index = entry_index await self._save_session_unlocked(session) + async def go_to_step( + self, + step_id: str, + entry_index: int, + invalidate_step_ids: list[str], + session_id: str | None = None, + ) -> None: + """Navigate back to a prior step, clearing progress from that step onward. 
+ + Args: + step_id: Step ID to navigate to + entry_index: Index of the target entry in workflow step_entries + invalidate_step_ids: Step IDs whose progress should be cleared + session_id: Optional session ID to target a specific session + + Raises: + StateError: If no active session or session_id not found + """ + async with self._lock: + session = self.resolve_session(session_id) + + # Clear progress for all invalidated steps + for sid in invalidate_step_ids: + if sid in session.step_progress: + del session.step_progress[sid] + + # Update position + session.current_step_id = step_id + session.current_entry_index = entry_index + + await self._save_session_unlocked(session) + async def complete_workflow(self, session_id: str | None = None) -> WorkflowSession | None: """Mark the workflow as complete and remove from stack. @@ -312,7 +344,7 @@ async def complete_workflow(self, session_id: str | None = None) -> WorkflowSess StateError: If no active session or session_id not found """ async with self._lock: - session = self._resolve_session(session_id) + session = self.resolve_session(session_id) now = datetime.now(UTC).isoformat() session.completed_at = now session.status = "completed" @@ -343,7 +375,7 @@ async def abort_workflow( StateError: If no active session or session_id not found """ async with self._lock: - session = self._resolve_session(session_id) + session = self.resolve_session(session_id) now = datetime.now(UTC).isoformat() session.completed_at = now session.status = "aborted" @@ -371,7 +403,7 @@ def get_all_outputs(self, session_id: str | None = None) -> dict[str, str | list Raises: StateError: If no active session or session_id not found """ - session = self._resolve_session(session_id) + session = self.resolve_session(session_id) all_outputs: dict[str, str | list[str]] = {} for progress in session.step_progress.values(): all_outputs.update(progress.outputs) diff --git a/src/deepwork/jobs/mcp/tools.py b/src/deepwork/jobs/mcp/tools.py index 
964c7259..76557cc2 100644 --- a/src/deepwork/jobs/mcp/tools.py +++ b/src/deepwork/jobs/mcp/tools.py @@ -4,6 +4,8 @@ - get_workflows: List all available workflows - start_workflow: Initialize a workflow session - finished_step: Report step completion and get next instructions +- abort_workflow: Abort the current workflow +- go_to_step: Navigate back to a prior step """ from __future__ import annotations @@ -23,6 +25,8 @@ FinishedStepInput, FinishedStepResponse, GetWorkflowsResponse, + GoToStepInput, + GoToStepResponse, JobInfo, JobLoadErrorInfo, ReviewInfo, @@ -31,12 +35,14 @@ StepStatus, WorkflowInfo, ) -from deepwork.jobs.mcp.state import StateManager +from deepwork.jobs.mcp.state import StateError, StateManager from deepwork.jobs.parser import ( JobDefinition, OutputSpec, ParseError, + Step, Workflow, + WorkflowStepEntry, parse_job_definition, ) @@ -291,6 +297,44 @@ def _build_expected_outputs(outputs: list[OutputSpec]) -> list[ExpectedOutput]: for out in outputs ] + def _build_active_step_info( + self, + session_id: str, + step_id: str, + job: JobDefinition, + step: Step, + instructions: str, + step_outputs: list[ExpectedOutput], + ) -> ActiveStepInfo: + """Build an ActiveStepInfo from a step definition and its context.""" + return ActiveStepInfo( + session_id=session_id, + step_id=step_id, + job_dir=str(job.job_dir), + step_expected_outputs=step_outputs, + step_reviews=[ + ReviewInfo( + run_each=r.run_each, + quality_criteria=r.quality_criteria, + additional_review_guidance=r.additional_review_guidance, + ) + for r in step.reviews + ], + step_instructions=instructions, + common_job_info=job.common_job_info_provided_to_all_steps_at_runtime, + ) + + @staticmethod + def _append_concurrent_info(instructions: str, entry: WorkflowStepEntry) -> str: + """Append concurrent step info to instructions if applicable.""" + if entry.is_concurrent and len(entry.step_ids) > 1: + instructions += ( + f"\n\n**CONCURRENT STEPS**: This entry has {len(entry.step_ids)} " + f"steps 
that can run in parallel: {', '.join(entry.step_ids)}\n" + f"Use the Task tool to execute them concurrently." + ) + return instructions + # ========================================================================= # Tool Implementations # ========================================================================= @@ -361,21 +405,8 @@ async def start_workflow(self, input_data: StartWorkflowInput) -> StartWorkflowR step_outputs = self._build_expected_outputs(first_step.outputs) return StartWorkflowResponse( - begin_step=ActiveStepInfo( - session_id=session.session_id, - step_id=first_step_id, - job_dir=str(job.job_dir), - step_expected_outputs=step_outputs, - step_reviews=[ - ReviewInfo( - run_each=r.run_each, - quality_criteria=r.quality_criteria, - additional_review_guidance=r.additional_review_guidance, - ) - for r in first_step.reviews - ], - step_instructions=instructions, - common_job_info=job.common_job_info_provided_to_all_steps_at_runtime, + begin_step=self._build_active_step_info( + session.session_id, first_step_id, job, first_step, instructions, step_outputs ), stack=self.state_manager.get_stack(), ) @@ -393,7 +424,15 @@ async def finished_step(self, input_data: FinishedStepInput) -> FinishedStepResp StateError: If no active session ToolError: If quality gate fails after max attempts """ - session = self.state_manager._resolve_session(input_data.session_id) + try: + session = self.state_manager.resolve_session(input_data.session_id) + except StateError as err: + raise ToolError( + "No active workflow session. " + "The finished_step tool reports completion of a step within a running workflow. " + "If you want to resume a workflow, just start it again and call finished_step " + "with quality_review_override_reason until you get back to your prior step." 
+ ) from err sid = session.session_id current_step_id = session.current_step_id @@ -542,34 +581,12 @@ async def finished_step(self, input_data: FinishedStepInput) -> FinishedStepResp step_outputs = self._build_expected_outputs(next_step.outputs) # Add info about concurrent steps if this is a concurrent entry - if next_entry.is_concurrent and len(next_entry.step_ids) > 1: - concurrent_info = ( - f"\n\n**CONCURRENT STEPS**: This entry has {len(next_entry.step_ids)} " - f"steps that can run in parallel: {', '.join(next_entry.step_ids)}\n" - f"Use the Task tool to execute them concurrently." - ) - instructions = instructions + concurrent_info - - # Reload session to get current state after advance - session = self.state_manager._resolve_session(sid) + instructions = self._append_concurrent_info(instructions, next_entry) return FinishedStepResponse( status=StepStatus.NEXT_STEP, - begin_step=ActiveStepInfo( - session_id=session.session_id, - step_id=next_step_id, - job_dir=str(job.job_dir), - step_expected_outputs=step_outputs, - step_reviews=[ - ReviewInfo( - run_each=r.run_each, - quality_criteria=r.quality_criteria, - additional_review_guidance=r.additional_review_guidance, - ) - for r in next_step.reviews - ], - step_instructions=instructions, - common_job_info=job.common_job_info_provided_to_all_steps_at_runtime, + begin_step=self._build_active_step_info( + sid, next_step_id, job, next_step, instructions, step_outputs ), stack=self.state_manager.get_stack(), ) @@ -600,3 +617,88 @@ async def abort_workflow(self, input_data: AbortWorkflowInput) -> AbortWorkflowR ), resumed_step=new_active.current_step_id if new_active else None, ) + + async def go_to_step(self, input_data: GoToStepInput) -> GoToStepResponse: + """Navigate back to a prior step, clearing progress from that step onward. + + This allows re-executing a step and all subsequent steps when earlier + outputs need revision. 
Only session tracking state is cleared — files + on disk are not deleted (Git handles file versioning). + + Args: + input_data: GoToStepInput with step_id and optional session_id + + Returns: + GoToStepResponse with step info, invalidated steps, and stack + + Raises: + StateError: If no active session + ToolError: If step not found or forward navigation attempted + """ + session = self.state_manager.resolve_session(input_data.session_id) + sid = session.session_id + + # Load job and workflow + job = self._get_job(session.job_name) + workflow = self._get_workflow(job, session.workflow_name) + + # Validate target step exists in workflow + target_entry_index = workflow.get_entry_index_for_step(input_data.step_id) + if target_entry_index is None: + raise ToolError( + f"Step '{input_data.step_id}' not found in workflow '{workflow.name}'. " + f"Available steps: {', '.join(workflow.steps)}" + ) + + # Validate not going forward (use finished_step for that) + current_entry_index = session.current_entry_index + if target_entry_index > current_entry_index: + raise ToolError( + f"Cannot go forward to step '{input_data.step_id}' " + f"(entry index {target_entry_index} > current {current_entry_index}). " + f"Use finished_step to advance forward." 
+ ) + + # Validate step definition exists + target_step = job.get_step(input_data.step_id) + if target_step is None: + raise ToolError(f"Step definition not found: {input_data.step_id}") + + # Collect all step IDs from target entry index through end of workflow + invalidate_step_ids: list[str] = [] + for i in range(target_entry_index, len(workflow.step_entries)): + entry = workflow.step_entries[i] + invalidate_step_ids.extend(entry.step_ids) + + # For concurrent entries, navigate to the first step in the entry + target_entry = workflow.step_entries[target_entry_index] + nav_step_id = target_entry.step_ids[0] + nav_step = job.get_step(nav_step_id) + if nav_step is None: + raise ToolError(f"Step definition not found: {nav_step_id}") + + # Clear progress and update position + await self.state_manager.go_to_step( + step_id=nav_step_id, + entry_index=target_entry_index, + invalidate_step_ids=invalidate_step_ids, + session_id=sid, + ) + + # Mark target step as started + await self.state_manager.start_step(nav_step_id, session_id=sid) + + # Get step instructions + instructions = self._get_step_instructions(job, nav_step_id) + step_outputs = self._build_expected_outputs(nav_step.outputs) + + # Add concurrent step info if applicable + instructions = self._append_concurrent_info(instructions, target_entry) + + return GoToStepResponse( + begin_step=self._build_active_step_info( + sid, nav_step_id, job, nav_step, instructions, step_outputs + ), + invalidated_steps=invalidate_step_ids, + stack=self.state_manager.get_stack(), + ) diff --git a/src/deepwork/standard_jobs/deepwork_jobs/job.yml b/src/deepwork/standard_jobs/deepwork_jobs/job.yml index 2e4deca9..595758cf 100644 --- a/src/deepwork/standard_jobs/deepwork_jobs/job.yml +++ b/src/deepwork/standard_jobs/deepwork_jobs/job.yml @@ -1,6 +1,6 @@ # yaml-language-server: $schema=.deepwork/schemas/job.schema.json name: deepwork_jobs -version: "1.4.0" +version: "1.6.0" summary: "Creates and manages multi-step AI workflows. 
Use when defining, implementing, testing, or improving DeepWork jobs." common_job_info_provided_to_all_steps_at_runtime: | Core commands for managing DeepWork jobs. These commands help you define new multi-step @@ -51,14 +51,7 @@ steps: description: "Definition of the job and its workflows" required: true dependencies: [] - reviews: - - run_each: job.yml - quality_criteria: - "Intermediate Deliverables": "The job breaks out across logical steps with reviewable intermediate deliverables." - "Reviews": | - Reviews are defined for each step. Particularly critical documents have their own reviews. - Note that the reviewers do not have transcript access, so if the criteria are about the conversation, - then add a `.deepwork/tmp/[step_summary].md` step output file so the agent has a communication channel to the reviewer. + reviews: [] - id: implement name: "Implement Job Steps" @@ -74,17 +67,7 @@ steps: required: true dependencies: - define - reviews: - - run_each: step_instruction_files - additional_review_guidance: "Read the job.yml file in the same job directory for context on how this instruction file fits into the larger workflow." - quality_criteria: - "Complete Instructions": "The instruction file is complete (no stubs or placeholders)." - "Specific & Actionable": "Instructions are tailored to the step's purpose, not generic." - "Output Examples": "The instruction file shows what good output looks like. This can be either template examples, or negative examples of what not to do. Only required if the step has outputs." - "Quality Criteria": "The instruction file defines quality criteria for its outputs." - "Ask Structured Questions": "If this step gathers user input, instructions explicitly use the phrase 'ask structured questions'. If the step has no user inputs, this criterion passes automatically." - "Prompt Engineering": "The instruction file follows Anthropic's best practices for prompt engineering." 
- "No Redundant Info": "The instruction file avoids duplicating information that belongs in the job.yml's common_job_info_provided_to_all_steps_at_runtime section. Shared context (project background, terminology, conventions) is in common_job_info, not repeated in each step." + reviews: [] - id: test name: "Test the New Workflow" @@ -171,8 +154,6 @@ steps: "Conversation Analyzed": "The agent reviewed the conversation for DeepWork job executions." "Confusion Identified": "The agent identified points of confusion, errors, or inefficiencies." "Instructions Improved": "Job instructions were updated to address identified issues." - "Instructions Concise": "Instructions are free of redundancy and unnecessary verbosity." - "Shared Content Extracted": "Lengthy/duplicated content is extracted into referenced files." "Bespoke Learnings Captured": "Run-specific learnings were added to AGENTS.md." "File References Used": "AGENTS.md entries reference other files where appropriate." "Working Folder Correct": "AGENTS.md is in the correct working folder for the job." @@ -225,8 +206,6 @@ steps: "Exposed Field Addressed": "`exposed: true` fields are removed or noted as deprecated." "Stop Hooks Migrated": "`stop_hooks` are migrated to `hooks.after_agent` format." "Removed Steps Cleaned": "References to removed steps (like `review_job_spec`) are updated." - "Orphaned Steps Fixed": "For jobs with no workflows, there is a single workflow (named after the job) containing all steps. For jobs with existing workflows, each orphan gets its own workflow (named after the step)." - "Promise Lines Removed": "Step instructions do not include anything about `Quality Criteria Met`." "job.ymls are readable": "Calling `get_workflows` from the Deepwork tool shows all expected jobs. If any are missing, its YML is likely bad." 
- id: errata diff --git a/src/deepwork/standard_jobs/deepwork_jobs/steps/define.md b/src/deepwork/standard_jobs/deepwork_jobs/steps/define.md index cb4307ca..0ff8a969 100644 --- a/src/deepwork/standard_jobs/deepwork_jobs/steps/define.md +++ b/src/deepwork/standard_jobs/deepwork_jobs/steps/define.md @@ -52,60 +52,7 @@ For each major phase they mentioned, ask structured questions to gather details: - Where should each output be saved? (filename/path) - Should outputs be organized in subdirectories? (e.g., `reports/`, `data/`, `drafts/`) - Will other steps need this output? - #### Work Product Storage Guidelines - - **Key principle**: Job outputs belong in the main repository directory structure, not in dot-directories. The `.deepwork/` directory is for job definitions and configuration only. - - **Why this matters**: - - **Version control**: Work products in the main repo are tracked by git and visible in PRs - - **Discoverability**: Team members can find outputs without knowing about DeepWork internals - - **Tooling compatibility**: IDEs, search tools, and CI/CD work naturally with standard paths - - **Glob patterns**: Well-structured paths enable powerful file matching (e.g., `competitive_research/**/*.md`) - - **Good output path patterns**: - ``` - competitive_research/competitors_list.md - competitive_research/acme_corp/research.md - operations/reports/2026-01/spending_analysis.md - docs/api/endpoints.md - ``` - - **Avoid these patterns**: - ``` - .deepwork/outputs/report.md # Hidden in dot-directory - output.md # Too generic, no context - research.md # Unclear which research - temp/draft.md # Transient-sounding paths - ``` - - **Organizing multi-file outputs**: - - Use the job name as a top-level folder when outputs are job-specific - - Use parameterized paths for per-entity outputs: `competitive_research/[competitor_name]/` - - Match existing project conventions when extending a codebase - - **When to include dates in paths**: - - **Include date** for periodic 
outputs where each version is retained (e.g., monthly reports, quarterly reviews, weekly summaries). These accumulate over time and historical versions remain useful. - ``` - operations/reports/2026-01/spending_analysis.md # Monthly report - keep history - hr/employees/[employee_name]/quarterly_reviews/2026-Q1.pdf # Per-employee quarterly review - ``` - - **Omit date** for current-state outputs that represent the latest understanding and get updated in place. Previous versions live in git history, not separate files. - ``` - competitive_research/acme_corp/swot.md # Current SWOT - updated over time - docs/architecture/overview.md # Living document - ``` - - **Supporting materials and intermediate outputs**: - - Content generated in earlier steps to support the final output (research notes, data extracts, drafts) should be placed in a `_dataroom` folder that is a peer to the final output - - Name the dataroom folder by replacing the file extension with `_dataroom` - ``` - operations/reports/2026-01/spending_analysis.md # Final output - operations/reports/2026-01/spending_analysis_dataroom/ # Supporting materials - raw_data.csv - vendor_breakdown.md - notes.md - ``` - - This keeps supporting materials organized and discoverable without cluttering the main output location + - When discussing output paths, follow the **Work Product Storage Guidelines** in the reference section below. 4. **Step Dependencies** - Which previous steps must complete before this one? @@ -165,6 +112,38 @@ The `research_all` step's instructions should tell the agent to: **When to recognize this pattern:** Look for language like "for each X, do Y" where Y involves more than one logical phase. If Y is a single simple action, a regular step with a loop is fine. If Y is itself a multi-step process with intermediate outputs worth reviewing, split it into a sub-workflow. 
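One possible `job.yml` shape for the sub-workflow pattern described above (all job, workflow, and step names here are illustrative, not prescribed):

```yaml
workflows:
  - name: competitive_research
    summary: "Research every competitor on the list"
    steps:
      - list_competitors
      - research_all            # its instructions start the sub-workflow per competitor
  - name: research_one_competitor
    summary: "Sub-workflow: research a single competitor in depth"
    steps:
      - gather_sources
      - write_profile
      - swot_analysis
```

The outer `research_all` step loops, calling `start_workflow` on `research_one_competitor` once per entity, so each iteration gets its own reviewable intermediate outputs.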
+### Iterative Loop Pattern (go_to_step) + +When a workflow needs to repeat a group of steps based on feedback or evolving requirements (e.g., draft → review → revise cycles, or research → analyze → check coverage → research more), use the `go_to_step` MCP tool to create a loop. + +**How it works:** A later step in the workflow evaluates the work so far and decides whether to loop back. If a loop is needed, the step's instructions tell the agent to call `go_to_step` with the step ID to return to. This clears all progress from that step onward and re-presents the step's instructions, so the agent re-executes the target step and all subsequent steps with fresh context. + +**How to structure it in `job.yml`:** + +```yaml +workflows: + - name: iterative_report + summary: "Create a report with iterative refinement" + steps: + - gather_data + - write_draft + - review_draft # This step may loop back to gather_data or write_draft + - finalize +``` + +The `review_draft` step's instructions should tell the agent to: +- Evaluate the draft against acceptance criteria +- If data gaps are found: call `go_to_step` with `step_id: "gather_data"` to collect more data and re-draft +- If the draft needs revision but data is sufficient: call `go_to_step` with `step_id: "write_draft"` to revise +- If the draft meets all criteria: proceed normally by calling `finished_step` + +**Important design considerations:** +- **Keep loops bounded**: The decision step's instructions should include a maximum iteration count or clear exit criteria to prevent infinite loops +- **State is cleared**: When `go_to_step` navigates back, all progress from the target step onward is cleared (outputs, timestamps, quality attempts). The agent must re-execute those steps. Files on disk are NOT deleted — only session tracking state is reset. +- **Use for multi-step loops only**: If only a single step needs to retry, the quality review system (`needs_work` from `finished_step`) already handles that. 
Use `go_to_step` when the loop spans multiple steps. + +**When to recognize this pattern:** Look for language like "keep refining until X", "iterate until satisfied", "go back and redo Y if Z", or any cycle where later steps may invalidate earlier work. If the iteration involves just one step retrying its own output, rely on quality reviews instead. + ### Step 3: Validate the Workflow After gathering information about all steps: @@ -273,16 +252,16 @@ Only after you have complete understanding, create the job directory and `job.ym **First, create the directory structure** using the `make_new_job.sh` script located in this job's directory (the `job_dir` path from the workflow response): ```bash -/make_new_job.sh [job_name] +[job_dir]/make_new_job.sh [job_name] ``` **Then create the job.yml file** at `.deepwork/jobs/[job_name]/job.yml` -(Where `[job_name]` is the name of the NEW job you're creating, e.g., `competitive_research`. Replace `` with the actual `job_dir` path from the workflow response.) +(Where `[job_name]` is the name of the NEW job you're creating, e.g., `competitive_research`. Replace `[job_dir]` with the actual `job_dir` path from the workflow response.) -**Template reference**: See `/templates/job.yml.template` for the standard structure. +**Template reference**: See `[job_dir]/templates/job.yml.template` for the standard structure. -**Complete example**: See `/templates/job.yml.example` for a fully worked example. +**Complete example**: See `[job_dir]/templates/job.yml.example` for a fully worked example. **Important**: - Use lowercase with underscores for job name and step IDs @@ -440,3 +419,60 @@ After creating the file: 2. Recommend that they review the job.yml file 3. Tell them the next step is to implement the job (generate step instruction files) +--- + +## Reference: Work Product Storage Guidelines + +**Key principle**: Job outputs belong in the main repository directory structure, not in dot-directories. 
The `.deepwork/` directory is for job definitions and configuration only. + +**Why this matters**: +- **Version control**: Work products in the main repo are tracked by git and visible in PRs +- **Discoverability**: Team members can find outputs without knowing about DeepWork internals +- **Tooling compatibility**: IDEs, search tools, and CI/CD work naturally with standard paths +- **Glob patterns**: Well-structured paths enable powerful file matching (e.g., `competitive_research/**/*.md`) + +**Good output path patterns**: +``` +competitive_research/competitors_list.md +competitive_research/acme_corp/research.md +operations/reports/2026-01/spending_analysis.md +docs/api/endpoints.md +``` + +**Avoid these patterns**: +``` +.deepwork/outputs/report.md # Hidden in dot-directory +output.md # Too generic, no context +research.md # Unclear which research +temp/draft.md # Transient-sounding paths +``` + +**Organizing multi-file outputs**: +- Use the job name as a top-level folder when outputs are job-specific +- Use parameterized paths for per-entity outputs: `competitive_research/[competitor_name]/` +- Match existing project conventions when extending a codebase + +**When to include dates in paths**: +- **Include date** for periodic outputs where each version is retained (e.g., monthly reports, quarterly reviews, weekly summaries). These accumulate over time and historical versions remain useful. + ``` + operations/reports/2026-01/spending_analysis.md # Monthly report - keep history + hr/employees/[employee_name]/quarterly_reviews/2026-Q1.pdf # Per-employee quarterly review + ``` +- **Omit date** for current-state outputs that represent the latest understanding and get updated in place. Previous versions live in git history, not separate files. 
+ ``` + competitive_research/acme_corp/swot.md # Current SWOT - updated over time + docs/architecture/overview.md # Living document + ``` + +**Supporting materials and intermediate outputs**: +- Content generated in earlier steps to support the final output (research notes, data extracts, drafts) should be placed in a `_dataroom` folder that is a peer to the final output +- Name the dataroom folder by replacing the file extension with `_dataroom` + ``` + operations/reports/2026-01/spending_analysis.md # Final output + operations/reports/2026-01/spending_analysis_dataroom/ # Supporting materials + raw_data.csv + vendor_breakdown.md + notes.md + ``` +- This keeps supporting materials organized and discoverable without cluttering the main output location + diff --git a/src/deepwork/standard_jobs/deepwork_jobs/steps/implement.md b/src/deepwork/standard_jobs/deepwork_jobs/steps/implement.md index e6771604..516b294a 100644 --- a/src/deepwork/standard_jobs/deepwork_jobs/steps/implement.md +++ b/src/deepwork/standard_jobs/deepwork_jobs/steps/implement.md @@ -82,6 +82,41 @@ If a step in the job.yml has `reviews` defined, the generated instruction file s This alignment ensures the AI agent knows exactly what will be validated and can self-check before completing. +### Writing Loop Instructions (go_to_step) + +If a step in the job.yml is designed as a decision point that may loop back to an earlier step (see the "Iterative Loop Pattern" in the define step), the instruction file for that step must include clear guidance on when and how to use `go_to_step`. + +**What to include in the instruction file:** + +1. **Evaluation criteria** — Explicit conditions that determine whether to loop back or proceed +2. **Which step to go back to** — The specific `step_id` to pass to `go_to_step`, and why that step (not an earlier or later one) +3. 
**Maximum iterations** — A bound to prevent infinite loops (e.g., "After 3 iterations, proceed to the next step regardless and note remaining issues") +4. **How to call it** — Tell the agent to call the `go_to_step` MCP tool with the target `step_id` + +**Example instruction snippet for a review/decision step:** + +```markdown +## Evaluation + +Review the draft against the acceptance criteria defined in the job description. + +### If the draft needs more data: +Call `go_to_step` with `step_id: "gather_data"` to loop back and collect additional +information. This will clear progress from gather_data onward — you will re-execute +gather_data, write_draft, and this review step with the new data. + +### If the draft needs revision but data is sufficient: +Call `go_to_step` with `step_id: "write_draft"` to revise the draft. + +### If the draft meets all criteria: +Proceed normally by calling `finished_step` with the review output. + +**Maximum iterations**: If this is the 3rd review cycle, proceed to the next step +regardless and document any remaining issues in the output. +``` + +**Important**: Only add `go_to_step` instructions to steps that are explicitly designed as loop decision points in the workflow. Most steps should NOT reference `go_to_step`. + ### Using Supplementary Reference Files Step instructions can include additional `.md` files in the `steps/` directory for detailed examples, templates, or reference material. Reference them using the full path from the project root. diff --git a/src/deepwork/standard_jobs/deepwork_jobs/templates/job.yml.template b/src/deepwork/standard_jobs/deepwork_jobs/templates/job.yml.template index c2256161..bab9d8ab 100644 --- a/src/deepwork/standard_jobs/deepwork_jobs/templates/job.yml.template +++ b/src/deepwork/standard_jobs/deepwork_jobs/templates/job.yml.template @@ -21,6 +21,16 @@ workflows: steps: - [step_id] - [another_step] + # Iterative loop pattern: A later step can call go_to_step to loop + # back to an earlier step. 
This clears progress from the target step + # onward and re-executes those steps. Add loop logic in the step's + # instruction file, not in job.yml. Example: + # steps: + # - gather_data + # - write_draft + # - review_draft # Instructions tell agent to call go_to_step + # # back to gather_data or write_draft if needed + # - finalize steps: - id: [step_id] diff --git a/src/deepwork/standard_jobs/deepwork_reviews/job.yml b/src/deepwork/standard_jobs/deepwork_reviews/job.yml index bf721a98..1b8836de 100644 --- a/src/deepwork/standard_jobs/deepwork_reviews/job.yml +++ b/src/deepwork/standard_jobs/deepwork_reviews/job.yml @@ -1,5 +1,5 @@ name: deepwork_reviews -version: "1.0.1" +version: "1.1.0" summary: "Manage .deepreview rules for automated code review" common_job_info_provided_to_all_steps_at_runtime: | This job manages .deepreview configuration files, which define automated code review @@ -150,24 +150,10 @@ steps: - run_each: deepreview_files additional_review_guidance: | Read each .deepreview file and the documentation files its rules protect. - Verify that the trigger scope of each rule is as narrow as possible — - it should only fire when files that could actually affect the doc's accuracy - change. Consider whether having more separate reviews with narrower scope - is actually more efficient than a slightly wider, shared review — each - review spawns a sub-agent with material overhead. quality_criteria: "Documentation Covered": > Every project documentation file that describes the project itself has a corresponding rule (either newly created or pre-existing). - "Trigger Scope Minimal": > - Each rule's match.include patterns are as narrow as possible while still - catching changes that could affect the protected documentation. Rules are - not overly broad (e.g., matching all files when only a specific directory matters). - "Efficient Rule Count": > - Where multiple narrow rules have substantially overlapping triggers, they - have been merged into shared rules. 
The total number of rules is the minimum - needed for adequate coverage. Separate narrow rules are only used when their - trigger sets are genuinely disjoint. - id: add_language_reviews name: "Add Language-Specific Code Review Rules" @@ -267,20 +253,8 @@ steps: - run_each: deepreview_file additional_review_guidance: | Read the dependency analysis from the previous step to verify the rule - faithfully implements the approved plan. Read the documentation file - referenced in the rule's instructions. Check the .deepreview file for - valid YAML syntax and consistency with existing rules. + faithfully implements the approved plan. quality_criteria: "Faithful Implementation": > The rule accurately implements the dependency analysis from the previous step — same match patterns, same strategy, same rule name convention. - "Valid Configuration": > - The .deepreview YAML is syntactically valid and follows the schema. - All required fields (description, match.include, review.strategy, - review.instructions) are present. - "Effective Instructions": > - Review instructions clearly tell the reviewer to check whether the - documentation file is still accurate given the source file changes. - The documentation file path is explicitly referenced. Uses - additional_context.unchanged_matching_files: true so the reviewer - can read the doc even when only source files changed. 
diff --git a/tests/unit/jobs/mcp/test_state.py b/tests/unit/jobs/mcp/test_state.py index c0d43f7e..64489e73 100644 --- a/tests/unit/jobs/mcp/test_state.py +++ b/tests/unit/jobs/mcp/test_state.py @@ -316,19 +316,6 @@ async def test_delete_session(self, state_manager: StateManager) -> None: class TestStateManagerStack: """Tests for stack-based workflow nesting.""" - @pytest.fixture - def project_root(self, tmp_path: Path) -> Path: - """Create a temporary project root with .deepwork directory.""" - deepwork_dir = tmp_path / ".deepwork" - deepwork_dir.mkdir() - (deepwork_dir / "tmp").mkdir() - return tmp_path - - @pytest.fixture - def state_manager(self, project_root: Path) -> StateManager: - """Create a StateManager instance.""" - return StateManager(project_root) - # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-003.13.1, JOBS-REQ-003.13.2, JOBS-REQ-003.13.4). # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES async def test_nested_workflows_stack(self, state_manager: StateManager) -> None: @@ -468,77 +455,51 @@ async def test_abort_workflow_no_parent(self, state_manager: StateManager) -> No class TestSessionIdRouting: """Tests for session_id-based routing in StateManager.""" - @pytest.fixture - def project_root(self, tmp_path: Path) -> Path: - """Create a temporary project root with .deepwork directory.""" - deepwork_dir = tmp_path / ".deepwork" - deepwork_dir.mkdir() - (deepwork_dir / "tmp").mkdir() - return tmp_path - - @pytest.fixture - def state_manager(self, project_root: Path) -> StateManager: - """Create a StateManager instance.""" - return StateManager(project_root) - # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-003.7.1). 
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES - def test_resolve_session_by_id(self, state_manager: StateManager) -> None: - """Test _resolve_session finds the correct session in a multi-session stack.""" - import asyncio - - async def setup() -> None: - await state_manager.create_session( - job_name="job1", workflow_name="wf1", goal="G1", first_step_id="s1" - ) - await state_manager.create_session( - job_name="job2", workflow_name="wf2", goal="G2", first_step_id="s2" - ) - await state_manager.create_session( - job_name="job3", workflow_name="wf3", goal="G3", first_step_id="s3" - ) - - asyncio.get_event_loop().run_until_complete(setup()) + async def test_resolve_session_by_id(self, state_manager: StateManager) -> None: + """Test resolve_session finds the correct session in a multi-session stack.""" + await state_manager.create_session( + job_name="job1", workflow_name="wf1", goal="G1", first_step_id="s1" + ) + await state_manager.create_session( + job_name="job2", workflow_name="wf2", goal="G2", first_step_id="s2" + ) + await state_manager.create_session( + job_name="job3", workflow_name="wf3", goal="G3", first_step_id="s3" + ) # Stack has 3 sessions; resolve the middle one by ID middle_session = state_manager._session_stack[1] - resolved = state_manager._resolve_session(middle_session.session_id) + resolved = state_manager.resolve_session(middle_session.session_id) assert resolved.session_id == middle_session.session_id assert resolved.job_name == "job2" # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-003.7.2). 
# YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES - def test_resolve_session_invalid_id(self, state_manager: StateManager) -> None: - """Test _resolve_session raises StateError for unknown session ID.""" - import asyncio - - asyncio.get_event_loop().run_until_complete( - state_manager.create_session( - job_name="job1", workflow_name="wf1", goal="G1", first_step_id="s1" - ) + async def test_resolve_session_invalid_id(self, state_manager: StateManager) -> None: + """Test resolve_session raises StateError for unknown session ID.""" + await state_manager.create_session( + job_name="job1", workflow_name="wf1", goal="G1", first_step_id="s1" ) with pytest.raises(StateError, match="Session 'nonexistent' not found"): - state_manager._resolve_session("nonexistent") + state_manager.resolve_session("nonexistent") # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-003.7.3). # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES - def test_resolve_session_none_falls_back_to_active(self, state_manager: StateManager) -> None: - """Test _resolve_session with None falls back to top-of-stack.""" - import asyncio - - asyncio.get_event_loop().run_until_complete( - state_manager.create_session( - job_name="job1", workflow_name="wf1", goal="G1", first_step_id="s1" - ) + async def test_resolve_session_none_falls_back_to_active( + self, state_manager: StateManager + ) -> None: + """Test resolve_session with None falls back to top-of-stack.""" + await state_manager.create_session( + job_name="job1", workflow_name="wf1", goal="G1", first_step_id="s1" ) - asyncio.get_event_loop().run_until_complete( - state_manager.create_session( - job_name="job2", workflow_name="wf2", goal="G2", first_step_id="s2" - ) + await state_manager.create_session( + job_name="job2", workflow_name="wf2", goal="G2", first_step_id="s2" ) - resolved = state_manager._resolve_session(None) + resolved = state_manager.resolve_session(None) assert resolved.job_name == "job2" # top-of-stack # THIS TEST 
VALIDATES A HARD REQUIREMENT (JOBS-REQ-003.7.4, JOBS-REQ-003.11.4, JOBS-REQ-003.13.5). @@ -627,3 +588,153 @@ async def test_complete_step_with_session_id(self, state_manager: StateManager) top = state_manager.get_active_session() assert top is not None assert "s1" not in top.step_progress + + +class TestGoToStep: + """Tests for go_to_step in StateManager.""" + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-003.19.6). + # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_go_to_step_clears_invalidated_progress( + self, state_manager: StateManager + ) -> None: + """Test that go_to_step clears step_progress for invalidated steps.""" + await state_manager.create_session( + job_name="test_job", + workflow_name="main", + goal="Test", + first_step_id="step1", + ) + + # Simulate completing step1 and step2 + await state_manager.complete_step("step1", {"out1": "out1.md"}) + await state_manager.complete_step("step2", {"out2": "out2.md"}) + + session = state_manager.get_active_session() + assert session is not None + assert "step1" in session.step_progress + assert "step2" in session.step_progress + + # Go back to step1 — both step1 and step2 should be cleared + await state_manager.go_to_step( + step_id="step1", + entry_index=0, + invalidate_step_ids=["step1", "step2"], + ) + + session = state_manager.get_active_session() + assert session is not None + assert "step1" not in session.step_progress + assert "step2" not in session.step_progress + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-003.19.7). 
+ # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_go_to_step_preserves_earlier_progress(self, state_manager: StateManager) -> None: + """Test that go_to_step preserves progress for steps before the target.""" + await state_manager.create_session( + job_name="test_job", + workflow_name="main", + goal="Test", + first_step_id="step1", + ) + + await state_manager.complete_step("step1", {"out1": "out1.md"}) + await state_manager.complete_step("step2", {"out2": "out2.md"}) + await state_manager.complete_step("step3", {"out3": "out3.md"}) + + # Go back to step2 — only step2 and step3 should be cleared + await state_manager.go_to_step( + step_id="step2", + entry_index=1, + invalidate_step_ids=["step2", "step3"], + ) + + session = state_manager.get_active_session() + assert session is not None + assert "step1" in session.step_progress # preserved + assert "step2" not in session.step_progress # cleared + assert "step3" not in session.step_progress # cleared + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-003.19.8, JOBS-REQ-003.19.9). + # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_go_to_step_updates_position(self, state_manager: StateManager) -> None: + """Test that go_to_step updates current_step_id and current_entry_index.""" + await state_manager.create_session( + job_name="test_job", + workflow_name="main", + goal="Test", + first_step_id="step1", + ) + + # Advance position + await state_manager.advance_to_step("step3", 2) + + # Go back to step1 + await state_manager.go_to_step( + step_id="step1", + entry_index=0, + invalidate_step_ids=["step1", "step2", "step3"], + ) + + session = state_manager.get_active_session() + assert session is not None + assert session.current_step_id == "step1" + assert session.current_entry_index == 0 + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-003.19.10). 
+ # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_go_to_step_persists_to_disk(self, state_manager: StateManager) -> None: + """Test that go_to_step persists changes to the session file.""" + session = await state_manager.create_session( + job_name="test_job", + workflow_name="main", + goal="Test", + first_step_id="step1", + ) + session_id = session.session_id + + await state_manager.complete_step("step1", {"out1": "out1.md"}) + await state_manager.advance_to_step("step2", 1) + + await state_manager.go_to_step( + step_id="step1", + entry_index=0, + invalidate_step_ids=["step1", "step2"], + ) + + # Load from disk with a new manager + new_manager = StateManager(state_manager.project_root) + loaded = await new_manager.load_session(session_id) + + assert loaded.current_step_id == "step1" + assert loaded.current_entry_index == 0 + assert "step1" not in loaded.step_progress + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-003.19.5). + # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_go_to_step_with_session_id(self, state_manager: StateManager) -> None: + """Test that go_to_step works with explicit session_id.""" + session1 = await state_manager.create_session( + job_name="job1", workflow_name="wf1", goal="G1", first_step_id="s1" + ) + await state_manager.create_session( + job_name="job2", workflow_name="wf2", goal="G2", first_step_id="s2" + ) + + await state_manager.complete_step("s1", {"out": "out.md"}, session_id=session1.session_id) + + # Go back on session1 using session_id + await state_manager.go_to_step( + step_id="s1", + entry_index=0, + invalidate_step_ids=["s1"], + session_id=session1.session_id, + ) + + assert session1.current_step_id == "s1" + assert "s1" not in session1.step_progress + + # session2 (top) should be unaffected + top = state_manager.get_active_session() + assert top is not None + assert top.current_step_id == "s2" diff --git a/tests/unit/jobs/mcp/test_tools.py 
b/tests/unit/jobs/mcp/test_tools.py index cc59f927..57b440d3 100644 --- a/tests/unit/jobs/mcp/test_tools.py +++ b/tests/unit/jobs/mcp/test_tools.py @@ -8,6 +8,7 @@ from deepwork.jobs.mcp.schemas import ( AbortWorkflowInput, FinishedStepInput, + GoToStepInput, StartWorkflowInput, StepStatus, ) @@ -346,7 +347,7 @@ async def test_finished_step_no_session(self, tools: WorkflowTools) -> None: """Test finished_step without active session.""" input_data = FinishedStepInput(outputs={"output1.md": "output1.md"}) - with pytest.raises(StateError, match="No active workflow session"): + with pytest.raises(ToolError, match="No active workflow session"): await tools.finished_step(input_data) # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-001.4.7, JOBS-REQ-001.4.15, JOBS-REQ-001.4.17). @@ -1873,3 +1874,391 @@ def test_no_quality_gate_no_external_runner_skips_review( ) assert tools.quality_gate is None assert tools.external_runner is None + + +class TestGoToStep: + """Tests for go_to_step tool.""" + + @pytest.fixture + def project_root(self, tmp_path: Path) -> Path: + """Create a temporary project with a 3-step test job.""" + deepwork_dir = tmp_path / ".deepwork" + deepwork_dir.mkdir() + (deepwork_dir / "tmp").mkdir() + jobs_dir = deepwork_dir / "jobs" + jobs_dir.mkdir() + + job_dir = jobs_dir / "three_step_job" + job_dir.mkdir() + + job_yml = """ +name: three_step_job +version: "1.0.0" +summary: A three-step test job +common_job_info_provided_to_all_steps_at_runtime: Test job for go_to_step + +steps: + - id: step1 + name: First Step + description: The first step + instructions_file: steps/step1.md + outputs: + output1.md: + type: file + description: First step output + required: true + reviews: + - run_each: step + quality_criteria: + "Valid": "Is the output valid?" 
+ - id: step2 + name: Second Step + description: The second step + instructions_file: steps/step2.md + outputs: + output2.md: + type: file + description: Second step output + required: true + reviews: [] + - id: step3 + name: Third Step + description: The third step + instructions_file: steps/step3.md + outputs: + output3.md: + type: file + description: Third step output + required: true + reviews: [] + +workflows: + - name: main + summary: Main workflow + steps: + - step1 + - step2 + - step3 +""" + (job_dir / "job.yml").write_text(job_yml) + + steps_dir = job_dir / "steps" + steps_dir.mkdir() + (steps_dir / "step1.md").write_text("# Step 1\n\nDo the first thing.") + (steps_dir / "step2.md").write_text("# Step 2\n\nDo the second thing.") + (steps_dir / "step3.md").write_text("# Step 3\n\nDo the third thing.") + + return tmp_path + + @pytest.fixture + def state_manager(self, project_root: Path) -> StateManager: + return StateManager(project_root) + + @pytest.fixture + def tools(self, project_root: Path, state_manager: StateManager) -> WorkflowTools: + return WorkflowTools(project_root=project_root, state_manager=state_manager) + + async def _start_and_advance_to_step3(self, tools: WorkflowTools, project_root: Path) -> str: + """Helper: start workflow and advance to step3, returning session_id.""" + resp = await tools.start_workflow( + StartWorkflowInput( + goal="Test go_to_step", + job_name="three_step_job", + workflow_name="main", + ) + ) + session_id = resp.begin_step.session_id + + # Complete step1 + (project_root / "output1.md").write_text("Step 1 output") + await tools.finished_step(FinishedStepInput(outputs={"output1.md": "output1.md"})) + + # Complete step2 + (project_root / "output2.md").write_text("Step 2 output") + await tools.finished_step(FinishedStepInput(outputs={"output2.md": "output2.md"})) + + return session_id + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-001.7.13). 
+ # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_go_back_to_prior_step(self, tools: WorkflowTools, project_root: Path) -> None: + """Test navigating back to a prior step returns step info.""" + await self._start_and_advance_to_step3(tools, project_root) + + response = await tools.go_to_step(GoToStepInput(step_id="step1")) + + assert response.begin_step.step_id == "step1" + assert "Step 1" in response.begin_step.step_instructions + assert len(response.begin_step.step_expected_outputs) == 1 + assert response.begin_step.step_expected_outputs[0].name == "output1.md" + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-001.7.9, JOBS-REQ-001.7.14). + # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_go_back_clears_subsequent_progress( + self, tools: WorkflowTools, project_root: Path + ) -> None: + """Test that going back clears progress for target step and all subsequent.""" + await self._start_and_advance_to_step3(tools, project_root) + + response = await tools.go_to_step(GoToStepInput(step_id="step2")) + + # step2 and step3 should be invalidated + assert "step2" in response.invalidated_steps + assert "step3" in response.invalidated_steps + # step1 should NOT be invalidated + assert "step1" not in response.invalidated_steps + + # Verify session state: step1 progress preserved, step3 cleared + # step2 has fresh progress from start_step (started_at set, no completed_at) + session = tools.state_manager.get_active_session() + assert session is not None + assert "step1" in session.step_progress + assert session.step_progress["step1"].completed_at is not None # preserved + assert "step2" in session.step_progress # re-created by start_step + assert session.step_progress["step2"].completed_at is None # fresh, not completed + assert session.step_progress["step2"].outputs == {} # no outputs yet + assert "step3" not in session.step_progress # cleared + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-001.7.8). 
+ # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_restart_current_step(self, tools: WorkflowTools, project_root: Path) -> None: + """Test going to the current step restarts it.""" + await self._start_and_advance_to_step3(tools, project_root) + + # Currently at step3 (entry_index=2), go_to_step("step3") should work + response = await tools.go_to_step(GoToStepInput(step_id="step3")) + + assert response.begin_step.step_id == "step3" + assert "step3" in response.invalidated_steps + # step1 and step2 should be preserved + assert "step1" not in response.invalidated_steps + assert "step2" not in response.invalidated_steps + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-001.7.6). + # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_invalid_step_id_error(self, tools: WorkflowTools, project_root: Path) -> None: + """Test that an invalid step_id raises ToolError.""" + await self._start_and_advance_to_step3(tools, project_root) + + with pytest.raises(ToolError, match="Step 'nonexistent' not found in workflow"): + await tools.go_to_step(GoToStepInput(step_id="nonexistent")) + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-001.7.7). + # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_forward_navigation_error(self, tools: WorkflowTools, project_root: Path) -> None: + """Test that going forward raises ToolError.""" + # Start workflow — currently at step1 (entry_index=0) + await tools.start_workflow( + StartWorkflowInput( + goal="Test", + job_name="three_step_job", + workflow_name="main", + ) + ) + + with pytest.raises(ToolError, match="Cannot go forward"): + await tools.go_to_step(GoToStepInput(step_id="step2")) + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-001.7.4). 
+ # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_no_session_error(self, tools: WorkflowTools) -> None: + """Test that go_to_step with no active session raises StateError.""" + with pytest.raises(StateError, match="No active workflow session"): + await tools.go_to_step(GoToStepInput(step_id="step1")) + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-001.7.13). + # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_step_reviews_included_in_response( + self, tools: WorkflowTools, project_root: Path + ) -> None: + """Test that reviews are included when going back to a step with reviews.""" + await self._start_and_advance_to_step3(tools, project_root) + + # step1 has reviews defined + response = await tools.go_to_step(GoToStepInput(step_id="step1")) + + assert len(response.begin_step.step_reviews) == 1 + assert response.begin_step.step_reviews[0].run_each == "step" + assert "Valid" in response.begin_step.step_reviews[0].quality_criteria + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-001.7.15). + # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_stack_included_in_response( + self, tools: WorkflowTools, project_root: Path + ) -> None: + """Test that the workflow stack is included in the response.""" + await self._start_and_advance_to_step3(tools, project_root) + + response = await tools.go_to_step(GoToStepInput(step_id="step1")) + + assert len(response.stack) == 1 + assert response.stack[0].workflow == "three_step_job/main" + assert response.stack[0].step == "step1" + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-001.7.12). 
+ # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_go_to_step_then_finish_step_advances( + self, tools: WorkflowTools, project_root: Path + ) -> None: + """Test that after go_to_step, finishing the step advances normally.""" + await self._start_and_advance_to_step3(tools, project_root) + + # Go back to step1 + await tools.go_to_step(GoToStepInput(step_id="step1")) + + # Finish step1 again — should advance to step2 + (project_root / "output1.md").write_text("Revised step 1 output") + response = await tools.finished_step( + FinishedStepInput(outputs={"output1.md": "output1.md"}) + ) + + assert response.status == StepStatus.NEXT_STEP + assert response.begin_step is not None + assert response.begin_step.step_id == "step2" + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-001.7.5). + # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_go_to_step_with_session_id( + self, tools: WorkflowTools, project_root: Path + ) -> None: + """Test that go_to_step targets a specific session when session_id is provided.""" + # Start first workflow and advance + session_id = await self._start_and_advance_to_step3(tools, project_root) + + # Start a second (nested) workflow — this becomes top-of-stack + await tools.start_workflow( + StartWorkflowInput( + goal="Nested", + job_name="three_step_job", + workflow_name="main", + ) + ) + + # go_to_step targeting the first session by session_id + response = await tools.go_to_step(GoToStepInput(step_id="step1", session_id=session_id)) + + # Should navigate the first session, not the top-of-stack + assert response.begin_step.step_id == "step1" + assert response.begin_step.session_id == session_id + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-001.7.10). 
+ # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_go_to_step_preserves_files_on_disk( + self, tools: WorkflowTools, project_root: Path + ) -> None: + """Test that go_to_step does not delete files on disk.""" + await self._start_and_advance_to_step3(tools, project_root) + + # Verify files exist before go_to_step + assert (project_root / "output1.md").exists() + assert (project_root / "output2.md").exists() + + # Go back to step1 — should clear session state but NOT delete files + await tools.go_to_step(GoToStepInput(step_id="step1")) + + # Files must still exist on disk + assert (project_root / "output1.md").exists() + assert (project_root / "output2.md").exists() + + # THIS TEST VALIDATES A HARD REQUIREMENT (JOBS-REQ-001.7.11). + # YOU MUST NOT MODIFY THIS TEST UNLESS THE REQUIREMENT CHANGES + async def test_go_to_step_concurrent_entry(self, tmp_path: Path) -> None: + """Test that go_to_step on a concurrent entry navigates to the first step.""" + # Set up a job with a concurrent step entry + deepwork_dir = tmp_path / ".deepwork" + deepwork_dir.mkdir() + (deepwork_dir / "tmp").mkdir() + jobs_dir = deepwork_dir / "jobs" + jobs_dir.mkdir() + job_dir = jobs_dir / "concurrent_job" + job_dir.mkdir() + + job_yml = """ +name: concurrent_job +version: "1.0.0" +summary: Job with concurrent steps +common_job_info_provided_to_all_steps_at_runtime: Test + +steps: + - id: setup + name: Setup + description: Setup step + instructions_file: steps/setup.md + outputs: + setup.md: + type: file + description: Setup output + required: true + reviews: [] + - id: task_a + name: Task A + description: Concurrent task A + instructions_file: steps/task_a.md + outputs: + task_a.md: + type: file + description: Task A output + required: true + reviews: [] + - id: task_b + name: Task B + description: Concurrent task B + instructions_file: steps/task_b.md + outputs: + task_b.md: + type: file + description: Task B output + required: true + reviews: [] + - id: 
finalize + name: Finalize + description: Final step + instructions_file: steps/finalize.md + outputs: + final.md: + type: file + description: Final output + required: true + reviews: [] + +workflows: + - name: main + summary: Main workflow + steps: + - setup + - [task_a, task_b] + - finalize +""" + (job_dir / "job.yml").write_text(job_yml) + steps_dir = job_dir / "steps" + steps_dir.mkdir() + (steps_dir / "setup.md").write_text("# Setup\n\nDo setup.") + (steps_dir / "task_a.md").write_text("# Task A\n\nDo task A.") + (steps_dir / "task_b.md").write_text("# Task B\n\nDo task B.") + (steps_dir / "finalize.md").write_text("# Finalize\n\nFinalize.") + + state_manager = StateManager(tmp_path) + tools = WorkflowTools(project_root=tmp_path, state_manager=state_manager) + + # Start workflow and advance past the concurrent entry to finalize + await tools.start_workflow( + StartWorkflowInput(goal="Test", job_name="concurrent_job", workflow_name="main") + ) + (tmp_path / "setup.md").write_text("Setup done") + await tools.finished_step(FinishedStepInput(outputs={"setup.md": "setup.md"})) + # Now at the concurrent entry [task_a, task_b] — current step is task_a + (tmp_path / "task_a.md").write_text("Task A done") + (tmp_path / "task_b.md").write_text("Task B done") + await tools.finished_step(FinishedStepInput(outputs={"task_a.md": "task_a.md"})) + # Now at finalize (entry_index=2) + + # Go back to the concurrent entry — should navigate to task_a (first in entry) + response = await tools.go_to_step(GoToStepInput(step_id="task_a")) + + assert response.begin_step.step_id == "task_a" + # Both task_a, task_b, and finalize should be invalidated + assert "task_a" in response.invalidated_steps + assert "task_b" in response.invalidated_steps + assert "finalize" in response.invalidated_steps + # setup should NOT be invalidated + assert "setup" not in response.invalidated_steps
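The bounded-loop guidance documented above (give the decision step a maximum iteration count so `go_to_step` cannot loop forever) can be sketched from the agent's side. A minimal sketch, assuming callables standing in for the MCP tools — `run_review_cycle` and `MAX_ITERATIONS` are illustrative names, not DeepWork APIs:

```python
# Sketch of a bounded review/decision loop per the iterative loop pattern.
# go_to_step and finished_step are stand-ins for the real MCP tool calls.

MAX_ITERATIONS = 3  # hypothetical bound; the real limit lives in step instructions


def run_review_cycle(evaluate, go_to_step, finished_step):
    """Loop back to write_draft until the draft passes or the bound is hit."""
    for iteration in range(1, MAX_ITERATIONS + 1):
        verdict = evaluate()
        if verdict == "ok":
            return finished_step("draft meets criteria")
        if iteration == MAX_ITERATIONS:
            # Bound reached: proceed anyway and record remaining issues,
            # as the example instruction snippet recommends.
            return finished_step("proceeding with known issues")
        # Clears progress from write_draft onward; files on disk are kept.
        go_to_step(step_id="write_draft")


verdicts = iter(["needs_work", "needs_work", "ok"])
calls = []
result = run_review_cycle(
    evaluate=lambda: next(verdicts),
    go_to_step=lambda step_id: calls.append(step_id),
    finished_step=lambda note: note,
)
assert result == "draft meets criteria"
assert calls == ["write_draft", "write_draft"]  # two loop-backs, then pass
```

Note the exit conditions mirror the instruction-file guidance: a success path (`finished_step`), a loop-back path (`go_to_step`), and a hard iteration cap that prevents an unbounded cycle.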