docs: add pre-step data-fetching pattern to create-agentic-workflow.md#24685
…low.md Agent-Logs-Url: https://github.com/github/gh-aw/sessions/5f9dc564-23a1-4366-b8fc-52c7e44673d1 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
```yaml
  github:
    toolsets: [default]
steps:
  - name: Fetch data
```
@copilot use gh_token and set proper permissions to avoid rate limits
Done in 41f63b5. All step snippets now include `env: GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}`, and the template already has `permissions: actions: read`. I also moved `${{ github.event.workflow_run.id }}` into a `RUN_ID` env var (following the pattern in `ci-doctor.md`), so the shell command uses `"$RUN_ID"` instead of raw expression interpolation.
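Below is a minimal sketch of a `run:` body that follows this env-var pattern. It assumes the step's `env:` block sets both `GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}` and `RUN_ID: ${{ github.event.workflow_run.id }}`. The function name `fetch_failed_logs`, the numeric check and the 300-line cap are illustrative choices and are not part of the PR. The raw log is written to a temporary file before trimming, so a `gh` failure is not masked by `tail`'s exit status.

```sh
#!/bin/sh
# Hypothetical helper for a pre-step `run:` body. Assumes the step sets
#   env:
#     GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
#     RUN_ID: ${{ github.event.workflow_run.id }}
# so the run ID reaches the shell as data, never as interpolated script text.
fetch_failed_logs() {
  # OUT_DIR may be overridden; the default matches the template's agent directory.
  local out_dir="${OUT_DIR:-/tmp/gh-aw/agent}"
  local raw

  # Fail clearly if the step forgot to export either variable.
  if [ -z "${GH_TOKEN:-}" ] || [ -z "${RUN_ID:-}" ]; then
    echo "GH_TOKEN and RUN_ID must be set via env:" >&2
    return 1
  fi

  # Workflow run IDs are numeric; reject anything else before calling gh.
  case "$RUN_ID" in
    '' | *[!0-9]*) echo "unexpected RUN_ID: $RUN_ID" >&2; return 1 ;;
  esac

  mkdir -p "$out_dir"
  raw=$(mktemp)
  # gh reads GH_TOKEN from the environment. Writing to a file first means a
  # gh failure is returned here instead of being hidden behind tail.
  if ! gh run view "$RUN_ID" --log-failed > "$raw"; then
    rm -f "$raw"
    return 1
  fi
  # Keep only the last 300 lines (an arbitrary cap chosen for this sketch).
  tail -n 300 "$raw" > "$out_dir/ci-logs-trimmed.txt"
  rm -f "$raw"
}
```

The step would define this function and then call `fetch_failed_logs`. The agent then reads `/tmp/gh-aw/agent/ci-logs-trimmed.txt`.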
…ippets Agent-Logs-Url: https://github.com/github/gh-aw/sessions/e4abe7a1-8d34-48e8-b4ff-6bfe4ec72dac Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…add cache-memory tip Agent-Logs-Url: https://github.com/github/gh-aw/sessions/b3d19cbb-dbdf-4c96-bce7-12da512c46d8 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in c1a30ba. The section is now significantly shorter and structured for agent scanning:
Hey One thing to address before this is ready for review:
Also worth noting: the PR is still in draft state — once the scope is tightened, flip it to ready-for-review. If you'd like a hand, you can assign this prompt to your coding agent:
@copilot suggest using `jq` for efficient filtering
Agent-Logs-Url: https://github.com/github/gh-aw/sessions/9e2d3ec1-d10f-4c63-b6d5-a82bf271a007 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in e012914. Three places now reference
@copilot `tools.agentic-workflows` enables the `logs` and `audit` commands, which are designed to analyze agentic workflow runs.
…table Agent-Logs-Url: https://github.com/github/gh-aw/sessions/ad8bf1ae-1a34-4c0a-b6a2-f6a24784bdb2 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done in 0b4ff6a. Added a row to the use-cases table: "Agentic workflow run logs — No shell step needed — add `tools: agentic-workflows:` and the agent uses `logs` and `audit` commands directly".
Pull request overview
Adds documentation describing a “pre-step data fetching” pattern for agentic workflows so heavy/verbose data is downloaded and trimmed before the agent runs (reducing token usage and repeated API calls).
Changes:
- Adds a new “Pre-step Data Fetching” best-practices subsection with rules, a template, a use-case table, and `cache-memory` dedup tips.
- Updates `pkg/agentdrain/data/default_weights.json` formatting/content (not described in the PR summary).
| File | Description |
|---|---|
| `.github/aw/create-agentic-workflow.md` | Documents a pre-step `steps:` pattern (download/trim/store) with examples and `cache-memory` usage. |
| `pkg/agentdrain/data/default_weights.json` | Updates embedded default weights JSON (formatting/content changes). |
Copilot's findings
- Files reviewed: 2/2 changed files
- Comments generated: 5
```diff
   "id": 5,
   "size": 1,
   "stage": "error",
-  "template": [
-    "stage=error",
-    "reason=The",
-    "Serena",
-    "MCP",
-    "server",
-    "is",
-    "not",
-    "available",
-    "in",
-    "this",
-    "environment.",
-    "No",
-    "serena-*",
-    "tools",
-    "are",
-    "registered.",
-    "tool=Serena",
-    "MCP",
-    "server",
-    "type=missing_tool"
-  ]
+  "template": ["stage=error", "reason=The", "Serena", "MCP", "server", "is", "not", "available", "in", "this", "environment.", "No", "serena-*", "tools", "are", "registered.", "tool=Serena", "MCP", "server", "type=missing_tool"]
 },
```
The PR description focuses on documentation changes, but this file is also modified (formatting/serialization changes). Please either mention this change in the PR description or split it into a separate PR so reviewers understand why default weights data is being updated.
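One way to confirm that a data-file change like this is formatting-only is to compare the parsed JSON rather than the text. This sketch loads both versions with `jq --slurp` and compares them with `==`. Whitespace and object key order do not affect that comparison, but array element order does. The helper name `same_json` and the file paths in the usage comment are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: exit 0 if two JSON files hold the same data, 1 otherwise.
# usage: same_json OLD.json NEW.json
same_json() {
  # --slurp reads both files into one array; -e maps true/false to exit 0/1.
  jq -e --slurp '.[0] == .[1]' "$1" "$2" > /dev/null
}

# Example (not executed here):
#   git show origin/main:pkg/agentdrain/data/default_weights.json > /tmp/old.json
#   same_json /tmp/old.json pkg/agentdrain/data/default_weights.json && echo "formatting-only"
```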
```markdown
max: 1
---

Analyze `/tmp/gh-aw/agent/ci-logs-trimmed.txt`. Identify the root cause and post a comment to the triggering PR.
```
The template prompt says to “post a comment to the triggering PR”, but workflow_run can be triggered by non-PR runs (e.g., pushes) where there is no associated PR to comment on. Consider updating the instructions to handle the “no PR” case explicitly (e.g., noop, or comment on the run/commit instead).
```diff
-Analyze `/tmp/gh-aw/agent/ci-logs-trimmed.txt`. Identify the root cause and post a comment to the triggering PR.
+Analyze `/tmp/gh-aw/agent/ci-logs-trimmed.txt`. Identify the root cause and determine whether the triggering `workflow_run` is associated with a pull request. If it is, post a comment to that PR. If no PR is associated with the run, skip commenting.
```
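A pre-step could also resolve the "no PR" case before the agent starts, so the prompt only has to read a file. This sketch reads the `workflow_run.pull_requests` array from the event payload at `$GITHUB_EVENT_PATH`. That array is empty for push-triggered runs and can also be empty for runs from forks, so an empty result is treated as "do not comment". The helper name `associated_pr` and the output file `pr-number.txt` are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: print the first PR number associated with a workflow_run
# event payload, or nothing if the run has no associated PR.
# usage: associated_pr EVENT_JSON   (in Actions: associated_pr "$GITHUB_EVENT_PATH")
associated_pr() {
  # `.[0]` of an empty array is null, and `// empty` turns that into no output.
  jq -r '.workflow_run.pull_requests[0].number // empty' "$1"
}

# Example pre-step usage (not executed here):
#   pr=$(associated_pr "$GITHUB_EVENT_PATH")
#   printf '%s\n' "${pr:-none}" > /tmp/gh-aw/agent/pr-number.txt
```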
```markdown
| Scenario | Step snippet |
|---|---|
| Deployment logs (Heroku/Vercel/Railway) | `heroku logs --tail --num 200 --app ${{ vars.HEROKU_APP }} > /tmp/gh-aw/agent/deploy-logs.txt` |
```
The Heroku example uses heroku logs --tail ..., which will continue streaming and can cause the pre-step to hang until the job times out. For a bounded, deterministic pre-fetch, use a non-tailing invocation that exits after a fixed number of lines (or add an explicit timeout).
```diff
-| Deployment logs (Heroku/Vercel/Railway) | `heroku logs --tail --num 200 --app ${{ vars.HEROKU_APP }} > /tmp/gh-aw/agent/deploy-logs.txt` |
+| Deployment logs (Heroku/Vercel/Railway) | `heroku logs --num 200 --app ${{ vars.HEROKU_APP }} > /tmp/gh-aw/agent/deploy-logs.txt` |
```
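If a hard bound is still wanted as a safety net for non-streaming commands, one option is to wrap the fetch in GNU coreutils `timeout`. When the limit is hit, `timeout` exits with status 124, so the step fails visibly instead of hanging until the job timeout. The helper name `bounded_fetch` and the 60-second limit are hypothetical. The usage comment also assumes the app name is passed to the step as an env var `HEROKU_APP`.

```sh
#!/bin/sh
# Hypothetical helper: run a fetch command with a hard time limit and save its stdout.
# usage: bounded_fetch SECONDS OUTFILE COMMAND [ARGS...]
bounded_fetch() {
  local secs="$1" out="$2"
  shift 2
  # GNU `timeout` kills COMMAND after SECONDS and then exits with status 124.
  timeout "$secs" "$@" > "$out"
}

# Example pre-step usage (not executed here):
#   bounded_fetch 60 /tmp/gh-aw/agent/deploy-logs.txt \
#     heroku logs --num 200 --app "$HEROKU_APP"
```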
```markdown
| Scenario | Step snippet |
|---|---|
| Deployment logs (Heroku/Vercel/Railway) | `heroku logs --tail --num 200 --app ${{ vars.HEROKU_APP }} > /tmp/gh-aw/agent/deploy-logs.txt` |
| Build / test output | `npm ci 2>&1 \| tail -200 > /tmp/gh-aw/agent/build.txt && npm run test -- --reporter=json > /tmp/gh-aw/agent/test.json 2>&1 \|\| true` |
```
The build/test snippet pipes npm ci into tail, which means the pipeline exit status will come from tail (so npm ci failures may be ignored unless pipefail is set). Consider adjusting the example so failures are handled intentionally (e.g., enable set -o pipefail, or avoid masking the command’s exit code while still trimming output).
```diff
-| Build / test output | `npm ci 2>&1 \| tail -200 > /tmp/gh-aw/agent/build.txt && npm run test -- --reporter=json > /tmp/gh-aw/agent/test.json 2>&1 \|\| true` |
+| Build / test output | `set -o pipefail && npm ci 2>&1 \| tail -200 > /tmp/gh-aw/agent/build.txt && npm run test -- --reporter=json > /tmp/gh-aw/agent/test.json 2>&1 \|\| true` |
```
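The concern is real for default steps. Per the GitHub Actions docs, a `run:` step with no `shell:` runs as `bash -e {0}`, without `pipefail`. Only an explicit `shell: bash` adds `-o pipefail`.

An alternative that does not depend on shell options is to capture the command's own exit status before trimming. This sketch writes the full output to a temporary file, keeps the last N lines, and returns the original command's status. The helper name `run_trimmed` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: run COMMAND, keep the last LINES lines of its combined
# stdout/stderr in OUTFILE, and return COMMAND's exit status (not tail's).
# usage: run_trimmed LINES OUTFILE COMMAND [ARGS...]
run_trimmed() {
  local lines="$1" out="$2" raw status
  shift 2
  raw=$(mktemp)
  # `|| status=$?` records a failure without tripping `set -e` in the caller.
  status=0
  "$@" > "$raw" 2>&1 || status=$?
  tail -n "$lines" "$raw" > "$out"
  rm -f "$raw"
  return "$status"
}

# Example pre-step usage (not executed here):
#   run_trimmed 200 /tmp/gh-aw/agent/build.txt npm ci
# A failed install now fails the step; the test command can keep its `|| true`.
```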
```markdown
| Filter JSON API response | `gh api repos/{owner}/{repo}/issues --jq '[.[] \| {number,title,state,labels:[.labels[].name]}]' > /tmp/gh-aw/agent/issues.json` |
| Agentic workflow run logs | No shell step needed — add `tools: agentic-workflows:` and the agent uses `logs` and `audit` commands directly |

**`cache-memory` tip:** Add `cache-memory: true` under `tools:` to persist pre-fetched data across runs. This enables deduplication (skip already-diagnosed run IDs), trending (compare metrics over time), and avoids redundant downloads on retries. The agent reads and writes `/tmp/gh-aw/cache-memory/`. Use `jq` to update the dedup file efficiently — for example `jq '. + ["'"$RUN_ID"'"]' /tmp/gh-aw/cache-memory/seen-runs.json > /tmp/seen-runs.tmp && mv /tmp/seen-runs.tmp /tmp/gh-aw/cache-memory/seen-runs.json`. See `.github/aw/memory.md` for full configuration options.
```
The jq one-liner assumes /tmp/gh-aw/cache-memory/seen-runs.json already exists and contains valid JSON; on a first run it likely won’t, so the example will fail. Consider showing an initialization/fallback ([] if missing) and, if you want to guarantee deduplication, ensure the updated array is unique.
```diff
-**`cache-memory` tip:** Add `cache-memory: true` under `tools:` to persist pre-fetched data across runs. This enables deduplication (skip already-diagnosed run IDs), trending (compare metrics over time), and avoids redundant downloads on retries. The agent reads and writes `/tmp/gh-aw/cache-memory/`. Use `jq` to update the dedup file efficiently — for example `jq '. + ["'"$RUN_ID"'"]' /tmp/gh-aw/cache-memory/seen-runs.json > /tmp/seen-runs.tmp && mv /tmp/seen-runs.tmp /tmp/gh-aw/cache-memory/seen-runs.json`. See `.github/aw/memory.md` for full configuration options.
+**`cache-memory` tip:** Add `cache-memory: true` under `tools:` to persist pre-fetched data across runs. This enables deduplication (skip already-diagnosed run IDs), trending (compare metrics over time), and avoids redundant downloads on retries. The agent reads and writes `/tmp/gh-aw/cache-memory/`. Use `jq` to update the dedup file efficiently — for example `jq -n --arg run_id "$RUN_ID" '((input // []) + [$run_id]) | unique' < <(cat /tmp/gh-aw/cache-memory/seen-runs.json 2>/dev/null || printf '[]') > /tmp/seen-runs.tmp && mv /tmp/seen-runs.tmp /tmp/gh-aw/cache-memory/seen-runs.json`. See `.github/aw/memory.md` for full configuration options.
```
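Here is a slightly longer sketch of the same dedup idea, split into a check and an update so a pre-step can skip runs it has already diagnosed. The helper names `seen_before` and `mark_seen` are hypothetical; the default path matches the tip.

The sketch makes a few choices worth knowing:
- It initializes the file with `[]` on first use.
- It stores IDs as strings, since that is what `--arg` produces.
- It applies `unique` so repeated runs do not grow the list. `unique` also sorts, so insertion order is not preserved.

```sh
#!/bin/sh
# Hypothetical dedup helpers for the cache-memory seen-runs file.
SEEN_FILE_DEFAULT=/tmp/gh-aw/cache-memory/seen-runs.json

# usage: seen_before RUN_ID [FILE]  -> exit 0 if RUN_ID is already recorded
seen_before() {
  local id="$1" file="${2:-$SEEN_FILE_DEFAULT}"
  # A missing or empty file means nothing has been seen yet.
  [ -s "$file" ] || return 1
  # jq -e derives the exit status from the result: 0 for true, 1 for false.
  jq -e --arg id "$id" 'any(.[]; . == $id)' "$file" > /dev/null
}

# usage: mark_seen RUN_ID [FILE]  -> add RUN_ID, creating the file on first run
mark_seen() {
  local id="$1" file="${2:-$SEEN_FILE_DEFAULT}"
  mkdir -p "$(dirname "$file")"
  [ -s "$file" ] || printf '[]\n' > "$file"
  # Write to a sibling temp file, then rename, so a failed jq never truncates the original.
  jq --arg id "$id" '. + [$id] | unique' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example pre-step usage (not executed here):
#   if seen_before "$RUN_ID"; then echo "run $RUN_ID already diagnosed; skipping download"; exit 0; fi
#   mark_seen "$RUN_ID"
```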
3 of 8 agent persona scenarios independently recommended downloading heavy data before the AI session — but this pattern wasn't documented anywhere, so users discovered it by accident.
Changes
- `### Pre-step Data Fetching` subsection under `## Best Practices` in `.github/aw/create-agentic-workflow.md`
- `GH_TOKEN` authentication, output directory, trimming, required permissions, and `jq` filtering (extract only the fields the agent needs before writing to disk)
- `steps:` + `run:` that downloads, trims, and writes to `/tmp/gh-aw/agent/` before the agent reads it; includes `cache-memory: true` and a sample prompt instruction showing run-ID deduplication
- … (`npm ci && npm run test`), GitHub Actions run artifacts (`gh run download`), `jq`-filtered JSON API responses (`gh api … --jq '…'`), and agentic workflow run logs (using `tools: agentic-workflows:` for native `logs` and `audit` commands; no shell step needed)
- `cache-memory` tip: explains deduplication (skip already-diagnosed run IDs), trending (compare metrics over time), and retry-avoidance; includes a `jq` one-liner for atomically updating the dedup file, with a link to `memory.md` for full configuration options
- … `env: GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}` and use a `RUN_ID` env var instead of inline expression interpolation; the template includes `permissions: actions: read`

Template snippet:
The agent reads `/tmp/gh-aw/agent/ci-logs-trimmed.txt` and checks `/tmp/gh-aw/cache-memory/seen-runs.json` for deduplication, skipping redundant tool calls. Use `jq` to filter JSON API responses before saving, keeping file sizes small and agent context focused. For analyzing agentic workflow runs specifically, add `tools: agentic-workflows:` so the agent can use the `logs` and `audit` commands directly without a shell pre-step.
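To see what the `--jq` filter from the use-case table keeps, it can be applied with standalone `jq` to a saved response (`gh api --jq` accepts the same jq filter syntax). The `trim_issues` helper name and the sample payload in the test are made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: keep only number, title, state, and label names from a
# saved GitHub issues API response (same filter as the table's `gh api --jq` example).
# usage: trim_issues RESPONSE_JSON
trim_issues() {
  jq -c '[.[] | {number, title, state, labels: [.labels[].name]}]' "$1"
}

# Example (not executed here):
#   gh api repos/{owner}/{repo}/issues > /tmp/issues-raw.json
#   trim_issues /tmp/issues-raw.json > /tmp/gh-aw/agent/issues.json
```

Fields such as `body` and `user` are dropped entirely. This is where most of the size reduction comes from on real issue lists.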