docs: add pre-step data-fetching pattern to create-agentic-workflow.md #24685
**`create-agentic-workflow.md`** (`@@ -705,6 +705,63 @@`, inside the section on workflows that involve coding agents operating in large repositories, following the existing "Documentation updates for multiple services" and "Dependency updates across microservices" use-case bullets), the PR adds:

### Pre-step Data Fetching

Use a deterministic `steps:` block to download, trim, and store heavy data before the agent runs. The agent then reads local files instead of making repeated API calls, staying within its token budget.

**Rules:**

- Always set `env: GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}` on every step that calls `gh`; the token is not injected automatically.
- Write output to `/tmp/gh-aw/agent/` (the canonical agent data directory).
- Trim large blobs before writing (`tail -N`).
- Add `permissions: actions: read` when reading workflow logs or artifacts.
- Use `jq` to filter JSON responses before writing them to disk: extract only the fields the agent needs and keep file sizes small.

**Template (CI log analysis):**

```yaml
---
on:
  workflow_run:
    workflows: ["CI"]
    types: [completed]
permissions:
  contents: read
  actions: read  # required for gh run view / gh run download
tools:
  github:
    toolsets: [default]
  cache-memory: true  # persist pre-fetched data across runs (dedup, trending)
steps:
  - name: Fetch CI logs
    env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      RUN_ID: ${{ github.event.workflow_run.id }}
    run: |
      mkdir -p /tmp/gh-aw/agent
      gh run view "$RUN_ID" --log > /tmp/gh-aw/agent/ci-logs.txt 2>&1 || true
      tail -500 /tmp/gh-aw/agent/ci-logs.txt > /tmp/gh-aw/agent/ci-logs-trimmed.txt
safe-outputs:
  add-comment:
    max: 1
---

Analyze `/tmp/gh-aw/agent/ci-logs-trimmed.txt`. Identify the root cause and post a comment to the triggering PR.

Check `/tmp/gh-aw/cache-memory/seen-runs.json` for previously seen run IDs; skip if already processed, and append the current run ID when done.
```

**Use cases:**

| Scenario | Step snippet |
|---|---|
| Deployment logs (Heroku/Vercel/Railway) | `heroku logs --tail --num 200 --app ${{ vars.HEROKU_APP }} > /tmp/gh-aw/agent/deploy-logs.txt` |
| Build / test output | `npm ci 2>&1 \| tail -200 > /tmp/gh-aw/agent/build.txt && npm run test -- --reporter=json > /tmp/gh-aw/agent/test.json 2>&1 \|\| true` |

**`cache-memory` tip:** Add `cache-memory: true` under `tools:` to persist pre-fetched data across runs. This enables deduplication (skip already-diagnosed run IDs), trending (compare metrics over time), and avoids redundant downloads on retries. The agent reads and writes `/tmp/gh-aw/cache-memory/`. Use `jq` to update the dedup file efficiently, for example `jq '. + ["'"$RUN_ID"'"]' /tmp/gh-aw/cache-memory/seen-runs.json > /tmp/seen-runs.tmp && mv /tmp/seen-runs.tmp /tmp/gh-aw/cache-memory/seen-runs.json`. See `.github/aw/memory.md` for full configuration options.
**Suggested change** (deployment-logs row):

```diff
-| Deployment logs (Heroku/Vercel/Railway) | `heroku logs --tail --num 200 --app ${{ vars.HEROKU_APP }} > /tmp/gh-aw/agent/deploy-logs.txt` |
+| Deployment logs (Heroku/Vercel/Railway) | `heroku logs --num 200 --app ${{ vars.HEROKU_APP }} > /tmp/gh-aw/agent/deploy-logs.txt` |
```
**Copilot review comment (Apr 5, 2026):** The build/test snippet pipes `npm ci` into `tail`, which means the pipeline's exit status comes from `tail` (so `npm ci` failures may be ignored unless `pipefail` is set). Consider adjusting the example so failures are handled intentionally (e.g., enable `set -o pipefail`, or avoid masking the command's exit code while still trimming output).

**Suggested change:**

```diff
-| Build / test output | `npm ci 2>&1 \| tail -200 > /tmp/gh-aw/agent/build.txt && npm run test -- --reporter=json > /tmp/gh-aw/agent/test.json 2>&1 \|\| true` |
+| Build / test output | `set -o pipefail && npm ci 2>&1 \| tail -200 > /tmp/gh-aw/agent/build.txt && npm run test -- --reporter=json > /tmp/gh-aw/agent/test.json 2>&1 \|\| true` |
```
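The masking behavior the comment describes is easy to reproduce. A small bash sketch, with `(exit 3)` standing in for a failing `npm ci`:

```shell
#!/bin/bash
set +o pipefail
# Without pipefail, a pipeline's status is the LAST command's status,
# so a failure on the left of the pipe is silently dropped.
(exit 3) | tail -n 1
echo "without pipefail: $?"   # prints "without pipefail: 0"

set -o pipefail
# With pipefail, the pipeline reports the rightmost non-zero status.
(exit 3) | tail -n 1
echo "with pipefail: $?"      # prints "with pipefail: 3"
```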
**Copilot review comment (Apr 5, 2026):** The `jq` one-liner assumes `/tmp/gh-aw/cache-memory/seen-runs.json` already exists and contains valid JSON; on a first run it likely won't, so the example will fail. Consider showing an initialization/fallback (`[]` if missing) and, if you want to guarantee deduplication, ensure the updated array is unique.

**Suggested change:**

```diff
-**`cache-memory` tip:** Add `cache-memory: true` under `tools:` to persist pre-fetched data across runs. This enables deduplication (skip already-diagnosed run IDs), trending (compare metrics over time), and avoids redundant downloads on retries. The agent reads and writes `/tmp/gh-aw/cache-memory/`. Use `jq` to update the dedup file efficiently — for example `jq '. + ["'"$RUN_ID"'"]' /tmp/gh-aw/cache-memory/seen-runs.json > /tmp/seen-runs.tmp && mv /tmp/seen-runs.tmp /tmp/gh-aw/cache-memory/seen-runs.json`. See `.github/aw/memory.md` for full configuration options.
+**`cache-memory` tip:** Add `cache-memory: true` under `tools:` to persist pre-fetched data across runs. This enables deduplication (skip already-diagnosed run IDs), trending (compare metrics over time), and avoids redundant downloads on retries. The agent reads and writes `/tmp/gh-aw/cache-memory/`. Use `jq` to update the dedup file efficiently — for example `jq -n --arg run_id "$RUN_ID" '((input // []) + [$run_id]) | unique' < <(cat /tmp/gh-aw/cache-memory/seen-runs.json 2>/dev/null || printf '[]') > /tmp/seen-runs.tmp && mv /tmp/seen-runs.tmp /tmp/gh-aw/cache-memory/seen-runs.json`. See `.github/aw/memory.md` for full configuration options.
```
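Putting the comment's two fixes together (seed the file if missing, then append and `unique`), one possible shape for the dedup step; the paths follow the docs, but the `RUN_ID` value here is illustrative:

```shell
#!/bin/sh
MEM=/tmp/gh-aw/cache-memory
RUN_ID=12345
mkdir -p "$MEM"
# First run: the file does not exist yet, so seed it with an empty array.
[ -s "$MEM/seen-runs.json" ] || printf '[]' > "$MEM/seen-runs.json"
if jq -e --arg id "$RUN_ID" 'index($id) != null' "$MEM/seen-runs.json" >/dev/null; then
  echo "run $RUN_ID already processed; skipping"
else
  # Append and de-duplicate, writing through a temp file so a failed
  # jq invocation never truncates the dedup file.
  jq --arg id "$RUN_ID" '. + [$id] | unique' "$MEM/seen-runs.json" \
    > "$MEM/seen-runs.tmp" && mv "$MEM/seen-runs.tmp" "$MEM/seen-runs.json"
fi
```

Writing through a temp file and `mv` keeps the update atomic on the same filesystem, which matters if a retry re-enters the step.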
A second JSON file in the diff (filename not shown in this view) compacts multi-line arrays onto single lines:

```diff
@@ -39,28 +39,7 @@
       "id": 5,
       "size": 1,
       "stage": "error",
-      "template": [
-        "stage=error",
-        "reason=The",
-        "Serena",
-        "MCP",
-        "server",
-        "is",
-        "not",
-        "available",
-        "in",
-        "this",
-        "environment.",
-        "No",
-        "serena-*",
-        "tools",
-        "are",
-        "registered.",
-        "tool=Serena",
-        "MCP",
-        "server",
-        "type=missing_tool"
-      ]
+      "template": ["stage=error", "reason=The", "Serena", "MCP", "server", "is", "not", "available", "in", "this", "environment.", "No", "serena-*", "tools", "are", "registered.", "tool=Serena", "MCP", "server", "type=missing_tool"]
     },
     {
       "id": 6,
@@ -234,12 +213,7 @@
     ],
     "config": {
       "Depth": 4,
-      "ExcludeFields": [
-        "session_id",
-        "trace_id",
-        "span_id",
-        "timestamp"
-      ],
+      "ExcludeFields": ["session_id", "trace_id", "span_id", "timestamp"],
       "MaskRules": [
         {
           "Name": "uuid",
@@ -285,21 +259,12 @@
       "id": 1,
       "size": 100,
       "stage": "finish",
-      "template": [
-        "stage=finish",
-        "\u003c*\u003e",
-        "tokens=\u003cNUM\u003e"
-      ]
+      "template": ["stage=finish", "\u003c*\u003e", "tokens=\u003cNUM\u003e"]
     }
   ],
   "config": {
     "Depth": 4,
-    "ExcludeFields": [
-      "session_id",
-      "trace_id",
-      "span_id",
-      "timestamp"
-    ],
+    "ExcludeFields": ["session_id", "trace_id", "span_id", "timestamp"],
     "MaskRules": [
       {
         "Name": "uuid",
@@ -345,21 +310,12 @@
       "id": 1,
       "size": 72,
       "stage": "plan",
-      "template": [
-        "stage=plan",
-        "errors=\u003cNUM\u003e",
-        "turns=\u003cNUM\u003e"
-      ]
+      "template": ["stage=plan", "errors=\u003cNUM\u003e", "turns=\u003cNUM\u003e"]
     }
   ],
   "config": {
     "Depth": 4,
-    "ExcludeFields": [
-      "session_id",
-      "trace_id",
-      "span_id",
-      "timestamp"
-    ],
+    "ExcludeFields": ["session_id", "trace_id", "span_id", "timestamp"],
     "MaskRules": [
       {
         "Name": "uuid",
@@ -403,12 +359,7 @@
     "clusters": null,
     "config": {
       "Depth": 4,
-      "ExcludeFields": [
-        "session_id",
-        "trace_id",
-        "span_id",
-        "timestamp"
-      ],
+      "ExcludeFields": ["session_id", "trace_id", "span_id", "timestamp"],
       "MaskRules": [
         {
           "Name": "uuid",
@@ -452,12 +403,7 @@
     "clusters": null,
     "config": {
       "Depth": 4,
-      "ExcludeFields": [
-        "session_id",
-        "trace_id",
-        "span_id",
-        "timestamp"
-      ],
+      "ExcludeFields": ["session_id", "trace_id", "span_id", "timestamp"],
       "MaskRules": [
         {
           "Name": "uuid",
@@ -1607,12 +1553,7 @@
     ],
     "config": {
       "Depth": 4,
-      "ExcludeFields": [
-        "session_id",
-        "trace_id",
-        "span_id",
-        "timestamp"
-      ],
+      "ExcludeFields": ["session_id", "trace_id", "span_id", "timestamp"],
       "MaskRules": [
         {
           "Name": "uuid",
@@ -1652,4 +1593,4 @@
   },
   "next_id": 15
 }
-}
+}
```
**Copilot review comment:** The template prompt says to "post a comment to the triggering PR", but `workflow_run` can be triggered by non-PR runs (e.g., pushes) where there is no associated PR to comment on. Consider updating the instructions to handle the "no PR" case explicitly (e.g., no-op, or comment on the run/commit instead).
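One way to implement the guard this comment asks for is to inspect the event payload's `pull_requests` array before commenting. A sketch using a synthetic payload; in a real step, Actions sets `GITHUB_EVENT_PATH` itself and the `id`/`head_sha` values below are placeholders:

```shell
#!/bin/sh
# Synthetic workflow_run payload; a push-triggered run typically has an
# empty pull_requests array.
GITHUB_EVENT_PATH=$(mktemp)
cat > "$GITHUB_EVENT_PATH" <<'EOF'
{"workflow_run": {"id": 12345, "head_sha": "abc123", "pull_requests": []}}
EOF
pr_count=$(jq '.workflow_run.pull_requests | length' "$GITHUB_EVENT_PATH")
if [ "$pr_count" -eq 0 ]; then
  # No-op branch: nothing to comment on for push-triggered runs.
  echo "no associated PR; skipping comment"
else
  pr=$(jq -r '.workflow_run.pull_requests[0].number' "$GITHUB_EVENT_PATH")
  echo "would comment on PR #$pr"
fi
```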