docs: add pre-step data-fetching pattern to create-agentic-workflow.md by Copilot · Pull Request #24685 · github/gh-aw

Copilot · 2026-04-05T08:36:51Z

3 of 8 agent persona scenarios independently recommended downloading heavy data before the AI session — but this pattern wasn't documented anywhere, so users discovered it by accident.

Changes

New ### Pre-step Data Fetching subsection under ## Best Practices in .github/aw/create-agentic-workflow.md
- Concise Rules list covering GH_TOKEN authentication, output directory, trimming, required permissions, and jq filtering (extract only the fields the agent needs before writing to disk)
- Self-contained YAML template with steps: + run: that downloads, trims, and writes to /tmp/gh-aw/agent/ before the agent reads it; includes cache-memory: true and a sample prompt instruction showing run-ID deduplication
- Quick-reference table of use cases: deployment failure logs (Heroku/Vercel/Railway), build artifacts / test results (npm ci && npm run test), GitHub Actions run artifacts (gh run download), jq-filtered JSON API responses (gh api … --jq '…'), and agentic workflow run logs (using tools: agentic-workflows: for native logs and audit commands — no shell step needed)
- cache-memory tip: explains deduplication (skip already-diagnosed run IDs), trending (compare metrics over time), and retry-avoidance; includes a jq one-liner for atomically updating the dedup file, with a link to memory.md for full configuration options
- All step snippets set env: GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} and use a RUN_ID env var instead of inline expression interpolation; template includes permissions: actions: read

Template snippet:

permissions:
  contents: read
  actions: read          # required for gh run view / gh run download
tools:
  github:
    toolsets: [default]
  cache-memory: true     # persist pre-fetched data across runs (dedup, trending)
steps:
  - name: Fetch CI logs
    env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      RUN_ID: ${{ github.event.workflow_run.id }}
    run: |
      mkdir -p /tmp/gh-aw/agent
      gh run view "$RUN_ID" --log > /tmp/gh-aw/agent/ci-logs.txt 2>&1 || true
      tail -500 /tmp/gh-aw/agent/ci-logs.txt > /tmp/gh-aw/agent/ci-logs-trimmed.txt

The agent reads /tmp/gh-aw/agent/ci-logs-trimmed.txt and checks /tmp/gh-aw/cache-memory/seen-runs.json for deduplication, skipping redundant tool calls. Use jq to filter JSON API responses before saving, keeping file sizes small and agent context focused. For analyzing agentic workflow runs specifically, add tools: agentic-workflows: so the agent can use the logs and audit commands directly without a shell pre-step.
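
The trimming step in the template could be factored into a small helper. The following is a minimal sketch, not code from the PR: the function name `trim_log` and the placeholder message are assumptions made for illustration, and the 500-line default simply mirrors the `tail -500` in the template.

```shell
#!/usr/bin/env bash
set -euo pipefail

# trim_log SRC DEST [MAX_LINES]
# Hypothetical helper: keep only the last MAX_LINES lines of SRC in DEST.
trim_log() {
  local src="$1" dest="$2" max_lines="${3:-500}"
  mkdir -p "$(dirname "$dest")"
  if [ -s "$src" ]; then
    tail -n "$max_lines" "$src" > "$dest"
  else
    # The fetch may have failed quietly (the template uses `|| true`),
    # so leave an explicit marker instead of an empty file.
    printf 'no log output captured\n' > "$dest"
  fi
}
```

A pre-step might call it as `trim_log /tmp/gh-aw/agent/ci-logs.txt /tmp/gh-aw/agent/ci-logs-trimmed.txt`, so the agent always finds a non-empty file at the path named in the prompt.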

pelikhan · 2026-04-05T08:55:44Z

.github/aw/create-agentic-workflow.md

+  github:
+    toolsets: [default]
+steps:
+  - name: Fetch data


@copilot use gh_token and set proper permissions to avoid rate limits

Done in 41f63b5. All step snippets now include env: GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} and the template already has permissions: actions: read. I also moved ${{ github.event.workflow_run.id }} into a RUN_ID env var (following the pattern in ci-doctor.md) so the shell command uses "$RUN_ID" instead of raw expression interpolation.

pelikhan

@copilot optimize content for agent consumption. Suggest using cache memory as well

Copilot · 2026-04-05T09:08:57Z

@copilot optimize content for agent consumption. Suggest using cache memory as well

Done in c1a30ba. The section is now significantly shorter and structured for agent scanning:

- Replaced verbose prose with a Rules: bullet list (4 items)
- Merged the three example snippets into a single quick-reference table
- The template now includes cache-memory: true alongside tools: and shows an example prompt instruction (Check /tmp/gh-aw/cache-memory/seen-runs.json …) demonstrating dedup across runs
- Added a dedicated cache-memory tip at the end explaining deduplication, trending, and retry-avoidance, with a link to memory.md

github-actions · 2026-04-05T09:11:43Z

Hey @Copilot 👋 — thanks for adding the Pre-step Data Fetching pattern! The new subsection in create-agentic-workflow.md is genuinely useful — the rationale around token budgets and the reusable YAML template are exactly the kind of actionable guidance that users need.

One thing to address before this is ready for review:

Split the unrelated JSON reformatting — pkg/agentdrain/data/default_weights.json has ~7 hunks that compact multi-line JSON arrays into single-line format. This is a purely cosmetic change with no functional impact and isn't mentioned anywhere in the PR description. It should either be reverted (if it was an accidental side-effect of the editing session) or moved into its own dedicated PR with a clear rationale. Keeping it here makes reviewer diffs harder to read and makes git history noisy.

Also worth noting: the PR is still in draft state — once the scope is tightened, flip it to ready-for-review.

If you'd like a hand, you can assign this prompt to your coding agent:

In the PR branch for #24685 (docs: add pre-step data-fetching pattern), revert all changes to `pkg/agentdrain/data/default_weights.json` so that file is identical to the base branch (main). The goal is to keep this PR focused solely on the documentation addition to `.github/aw/create-agentic-workflow.md`.

Steps:
1. Run: git checkout origin/main -- pkg/agentdrain/data/default_weights.json
2. Verify the file is restored: git diff origin/main -- pkg/agentdrain/data/default_weights.json should produce no output.
3. Stage and commit the revert with message: "revert: restore default_weights.json to main (out-of-scope reformatting)"
4. Push the branch.

Generated by Contribution Check · ● 1.9M · ◷

pelikhan · 2026-04-05T09:11:57Z

@copilot suggest using jq for efficient filtering

Copilot · 2026-04-05T09:16:25Z

@copilot suggest using jq for efficient filtering

Done in e012914. Three places now reference jq:

- Rules, new bullet: "Use jq to filter JSON responses before writing to disk — extract only the fields the agent needs and keep file sizes small."
- Use cases table, new row: gh api …/issues --jq '[.[] | {number,title,state,labels}]' > /tmp/gh-aw/agent/issues.json
- cache-memory tip: shows a jq one-liner for updating the dedup file in-place (append + atomic move)
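
To make the filtering rule concrete, here is a minimal sketch of the same projection applied to a saved or piped response. The function name `filter_issues` is an assumption for illustration; the jq expression follows the one quoted in the use-cases table, with labels reduced to their names.

```shell
#!/usr/bin/env bash
set -euo pipefail

# filter_issues
# Hypothetical helper: read an issues API response (a JSON array) on stdin
# and keep only the fields the agent needs, one compact array on stdout.
filter_issues() {
  jq -c '[.[] | {number, title, state, labels: [.labels[].name]}]'
}
```

In a pre-step this could be used as `gh api repos/{owner}/{repo}/issues | filter_issues > /tmp/gh-aw/agent/issues.json`, which is equivalent in effect to passing the expression through `gh api --jq`.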

pelikhan · 2026-04-05T09:19:47Z

@copilot the tools.agentic-workflows enables logs and audit commands designed to analyze agentic workflow runs.

Copilot · 2026-04-05T09:25:11Z

@copilot the tools.agentic-workflows enables logs and audit commands designed to analyze agentic workflow runs.

Done in 0b4ff6a. Added a row to the use-cases table: "Agentic workflow run logs — No shell step needed — add tools: agentic-workflows: and the agent uses logs and audit commands directly."

Copilot

Pull request overview

Adds documentation describing a “pre-step data fetching” pattern for agentic workflows so heavy/verbose data is downloaded and trimmed before the agent runs (reducing token usage and repeated API calls).

Changes:

- Adds a new “Pre-step Data Fetching” best-practices subsection with rules, a template, use-case table, and cache-memory dedup tips.
- Updates pkg/agentdrain/data/default_weights.json formatting/content (not described in the PR summary).

| File | Description |
|---|---|
| `.github/aw/create-agentic-workflow.md` | Documents a pre-step `steps:` pattern (download/trim/store) with examples and `cache-memory` usage. |
| `pkg/agentdrain/data/default_weights.json` | Updates embedded default weights JSON (formatting/content changes). |

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 5

Copilot · 2026-04-05T09:29:16Z

pkg/agentdrain/data/default_weights.json

        "id": 5,
        "size": 1,
        "stage": "error",
-        "template": [
-          "stage=error",
-          "reason=The",
-          "Serena",
-          "MCP",
-          "server",
-          "is",
-          "not",
-          "available",
-          "in",
-          "this",
-          "environment.",
-          "No",
-          "serena-*",
-          "tools",
-          "are",
-          "registered.",
-          "tool=Serena",
-          "MCP",
-          "server",
-          "type=missing_tool"
-        ]
+        "template": ["stage=error", "reason=The", "Serena", "MCP", "server", "is", "not", "available", "in", "this", "environment.", "No", "serena-*", "tools", "are", "registered.", "tool=Serena", "MCP", "server", "type=missing_tool"]
      },


The PR description focuses on documentation changes, but this file is also modified (formatting/serialization changes). Please either mention this change in the PR description or split it into a separate PR so reviewers understand why default weights data is being updated.

Copilot · 2026-04-05T09:29:16Z

.github/aw/create-agentic-workflow.md

+    max: 1
+---
+
+Analyze `/tmp/gh-aw/agent/ci-logs-trimmed.txt`. Identify the root cause and post a comment to the triggering PR.


The template prompt says to “post a comment to the triggering PR”, but workflow_run can be triggered by non-PR runs (e.g., pushes) where there is no associated PR to comment on. Consider updating the instructions to handle the “no PR” case explicitly (e.g., noop, or comment on the run/commit instead).

Suggested change

-Analyze `/tmp/gh-aw/agent/ci-logs-trimmed.txt`. Identify the root cause and post a comment to the triggering PR.
+Analyze `/tmp/gh-aw/agent/ci-logs-trimmed.txt`. Identify the root cause and determine whether the triggering `workflow_run` is associated with a pull request. If it is, post a comment to that PR. If no PR is associated with the run, skip commenting.
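
The "no PR" case could also be resolved in a pre-step rather than left to the agent. The sketch below is an illustration only: the function name `pr_number_for_run` is hypothetical, and it assumes the event payload exposes associated pull requests as a `workflow_run.pull_requests` array, which should be verified against the actual payload before relying on it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# pr_number_for_run EVENT_JSON
# Hypothetical helper: print the number of the first pull request attached
# to a workflow_run event payload, or print nothing if there is none.
# Assumes the payload shape .workflow_run.pull_requests[].number.
pr_number_for_run() {
  jq -r '.workflow_run.pull_requests[0].number // empty' "$1"
}
```

A step could run `pr_number_for_run "$GITHUB_EVENT_PATH" > /tmp/gh-aw/agent/pr-number.txt`; an empty file then tells the agent to skip commenting.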

Copilot · 2026-04-05T09:29:17Z

.github/aw/create-agentic-workflow.md

+
+| Scenario | Step snippet |
+|---|---|
+| Deployment logs (Heroku/Vercel/Railway) | `heroku logs --tail --num 200 --app ${{ vars.HEROKU_APP }} > /tmp/gh-aw/agent/deploy-logs.txt` |


The Heroku example uses heroku logs --tail ..., which will continue streaming and can cause the pre-step to hang until the job times out. For a bounded, deterministic pre-fetch, use a non-tailing invocation that exits after a fixed number of lines (or add an explicit timeout).

Suggested change

-| Deployment logs (Heroku/Vercel/Railway) | `heroku logs --tail --num 200 --app ${{ vars.HEROKU_APP }} > /tmp/gh-aw/agent/deploy-logs.txt` |
+| Deployment logs (Heroku/Vercel/Railway) | `heroku logs --num 200 --app ${{ vars.HEROKU_APP }} > /tmp/gh-aw/agent/deploy-logs.txt` |
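
For commands that cannot be made non-streaming, the explicit timeout mentioned in the comment might look like the sketch below. It assumes GNU coreutils `timeout` is available on the runner; the function name `bounded_fetch` is hypothetical.

```shell
#!/usr/bin/env bash

# bounded_fetch SECONDS OUT CMD [ARGS...]
# Hypothetical helper: run CMD for at most SECONDS, capturing stdout and
# stderr in OUT. Returns CMD's exit status, or 124 if `timeout` had to
# stop it.
bounded_fetch() {
  local secs="$1" out="$2"
  shift 2
  mkdir -p "$(dirname "$out")"
  timeout "$secs" "$@" > "$out" 2>&1
}
```

Whatever output arrived before the deadline stays in the file, so a step can treat status 124 as "partial data" and still hand the file to the agent.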

Copilot · 2026-04-05T09:29:17Z

.github/aw/create-agentic-workflow.md

+| Scenario | Step snippet |
+|---|---|
+| Deployment logs (Heroku/Vercel/Railway) | `heroku logs --tail --num 200 --app ${{ vars.HEROKU_APP }} > /tmp/gh-aw/agent/deploy-logs.txt` |
+| Build / test output | `npm ci 2>&1 \| tail -200 > /tmp/gh-aw/agent/build.txt && npm run test -- --reporter=json > /tmp/gh-aw/agent/test.json 2>&1 \|\| true` |


The build/test snippet pipes npm ci into tail, which means the pipeline exit status will come from tail (so npm ci failures may be ignored unless pipefail is set). Consider adjusting the example so failures are handled intentionally (e.g., enable set -o pipefail, or avoid masking the command’s exit code while still trimming output).

Suggested change

-| Build / test output | `npm ci 2>&1 \| tail -200 > /tmp/gh-aw/agent/build.txt && npm run test -- --reporter=json > /tmp/gh-aw/agent/test.json 2>&1 \|\| true` |
+| Build / test output | `set -o pipefail && npm ci 2>&1 \| tail -200 > /tmp/gh-aw/agent/build.txt && npm run test -- --reporter=json > /tmp/gh-aw/agent/test.json 2>&1 \|\| true` |
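
An alternative to `pipefail` is to read the command's own status from bash's `PIPESTATUS` array, which keeps the trimming while making the failure visible. This is a sketch under the assumption that the step runs in bash; the function name `run_and_trim` is hypothetical.

```shell
#!/usr/bin/env bash

# run_and_trim OUT MAX_LINES CMD [ARGS...]
# Hypothetical helper: run CMD, keep the last MAX_LINES lines of its
# combined output in OUT, and return CMD's exit status rather than tail's.
run_and_trim() {
  local out="$1" max_lines="$2"
  shift 2
  "$@" 2>&1 | tail -n "$max_lines" > "$out"
  # PIPESTATUS[0] is the status of CMD, the first command in the pipeline.
  return "${PIPESTATUS[0]}"
}
```

Calling it as `run_and_trim /tmp/gh-aw/agent/build.txt 200 npm ci || status=$?` lets the step decide deliberately whether a failed install should stop the job or merely be recorded for the agent.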

Copilot · 2026-04-05T09:29:17Z

.github/aw/create-agentic-workflow.md

+| Filter JSON API response | `gh api repos/{owner}/{repo}/issues --jq '[.[] \| {number,title,state,labels:[.labels[].name]}]' > /tmp/gh-aw/agent/issues.json` |
+| Agentic workflow run logs | No shell step needed — add `tools: agentic-workflows:` and the agent uses `logs` and `audit` commands directly |
+
+**`cache-memory` tip:** Add `cache-memory: true` under `tools:` to persist pre-fetched data across runs. This enables deduplication (skip already-diagnosed run IDs), trending (compare metrics over time), and avoids redundant downloads on retries. The agent reads and writes `/tmp/gh-aw/cache-memory/`. Use `jq` to update the dedup file efficiently — for example `jq '. + ["'"$RUN_ID"'"]' /tmp/gh-aw/cache-memory/seen-runs.json > /tmp/seen-runs.tmp && mv /tmp/seen-runs.tmp /tmp/gh-aw/cache-memory/seen-runs.json`. See `.github/aw/memory.md` for full configuration options.


The jq one-liner assumes /tmp/gh-aw/cache-memory/seen-runs.json already exists and contains valid JSON; on a first run it likely won’t, so the example will fail. Consider showing an initialization/fallback ([] if missing) and, if you want to guarantee deduplication, ensure the updated array is unique.

Suggested change

-**`cache-memory` tip:** Add `cache-memory: true` under `tools:` to persist pre-fetched data across runs. This enables deduplication (skip already-diagnosed run IDs), trending (compare metrics over time), and avoids redundant downloads on retries. The agent reads and writes `/tmp/gh-aw/cache-memory/`. Use `jq` to update the dedup file efficiently — for example `jq '. + ["'"$RUN_ID"'"]' /tmp/gh-aw/cache-memory/seen-runs.json > /tmp/seen-runs.tmp && mv /tmp/seen-runs.tmp /tmp/gh-aw/cache-memory/seen-runs.json`. See `.github/aw/memory.md` for full configuration options.
+**`cache-memory` tip:** Add `cache-memory: true` under `tools:` to persist pre-fetched data across runs. This enables deduplication (skip already-diagnosed run IDs), trending (compare metrics over time), and avoids redundant downloads on retries. The agent reads and writes `/tmp/gh-aw/cache-memory/`. Use `jq` to update the dedup file efficiently — for example `jq -n --arg run_id "$RUN_ID" '((input // []) + [$run_id]) | unique' < <(cat /tmp/gh-aw/cache-memory/seen-runs.json 2>/dev/null || printf '[]') > /tmp/seen-runs.tmp && mv /tmp/seen-runs.tmp /tmp/gh-aw/cache-memory/seen-runs.json`. See `.github/aw/memory.md` for full configuration options.
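
The same initialization and uniqueness concerns can be spelled out as two small functions, which may be easier to read than a one-liner. This is a sketch only: `record_run` and `already_seen` are hypothetical names, and run IDs are stored as strings.

```shell
#!/usr/bin/env bash
set -euo pipefail

# record_run RUN_ID SEEN_FILE
# Hypothetical helper: add RUN_ID to the JSON array in SEEN_FILE, creating
# the file on first use and keeping entries unique.
record_run() {
  local run_id="$1" seen_file="$2" tmp
  mkdir -p "$(dirname "$seen_file")"
  [ -s "$seen_file" ] || printf '[]\n' > "$seen_file"
  # Write to a temp file in the same directory, then rename over the
  # original so a reader never sees a half-written file.
  tmp="$(mktemp "${seen_file}.XXXXXX")"
  jq --arg run_id "$run_id" '. + [$run_id] | unique' "$seen_file" > "$tmp"
  mv "$tmp" "$seen_file"
}

# already_seen RUN_ID SEEN_FILE
# Succeeds (status 0) only if SEEN_FILE exists and contains RUN_ID.
already_seen() {
  local run_id="$1" seen_file="$2"
  [ -s "$seen_file" ] &&
    jq -e --arg run_id "$run_id" 'any(.[]; . == $run_id)' "$seen_file" > /dev/null
}
```

A pre-step could then guard the expensive download with `if already_seen "$RUN_ID" /tmp/gh-aw/cache-memory/seen-runs.json; then exit 0; fi` and call `record_run` once the fetch has succeeded.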

Initial plan

5c01816

Copilot AI assigned Copilot and pelikhan Apr 5, 2026

Copilot started work on behalf of pelikhan April 5, 2026 08:44 View session

Copilot AI linked an issue Apr 5, 2026 that may be closed by this pull request

[plan] Document pre-step data-fetching pattern in create-agentic-workflow.md #24682

Closed

6 tasks

docs: document pre-step data-fetching pattern in create-agentic-workf…

3cccd00

…low.md Agent-Logs-Url: https://github.com/github/gh-aw/sessions/5f9dc564-23a1-4366-b8fc-52c7e44673d1 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Document pre-step data-fetching pattern in create-agentic-workflow.md~~ docs: add pre-step data-fetching pattern to create-agentic-workflow.md Apr 5, 2026

Copilot AI requested a review from pelikhan April 5, 2026 08:53

Copilot finished work on behalf of pelikhan April 5, 2026 08:54

pelikhan reviewed Apr 5, 2026

Copilot started work on behalf of pelikhan April 5, 2026 08:55 View session

docs: add GH_TOKEN env and proper permissions to pre-step template sn…

41f63b5

…ippets Agent-Logs-Url: https://github.com/github/gh-aw/sessions/e4abe7a1-8d34-48e8-b4ff-6bfe4ec72dac Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI requested a review from pelikhan April 5, 2026 09:02

Copilot finished work on behalf of pelikhan April 5, 2026 09:02

pelikhan reviewed Apr 5, 2026

Copilot started work on behalf of pelikhan April 5, 2026 09:03 View session

docs: optimize pre-step data-fetching section for agent consumption, …

c1a30ba

…add cache-memory tip Agent-Logs-Url: https://github.com/github/gh-aw/sessions/b3d19cbb-dbdf-4c96-bce7-12da512c46d8 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI requested a review from pelikhan April 5, 2026 09:09

Copilot finished work on behalf of pelikhan April 5, 2026 09:09

github-actions bot mentioned this pull request Apr 5, 2026

[Contribution Check Report] Contribution Check — 2026-04-05 #24636

Closed

Copilot started work on behalf of pelikhan April 5, 2026 09:12 View session

docs: suggest jq for efficient JSON filtering in pre-step data fetching

e012914

Agent-Logs-Url: https://github.com/github/gh-aw/sessions/9e2d3ec1-d10f-4c63-b6d5-a82bf271a007 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan April 5, 2026 09:17

Copilot started work on behalf of pelikhan April 5, 2026 09:19 View session

docs: mention tools.agentic-workflows for logs and audit in pre-step …

0b4ff6a

…table Agent-Logs-Url: https://github.com/github/gh-aw/sessions/ad8bf1ae-1a34-4c0a-b6a2-f6a24784bdb2 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot finished work on behalf of pelikhan April 5, 2026 09:26

pelikhan marked this pull request as ready for review April 5, 2026 09:26

Copilot AI review requested due to automatic review settings April 5, 2026 09:26

pelikhan merged commit f57eb67 into main Apr 5, 2026
1 check passed

pelikhan deleted the copilot/document-pre-step-data-fetching branch April 5, 2026 09:26

Copilot started reviewing on behalf of pelikhan April 5, 2026 09:27 View session

github-actions bot mentioned this pull request Apr 5, 2026

Smoke Test: Copilot - 23998581852 #24699

Closed

Copilot AI reviewed Apr 5, 2026

github-actions bot mentioned this pull request Apr 5, 2026

Smoke Test: Claude - 23998581834 #24700

Closed
