Treat a completed safe-output (PR created) as success instead of hard-failing on the Copilot session.idle idle-timeout — this run did all its work and still reported failure.
Problem statement
The Copilot CLI engine hung after the agent finished. The agent emitted assistant.message: "PR submitted successfully" (07:08:21Z) and a valid create_pull_request safe-output, but the SDK driver never reached session.idle. At 07:11:08Z it hit [copilot-sdk-driver] error: Timeout after 570000ms waiting for session.idle; the action's 10-minute wrapper (GH_AW_TIMEOUT_MINUTES=10) then killed the step: ##[error]The action 'Execute GitHub Copilot CLI' has timed out after 10 minutes. No tool errored — the engine simply idle-hung post-completion.
Affected workflows and run IDs
- Dictation Prompt Generator (
.github/workflows/dictation-prompt.lock.yml)
- Failed run: §27491307552 (2026-06-14T06:59Z) — first failure after 5 consecutive successes (regression).
- Last success comparator: §27085344010 (2026-06-07) — had 0
session.idle timeouts.
Probable root cause
False-failure / harness idle-timeout: the Copilot SDK driver did not signal session.idle after the agent completed, so the session.idle watchdog (570s) and then the 10-min action timeout fired despite the work being done. The downstream safe_outputs job did successfully create PR #39195 (labels applied, .patch/.bundle artifact produced) — confirming the agent run itself was functionally complete. audit-diff vs the last success shows the failed run ran 14m55s / 1.06M input tokens / 25 requests, i.e. real completed work, not a stall mid-task.
Proposed remediation
- Detect that a terminal safe-output (e.g.
create_pull_request) was emitted and short-circuit the run to success rather than waiting on session.idle, OR
- Send an explicit session-end / shutdown signal to the Copilot SDK after the final safe-output so
session.idle resolves, OR
- As a stopgap, raise the
session.idle timeout headroom so post-completion idle does not collide with the 10-min action timeout.
Success criteria / verification
- A Dictation Prompt Generator run that successfully produces its PR reports job success, not failure.
- No
Timeout ... waiting for session.idle immediately following a successful final safe-output.
Low frequency (weekly workflow, 1 occurrence) but a confirmed false-failure that generates investigation noise. Parent: #29109. Analyzed runs: 27491307552, 27085344010.
Related to #29109
Generated by 🔍 [aw] Failure Investigator (6h) · 343.9 AIC · ⌖ 12.7 AIC · ⊞ 4.5K · ◷
Treat a completed safe-output (PR created) as success instead of hard-failing on the Copilot
session.idleidle-timeout — this run did all its work and still reported failure.Problem statement
The Copilot CLI engine hung after the agent finished. The agent emitted
assistant.message: "PR submitted successfully"(07:08:21Z) and a validcreate_pull_requestsafe-output, but the SDK driver never reachedsession.idle. At 07:11:08Z it hit[copilot-sdk-driver] error: Timeout after 570000ms waiting for session.idle; the action's 10-minute wrapper (GH_AW_TIMEOUT_MINUTES=10) then killed the step:##[error]The action 'Execute GitHub Copilot CLI' has timed out after 10 minutes. No tool errored — the engine simply idle-hung post-completion.Affected workflows and run IDs
.github/workflows/dictation-prompt.lock.yml)session.idletimeouts.Probable root cause
False-failure / harness idle-timeout: the Copilot SDK driver did not signal
session.idleafter the agent completed, so thesession.idlewatchdog (570s) and then the 10-min action timeout fired despite the work being done. The downstreamsafe_outputsjob did successfully create PR #39195 (labels applied,.patch/.bundleartifact produced) — confirming the agent run itself was functionally complete. audit-diff vs the last success shows the failed run ran 14m55s / 1.06M input tokens / 25 requests, i.e. real completed work, not a stall mid-task.Proposed remediation
create_pull_request) was emitted and short-circuit the run to success rather than waiting onsession.idle, ORsession.idleresolves, ORsession.idletimeout headroom so post-completion idle does not collide with the 10-min action timeout.Success criteria / verification
Timeout ... waiting for session.idleimmediately following a successful final safe-output.Low frequency (weekly workflow, 1 occurrence) but a confirmed false-failure that generates investigation noise. Parent: #29109. Analyzed runs: 27491307552, 27085344010.
Related to #29109