Stop Codex harness retries on active-goal router failures#39156
Stop Codex harness retries on active-goal router failures#39156Copilot wants to merge 7 commits into
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
✅ Test Quality Sentinel completed test quality analysis. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #39156 does not have the 'implementation' label and has 0 new lines of code in business logic directories (≤100 threshold). Neither enforcement condition is met. |
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
There was a problem hiding this comment.
Pull request overview
This PR updates the Codex harness retry behavior to stop retrying when Codex reports a known non-retryable “active goal already exists for this thread” router failure, reducing wasted fresh-run retries and token consumption.
Changes:
- Added a new non-retryable guard (
goalAlreadyActive) to the shared harness retry-guard detector. - Updated Codex harness retry loop to treat the new guard as terminal (no further retries) and include it in the non-retryable logging path.
- Extended unit tests to validate detection of the new guard and Codex retry-policy behavior.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/harness_retry_guard.cjs | Adds the goalAlreadyActive guard detection pattern and exposes it via the guard result. |
| actions/setup/js/harness_retry_guard.test.cjs | Adds test coverage for the new active-goal guard signal. |
| actions/setup/js/codex_harness.cjs | Stops retries when goalAlreadyActive is detected and logs it as non-retryable. |
| actions/setup/js/codex_harness.test.cjs | Adds retry-policy coverage ensuring no retry occurs on the active-goal router failure. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 4/4 changed files
- Comments generated: 2
| const AI_CREDITS_EXCEEDED_PATTERNS = [/\bmax[\s_-]*ai[\s_-]*credits[\s_-]*exceeded\b/i, /\bai[\s_-]*credits[\s_-]*rate[\s_-]*limit[\s_-]*error\b/i, /ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i]; | ||
|
|
||
| const AWF_API_PROXY_BLOCKING_REQUESTS_PATTERNS = [/\bawf\b.*\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, /\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, /\bapi[\s_-]*proxy\b.*\bblocked requests?\b/i, /\bDIFC_FILTERED\b/]; | ||
| const GOAL_ALREADY_ACTIVE_PATTERNS = [/\bthis thread already has a goal\b.*\buse update_goal only when the existing goal is complete\b/i]; |
There was a problem hiding this comment.
Fixed. Replaced .* with [\s\S]*? so the pattern matches across newlines, and simplified the second anchor to \buse update_goal\b for flexibility. A dedicated multiline test confirms the behavior.
| const reasons = []; | ||
| if (nonRetryableGuard.aiCreditsExceeded) reasons.push("AI credits budget exceeded"); | ||
| if (nonRetryableGuard.awfAPIProxyBlockingRequests) reasons.push("AWF API proxy is blocking requests"); | ||
| if (nonRetryableGuard.goalAlreadyActive) reasons.push("goal is already active for this thread"); |
There was a problem hiding this comment.
Fixed. The reason string now reads: "goal is already active for this thread (use update_goal when the current goal is complete)".
🧪 Test Quality Sentinel Report✅ Test Quality Score: 80/100 — Excellent
📊 Metrics & Test Classification (3 tests analyzed)
Test Classification Details
Language SupportTests analyzed:
Verdict
📖 Understanding Test ClassificationsDesign Tests (High Value) verify what the system does:
Implementation Tests (Low Value) verify how the system does it:
Goal: Shift toward tests that describe the system's behavioral contract — the promises it makes to its users and collaborators.
|
There was a problem hiding this comment.
✅ Test Quality Sentinel: 80/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 3 analyzed tests (2 new, 1 modified) verify behavioral contracts with full error/edge-case coverage and no coding-guideline violations.
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /diagnose and /tdd — requesting changes on pattern robustness and test fidelity.
The production fix is correct and the structure follows established patterns well. Two issues worth addressing before merge:
- Pattern fragility (
harness_retry_guard.cjs:8) — the single regex requires both phrases to appear on the same line (.*does not span newlines). If Codex output wraps the error across lines, or rephrases slightly, the guard silently stops working and the token-burn regression resurfaces. - Mock drift (
codex_harness.test.cjs:416) — the localshouldRetrywas updated to check onlygoalAlreadyActive, leavingaiCreditsExceeded/awfAPIProxyBlockingRequestsabsent from the mock; mirroring the full guard object would keep the mock structurally aligned with production.
📋 Positive highlights
- ✅ Clean additive change — no structural churn, consistent with existing guard handling
- ✅ Two-layer coverage: guard unit tests (
harness_retry_guard.test.cjs) + policy integration tests (codex_harness.test.cjs) - ✅
reasons.push(...)/ log message follows the existing pattern exactly - ✅ JSDoc return type updated to reflect the new field
- ✅
GOAL_ALREADY_ACTIVE_PATTERNSexported for testability, consistent with peers
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 225.6 AIC · ⌖ 26.1 AIC · ⊞ 27.9K
| const AI_CREDITS_EXCEEDED_PATTERNS = [/\bmax[\s_-]*ai[\s_-]*credits[\s_-]*exceeded\b/i, /\bai[\s_-]*credits[\s_-]*rate[\s_-]*limit[\s_-]*error\b/i, /ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i]; | ||
|
|
||
| const AWF_API_PROXY_BLOCKING_REQUESTS_PATTERNS = [/\bawf\b.*\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, /\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, /\bapi[\s_-]*proxy\b.*\bblocked requests?\b/i, /\bDIFC_FILTERED\b/]; | ||
| const GOAL_ALREADY_ACTIVE_PATTERNS = [/\bthis thread already has a goal\b.*\buse update_goal only when the existing goal is complete\b/i]; |
There was a problem hiding this comment.
[/diagnose] GOAL_ALREADY_ACTIVE_PATTERNS is more brittle than its sibling arrays. The two-phrase .* requirement means the guard silently fails if: (a) the error spans multiple lines (.* does not match \n), or (b) Codex rephrases either clause. Silent failure here restores the exact token-burn regression this PR is fixing.
💡 Suggestions
Option A — anchor on the stable first phrase only:
const GOAL_ALREADY_ACTIVE_PATTERNS = [
/\bthis thread already has a goal\b/i,
];The first phrase is already specific enough to avoid false positives and needs no secondary anchor.
Option B — make .* newline-safe (dot-all) and add a shorter fallback:
const GOAL_ALREADY_ACTIVE_PATTERNS = [
/\bthis thread already has a goal\b[\s\S]*\buse update_goal\b/i, // full message, any whitespace
/\bthis thread already has a goal\b/i, // fallback for partial output
];For comparison, AI_CREDITS_EXCEEDED_PATTERNS uses 3 patterns to handle variations; GOAL_ALREADY_ACTIVE_PATTERNS is the only array with a single two-phrase requirement.
There was a problem hiding this comment.
Fixed. Used Option B ([\s\S]*?) — the pattern now matches across newlines and uses \buse update_goal\b as the second anchor, bringing it in line with the multi-alternative approach of sibling patterns.
| if (attempt === 0 && isAuthenticationFailedError(result.output)) return false; | ||
| if (isMissingApiKeyError(result.output)) return false; | ||
| if (hasNumerousPermissionDeniedIssues(result.output)) return false; | ||
| if (detectNonRetryableHarnessGuard(result.output).goalAlreadyActive) return false; |
There was a problem hiding this comment.
[/tdd] The local shouldRetry mock now checks goalAlreadyActive alone, while the production code checks all three guard flags together via a single detectNonRetryableHarnessGuard call. This creates drift: aiCreditsExceeded and awfAPIProxyBlockingRequests are absent from this mock, so a future reader testing "does aiCreditsExceeded prevent a retry?" via this describe block would find no coverage.
💡 Suggestion — use the full guard object for consistency
// Replace the single goalAlreadyActive check with the full guard:
const nonRetryableGuard = detectNonRetryableHarnessGuard(result.output);
if (nonRetryableGuard.aiCreditsExceeded || nonRetryableGuard.awfAPIProxyBlockingRequests || nonRetryableGuard.goalAlreadyActive) return false;This mirrors the exact production branch in codex_harness.cjs:442 and ensures the mock keeps pace with future guard extensions automatically.
There was a problem hiding this comment.
Fixed. Updated shouldRetry to mirror the production branch exactly: const nonRetryableGuard = detectNonRetryableHarnessGuard(result.output); if (nonRetryableGuard.aiCreditsExceeded || nonRetryableGuard.awfAPIProxyBlockingRequests || nonRetryableGuard.goalAlreadyActive) return false;
| expect(result.goalAlreadyActive).toBe(false); | ||
| }); | ||
|
|
||
| it("detects goal already active markers", () => { |
There was a problem hiding this comment.
[/tdd] The new test only validates the exact canonical message. Consider adding a negative case that confirms a partial first-phrase match alone does NOT trigger detection — this would make the two-phrase requirement an explicit, documented contract rather than an implicit one.
💡 Suggested additions
it("does not detect goal active from first phrase alone", () => {
const result = detectNonRetryableHarnessGuard("this thread already has a goal");
expect(result.goalAlreadyActive).toBe(false); // adjust to true if single-phrase is desired
});
it("detects goal already active when embedded in longer output", () => {
const result = detectNonRetryableHarnessGuard(
"[codex] cannot create a new goal because this thread already has a goal; use update_goal only when the existing goal is complete\nExit code: 1"
);
expect(result.goalAlreadyActive).toBe(true);
});Also: the existing null input test (line 52) and "both flags" test (line 58) were not updated to assert goalAlreadyActive: false — worth adding for completeness.
There was a problem hiding this comment.
Addressed. Added: (1) negative case asserting the first phrase alone does not trigger detection, (2) embedded-output multiline test, and (3) goalAlreadyActive: false assertions in the null input and both flags tests.
There was a problem hiding this comment.
Non-blocking findings — correct direction, two gaps worth addressing
The guard and detection logic are correct and the approach is sound. Two medium issues to consider before this pattern spreads further.
Findings
1. GOAL_ALREADY_ACTIVE_PATTERNS — single compound regex, no fallback (harness_retry_guard.cjs:8)
Both sibling pattern groups have 3–4 alternatives for resilience. This one has a single compound A.*B pattern that fails silently if codex slightly rewords the message or emits it across two lines. The failure mode is invisible: the guard misses, retries proceed against a permanently broken state, and compute credits are wasted until MAX_RETRIES is exhausted. Adding a simpler fallback (/\bthis thread already has a goal\b/i) or the s (dotAll) flag would bring it in line with the other guards.
2. Test validates a local shouldRetry clone, not the production path (codex_harness.test.cjs:416)
The shouldRetry function in this test block is an inline reimplementation, not the actual production guard in codex_harness.cjs. The new test at line 467 passes even if the production line 442 guard were deleted. An integration test running a stub that emits the active-goal message and verifying the harness makes exactly one attempt would close this gap — the noop pre-flight tests below are a ready-made template.
🔎 Code quality review by PR Code Quality Reviewer · 270.2 AIC · ⌖ 13.3 AIC · ⊞ 16.7K
| const AI_CREDITS_EXCEEDED_PATTERNS = [/\bmax[\s_-]*ai[\s_-]*credits[\s_-]*exceeded\b/i, /\bai[\s_-]*credits[\s_-]*rate[\s_-]*limit[\s_-]*error\b/i, /ai[\s_-]*credits?.*(?:rate[\s-]*limit|limit exceeded|budget exceeded|exceeded)/i]; | ||
|
|
||
| const AWF_API_PROXY_BLOCKING_REQUESTS_PATTERNS = [/\bawf\b.*\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, /\bapi[\s_-]*proxy\b.*\bblocking requests\b/i, /\bapi[\s_-]*proxy\b.*\bblocked requests?\b/i, /\bDIFC_FILTERED\b/]; | ||
| const GOAL_ALREADY_ACTIVE_PATTERNS = [/\bthis thread already has a goal\b.*\buse update_goal only when the existing goal is complete\b/i]; |
There was a problem hiding this comment.
Regex is a single compound match with no fallback — any message change silently disables the guard.
Unlike AI_CREDITS_EXCEEDED_PATTERNS (3 alternatives) and AWF_API_PROXY_BLOCKING_REQUESTS_PATTERNS (4 alternatives), GOAL_ALREADY_ACTIVE_PATTERNS has one pattern that requires both phrases to appear in order on the same line. The .* quantifier does not cross newlines, so a multi-line output or a slight rewording of the codex error defeats it silently. The failure mode is invisible: the guard misses, retries continue against a permanently-broken state, and credits are wasted.
💡 Suggested fix
Add at least one shorter fallback so minor message variations are still caught:
const GOAL_ALREADY_ACTIVE_PATTERNS = [
// prefer the full compound message for precision
/\bthis thread already has a goal\b.*\buse update_goal only when the existing goal is complete\b/i,
// fallback: catches rewording or multi-line output
/\bthis thread already has a goal\b/i,
];Alternatively, add the s (dotAll) flag to the compound regex so it can match across newlines: /.../is.
There was a problem hiding this comment.
Fixed. Replaced the single compound .* regex with [\s\S]*? so it matches across newlines. The two-phrase requirement is preserved for precision, with the negative test documenting that the first phrase alone is insufficient.
| if (attempt === 0 && isAuthenticationFailedError(result.output)) return false; | ||
| if (isMissingApiKeyError(result.output)) return false; | ||
| if (hasNumerousPermissionDeniedIssues(result.output)) return false; | ||
| if (detectNonRetryableHarnessGuard(result.output).goalAlreadyActive) return false; |
There was a problem hiding this comment.
Test exercises a local reimplementation of shouldRetry, not the production retry path — regression in codex_harness.cjs would not be caught here.
The shouldRetry at line 409 is an inline closure used only in this test block. The new check at line 416 correctly adds goalAlreadyActive, but the test verifies the local copy of the logic, not the actual detectNonRetryableHarnessGuard+break path inside main() in codex_harness.cjs. If the production guard on line 442 of that file were accidentally removed, all tests in this describe block would still pass.
💡 Why this matters
The local shouldRetry also omits the aiCreditsExceeded and awfAPIProxyBlockingRequests checks that are in production, confirming it has always been a simplified proxy. The harness_retry_guard.test.cjs tests do cover detection in isolation, but no test directly exercises the codex_harness.cjs retry guard end-to-end for the goalAlreadyActive signal.
Consider adding a lightweight integration test (similar to the noop pre-flight tests below this block) that runs the actual harness stub with an output containing the active-goal error and asserts it does not retry.
There was a problem hiding this comment.
Fixed. Updated shouldRetry to mirror the full production guard (aiCreditsExceeded || awfAPIProxyBlockingRequests || goalAlreadyActive), so removing any of those guards from codex_harness.cjs would break the test.
|
@copilot run pr-finisher skill |
…hint Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
All review threads addressed and pushed. See summary below. |
|
@copilot review all comments and address the unresolved review feedback. Please push the fixes and request another review once they’re in.
|
|
``
|
|
``
|
|
``
|
Codex harness retries were treating a known non-retryable router failure as partial-execution noise, causing repeated fresh runs and avoidable token burn. This change classifies the “thread already has a goal / use
update_goal” failure as terminal for the current attempt sequence.Retry guard expansion
harness_retry_guard.cjsfor the active-goal router error emitted by Codex.Harness retry policy update
codex_harness.cjsto short-circuit retries when the newgoalAlreadyActiveguard is detected.Focused test coverage
harness_retry_guard.test.cjsto assert active-goal detection.codex_harness.test.cjsretry-policy tests to verify no retry on this error and reused shared guard detection logic.pr-sous-chef: requested branch update in run 27485386371
pr-sous-chef: requested branch update for run 27504382710