
Return type=error on CU agent failures instead of type=done #41

Closed
abrichr wants to merge 1 commit into fix/scoring-and-infra-reliability from fix/cu-agent-reliability

abrichr (Member) commented Feb 25, 2026

Summary

  • Exit early with type="error" when no screenshot is available from the environment (infrastructure failure)
  • Return type="error" on API call failure instead of silently returning type="done"
  • Return type="error" on retry exhaustion in the screenshot/wait loop
  • Runner now handles type="error" as a terminal action alongside type="done"
  • Runner propagates error_type from the agent's error action to the BenchmarkResult
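The agent and runner behavior summarized above can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the code in this PR: `Action`, the `BenchmarkResult` fields, `SketchCUAgent`, `run_task`, the `call_api` callable, the env methods `reset`/`step`/`evaluate`, and the `"agent"` error_type value are all hypothetical. Only `type="done"`/`type="error"`, `error_type="infrastructure"`, and the `BenchmarkResult` name come from the PR description.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical shapes: the PR only states that actions have a `type`
# ("done" / "error") and that error actions carry an `error_type`.
@dataclass
class Action:
    type: str
    error_type: Optional[str] = None
    message: str = ""


@dataclass
class BenchmarkResult:
    task_id: str
    success: bool
    steps: int
    error: Optional[str] = None
    error_type: Optional[str] = None


class SketchCUAgent:
    """Hypothetical stand-in showing the three error exits listed above."""

    def __init__(self, call_api: Callable[[dict], Action], max_retries: int = 3):
        self.call_api = call_api
        self.max_retries = max_retries

    def act(self, observation: dict) -> Action:
        # No screenshot: the environment itself failed, so exit early.
        if not observation.get("screenshot"):
            return Action("error", "infrastructure", "no screenshot from environment")
        for _ in range(self.max_retries):
            try:
                action = self.call_api(observation)
            except Exception as exc:
                # API failure is surfaced instead of masquerading as "done".
                return Action("error", "agent", f"API call failed: {exc}")
            # Simplification: "screenshot"/"wait" requests are simply retried.
            if action.type not in ("screenshot", "wait"):
                return action
        # Retry budget exhausted inside the screenshot/wait loop.
        return Action("error", "agent", "retries exhausted in screenshot/wait loop")


def run_task(task_id: str, agent: SketchCUAgent, env, max_steps: int = 15) -> BenchmarkResult:
    obs = env.reset()
    for step in range(1, max_steps + 1):
        action = agent.act(obs)
        if action.type == "error":
            # Terminal: copy the agent's error_type into the result.
            return BenchmarkResult(task_id, False, step, action.message, action.error_type)
        if action.type == "done":
            # Terminal: only now is the task actually scored.
            return BenchmarkResult(task_id, env.evaluate(), step)
        obs = env.step(action)
    return BenchmarkResult(task_id, False, max_steps, "max steps reached")


class DownEnv:
    """Hypothetical stub simulating WAA being unreachable (no screenshot)."""

    def reset(self):
        return {"screenshot": None}

    def step(self, action):
        return {"screenshot": None}

    def evaluate(self):
        return False


result = run_task("demo-task", SketchCUAgent(lambda obs: Action("done")), DownEnv())
print(result.error_type)  # infrastructure
```

Treating "error" as terminal means an outage ends the episode as a failed result with a recorded reason, rather than looking like an agent that voluntarily declared the task done and was then scored.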

Test plan

  • All 316 existing tests pass
  • Updated CU agent tests to verify new error action behavior
  • Verify CU agent returns error with error_type=infrastructure when WAA is down

🤖 Generated with Claude Code

- Early exit with type="error" when no screenshot available (infrastructure)
- Return type="error" on API call failure instead of type="done"
- Return type="error" on retry exhaustion (screenshot/wait loop)
- Runner handles type="error" as terminal action alongside type="done"
- Runner propagates error_type from agent error to BenchmarkResult
- Update tests to verify new error action behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
abrichr (Member, Author) commented Feb 25, 2026

Superseded by #42 (combined PR targeting main)

abrichr closed this Feb 25, 2026
abrichr deleted the fix/cu-agent-reliability branch Feb 28, 2026, 14:53