
Migrate AI local review to OpenAI Responses API #234

Merged
igerber merged 5 commits into main from ai-local-response
Mar 23, 2026
Conversation

igerber (Owner) commented Mar 23, 2026

Summary

  • Migrate /ai-review-local script from Chat Completions API to Responses API, enabling gpt-5.4-pro and future reasoning models
  • Add _is_reasoning_model() helper, REASONING_MAX_TOKENS, gpt-5.4-pro pricing, --timeout CLI flag
  • Extract _extract_response_text() helper with content-first success detection (valid content accepted regardless of API status field)
  • Add reasoning token reporting in completion summary and timeout guidance for reasoning models
  • Update skill definition with --timeout docs, background execution for reasoning models, and deep review examples
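The helpers named in the summary can be pictured with a minimal sketch. Everything below is illustrative: the `REASONING_MAX_TOKENS` value, the `-pro` suffix heuristic, and the token budgets are placeholders, not the PR's actual values (the real code lives in `.claude/scripts/openai_review.py`):

```python
# Hypothetical sketch of the summary bullets above; constants and the
# reasoning-model heuristic are placeholders, not the PR's actual values.
REASONING_MAX_TOKENS = 32_000  # assumed output budget for reasoning models

def _is_reasoning_model(model: str) -> bool:
    """Gate models that bill reasoning tokens (e.g. gpt-5.4-pro)."""
    return model.endswith("-pro")  # assumed heuristic, not the PR's rule

def build_payload(model: str, prompt: str) -> dict:
    """Build a /v1/responses request body with conditional knobs."""
    payload = {"model": model, "input": prompt}
    if _is_reasoning_model(model):
        # Reasoning models typically reject temperature and need a
        # larger output budget to cover hidden reasoning tokens.
        payload["max_output_tokens"] = REASONING_MAX_TOKENS
    else:
        payload["temperature"] = 0.2
        payload["max_output_tokens"] = 8_000
    return payload
```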

Methodology references (required if estimator / math changes)

  • N/A — no methodology changes. All changes are to review tooling (.claude/scripts/, .claude/commands/) and tests.

Validation

  • Tests added/updated: tests/test_openai_review.py — 36 new tests across 6 classes:
    • TestIsReasoningModel (10 tests)
    • TestProModelPricing (2 tests)
    • TestExtractResponseText (5 tests)
    • TestResponsesAPIConstants (2 tests)
    • TestCallOpenAIPayload (11 tests — payload construction, response parsing, error paths, variant shapes)
  • Live smoke tests: verified with both gpt-5.4 and gpt-5.4-pro models
  • All 151 tests passing

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

igerber and others added 4 commits March 23, 2026 17:30
Enable gpt-5.4-pro and future reasoning models by switching the OpenAI
endpoint from /v1/chat/completions to /v1/responses. Add _is_reasoning_model()
helper, REASONING_MAX_TOKENS, --timeout CLI flag, gpt-5.4-pro pricing,
conditional temperature/max-tokens, reasoning token reporting, and mocked
unit tests for payload construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The output_text convenience field is null in raw HTTP responses (only
populated by the Python SDK). Extract text from output[].content[] items
instead. Update mock test responses to match real API format.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ests

P1: Relax status check — extract content first via _extract_response_text(),
only fail when no usable content is found (not on status mismatch alone).
P2: Extract response parsing into dedicated helper with unit tests covering
multiple payload shapes (missing status, null status, SDK output_text,
multiple output blocks). Add reasoning-model timeout hint to stderr.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Pin the content-first parsing behavior: status='incomplete' with usable
output should return content, not exit. Addresses P2 from gpt-5.4-pro review.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@github-actions

Overall Assessment

⚠️ Needs changes

Executive Summary

  • No methodology-registered estimator logic is touched here; the diff is confined to local review tooling and tests.
  • P1: the new Responses API path treats status="incomplete" output as a successful review whenever any text is present, so truncated reviews can be saved and reported as complete.
  • P2: the new gpt-5.4-pro pricing constant materially underquotes the tool’s cost estimate.
  • P3: the script help text still says “Chat Completions API,” so the migration left one user-facing doc path stale.
  • Verification was limited: python3 .claude/scripts/openai_review.py --help ran successfully, but pytest is not installed in this environment.

Methodology

  • No findings. Affected methods: none. I checked docs/methodology/REGISTRY.md; this PR only changes .claude/ review tooling and tests/test_openai_review.py, not any estimator, weighting, variance/SE, identification, or default estimator behavior.

Code Quality

  • Severity P1. Impact: call_openai() now accepts incomplete Responses API results as success if any text is present, and the new test suite explicitly locks that behavior in. On token-limit truncation, the tool will write a partial review to disk and print “AI Review complete,” which can silently suppress findings on large reviews. Concrete fix: require result["status"] == "completed" before treating the response as a successful review; if status is incomplete, surface incomplete_details, optionally persist the text as clearly labeled partial output, and exit non-zero or emit a prominent incomplete warning. Update the new success test accordingly. Location: .claude/scripts/openai_review.py:L1210-L1228, tests/test_openai_review.py:L1713-L1725. (platform.openai.com)
  • Severity P2. Impact: the added gpt-5.4-pro pricing entry underestimates user-facing cost estimates by a large margin. The code now hard-codes $15/$60, but OpenAI’s current official pricing for gpt-5.4-pro is $30/M input and $180/M output. Concrete fix: update PRICING["gpt-5.4-pro"] and the paired tests to the official rates. Location: .claude/scripts/openai_review.py:L852-L862, tests/test_openai_review.py:L1559-L1572. (openai.com)
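The P1 fix recommended above amounts to checking `status` before extracting text. A hedged sketch of that strict behavior (helper name and error messages are illustrative, not the script's actual code):

```python
def handle_result(result: dict) -> str:
    """Strict success detection per the P1 recommendation above (sketch)."""
    status = result.get("status")
    if status != "completed":
        # Surface incomplete_details instead of saving a truncated review.
        reason = (result.get("incomplete_details") or {}).get("reason", "unknown")
        raise RuntimeError(
            f"Responses API returned status {status!r} (reason: {reason}); "
            "refusing to treat the output as a complete review"
        )
    return "\n".join(
        item.get("text", "")
        for block in result.get("output", [])
        if block.get("type") == "message"
        for item in block.get("content", [])
        if item.get("type") == "output_text"
    )
```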

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No separate tech-debt finding. The issues above are not already tracked in TODO.md.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the migrated script still advertises “OpenAI Chat Completions API” in CLI help, so --help now contradicts both the endpoint constant and the updated command markdown. Concrete fix: update the argparse description string and add a small help-output assertion so future API migrations cannot drift silently. Location: .claude/scripts/openai_review.py:L1245-L1248.

Path to Approval

  1. Change call_openai() so non-completed Responses API statuses, especially incomplete with incomplete_details, are not treated as a normal successful review, and update tests/test_openai_review.py to assert that behavior.

P1: Treat status='incomplete' as a hard error even when content exists,
since truncated reviews may silently suppress findings. Print
incomplete_details and suggest remediation (reduce diff, lower context).
P2: Fix gpt-5.4-pro pricing from (15,60) to (30,180) per official rates.
P3: Fix argparse description to say "Responses API" not "Chat Completions".

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
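The P2 pricing fix in this commit reduces to a per-million-token rate table. A minimal sketch using the corrected rates quoted in the review ($30/M input, $180/M output for gpt-5.4-pro); the real table and function live in the script:

```python
# USD per 1M tokens (input, output); rates taken from the review comment above.
PRICING = {"gpt-5.4-pro": (30.0, 180.0)}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough user-facing cost estimate (sketch, not the script's actual code)."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```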
igerber (Owner, Author) commented Mar 23, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 5af76f194818b0308614061ccd1f112a9989b0bf


Overall Assessment

✅ Looks good

Executive Summary

  • Prior P1 and P2 findings are resolved; only a P3 test-coverage note on the CLI help text remains.

Methodology

  • No findings. Affected methods: none. The patch does not touch any estimator, weighting, variance/SE, identification assumption, or default estimator behavior.

Code Quality

  • No findings.

Performance

  • No findings in the changed lines. The new timeout plumbing and reasoning-model token cap are reasonable tool-level implementation choices.

Maintainability

  • No findings.

Tech Debt

  • No findings. No matching deferred item in TODO.md changes the review outcome for this PR.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the source/help text is corrected, but the changed test block still does not exercise main() / --help, so the stale-help regression flagged in the earlier review could reappear without a failing test. Concrete fix: add a small CLI smoke test that invokes main() with --help (or runs python3 .claude/scripts/openai_review.py --help) and asserts that both "Responses API" and --timeout appear in the output. Location: .claude/scripts/openai_review.py:L1264, .claude/scripts/openai_review.py:L1344, tests/test_openai_review.py:L1523.
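The suggested smoke test can be prototyped without invoking the real script by rendering a stand-in parser's help text. Everything below, including the parser contents, is a hypothetical sketch of the assertion pattern, not the repository's actual parser:

```python
import argparse
import io

def build_parser() -> argparse.ArgumentParser:
    """Stand-in for the script's real argument parser (hypothetical)."""
    p = argparse.ArgumentParser(
        prog="openai_review.py",
        description="Local AI code review via the OpenAI Responses API",
    )
    p.add_argument("--timeout", type=int, default=600,
                   help="request timeout in seconds")
    return p

def test_help_mentions_responses_api_and_timeout():
    buf = io.StringIO()
    build_parser().print_help(buf)
    help_text = buf.getvalue()
    # Pins the API name and the new flag so future migrations can't drift.
    assert "Responses API" in help_text
    assert "--timeout" in help_text
```

In the real suite, the same assertions would run against the script's own `main()` (or a `subprocess` call to `python3 .claude/scripts/openai_review.py --help`).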

@igerber igerber merged commit c9914bc into main Mar 23, 2026
13 checks passed
@igerber igerber deleted the ai-local-response branch March 23, 2026 23:18