
Migrate AI local review to OpenAI Responses API #234

Merged
igerber merged 5 commits into main from ai-local-response
Mar 23, 2026
Conversation

igerber (Owner) commented Mar 23, 2026

Summary

  • Migrate /ai-review-local script from Chat Completions API to Responses API, enabling gpt-5.4-pro and future reasoning models
  • Add _is_reasoning_model() helper, REASONING_MAX_TOKENS, gpt-5.4-pro pricing, --timeout CLI flag
  • Extract _extract_response_text() helper with content-first success detection (valid content accepted regardless of API status field)
  • Add reasoning token reporting in completion summary and timeout guidance for reasoning models
  • Update skill definition with --timeout docs, background execution for reasoning models, and deep review examples
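The helpers named in the summary can be pictured with a minimal sketch. Everything below is illustrative: the `REASONING_MAX_TOKENS` value, the `-pro` suffix heuristic, and the token budgets are placeholders, not the PR's actual values (the real code lives in `.claude/scripts/openai_review.py`):

```python
# Hypothetical sketch of the summary bullets above; constants and the
# reasoning-model heuristic are placeholders, not the PR's actual values.
REASONING_MAX_TOKENS = 32_000  # assumed output budget for reasoning models

def _is_reasoning_model(model: str) -> bool:
    """Gate models that bill reasoning tokens (e.g. gpt-5.4-pro)."""
    return model.endswith("-pro")  # assumed heuristic, not the PR's rule

def build_payload(model: str, prompt: str) -> dict:
    """Build a /v1/responses request body with conditional knobs."""
    payload = {"model": model, "input": prompt}
    if _is_reasoning_model(model):
        # Reasoning models typically reject temperature and need a
        # larger output budget to cover hidden reasoning tokens.
        payload["max_output_tokens"] = REASONING_MAX_TOKENS
    else:
        payload["temperature"] = 0.2
        payload["max_output_tokens"] = 8_000
    return payload
```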

Methodology references (required if estimator / math changes)

  • N/A — no methodology changes. All changes are to review tooling (.claude/scripts/, .claude/commands/) and tests.

Validation

  • Tests added/updated: tests/test_openai_review.py — 36 new tests across 6 classes:
    • TestIsReasoningModel (10 tests)
    • TestProModelPricing (2 tests)
    • TestExtractResponseText (5 tests)
    • TestResponsesAPIConstants (2 tests)
    • TestCallOpenAIPayload (11 tests — payload construction, response parsing, error paths, variant shapes)
  • Live smoke tests: verified with both gpt-5.4 and gpt-5.4-pro models
  • All 151 tests passing

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

igerber and others added 4 commits March 23, 2026 17:30
Enable gpt-5.4-pro and future reasoning models by switching the OpenAI
endpoint from /v1/chat/completions to /v1/responses. Add _is_reasoning_model()
helper, REASONING_MAX_TOKENS, --timeout CLI flag, gpt-5.4-pro pricing,
conditional temperature/max-tokens, reasoning token reporting, and mocked
unit tests for payload construction.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The output_text convenience field is null in raw HTTP responses (only
populated by the Python SDK). Extract text from output[].content[] items
instead. Update mock test responses to match real API format.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ests

P1: Relax status check — extract content first via _extract_response_text(),
only fail when no usable content is found (not on status mismatch alone).
P2: Extract response parsing into dedicated helper with unit tests covering
multiple payload shapes (missing status, null status, SDK output_text,
multiple output blocks). Add reasoning-model timeout hint to stderr.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Pin the content-first parsing behavior: status='incomplete' with usable
output should return content, not exit. Addresses P2 from gpt-5.4-pro review.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@github-actions

Overall Assessment

⚠️ Needs changes

Executive Summary

  • No methodology-registered estimator logic is touched here; the diff is confined to local review tooling and tests.
  • P1: the new Responses API path treats status="incomplete" output as a successful review whenever any text is present, so truncated reviews can be saved and reported as complete.
  • P2: the new gpt-5.4-pro pricing constant materially underquotes the tool’s cost estimate.
  • P3: the script help text still says “Chat Completions API,” so the migration left one user-facing doc path stale.
  • Verification was limited: python3 .claude/scripts/openai_review.py --help ran successfully, but pytest is not installed in this environment.

Methodology

  • No findings. Affected methods: none. I checked docs/methodology/REGISTRY.md; this PR only changes .claude/ review tooling and tests/test_openai_review.py, not any estimator, weighting, variance/SE, identification, or default estimator behavior.

Code Quality

  • Severity P1. Impact: call_openai() now accepts incomplete Responses API results as success if any text is present, and the new test suite explicitly locks that behavior in. On token-limit truncation, the tool will write a partial review to disk and print “AI Review complete,” which can silently suppress findings on large reviews. Concrete fix: require result["status"] == "completed" before treating the response as a successful review; if status is incomplete, surface incomplete_details, optionally persist the text as clearly labeled partial output, and exit non-zero or emit a prominent incomplete warning. Update the new success test accordingly. Location: .claude/scripts/openai_review.py:L1210-L1228, tests/test_openai_review.py:L1713-L1725. (platform.openai.com)
  • Severity P2. Impact: the added gpt-5.4-pro pricing entry underestimates user-facing cost estimates by a large margin. The code now hard-codes $15/$60, but OpenAI’s current official pricing for gpt-5.4-pro is $30/M input and $180/M output. Concrete fix: update PRICING["gpt-5.4-pro"] and the paired tests to the official rates. Location: .claude/scripts/openai_review.py:L852-L862, tests/test_openai_review.py:L1559-L1572. (openai.com)
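The P1 fix recommended above amounts to checking `status` before extracting text. A hedged sketch of that strict behavior (helper name and error messages are illustrative, not the script's actual code):

```python
def handle_result(result: dict) -> str:
    """Strict success detection per the P1 recommendation above (sketch)."""
    status = result.get("status")
    if status != "completed":
        # Surface incomplete_details instead of saving a truncated review.
        reason = (result.get("incomplete_details") or {}).get("reason", "unknown")
        raise RuntimeError(
            f"Responses API returned status {status!r} (reason: {reason}); "
            "refusing to treat the output as a complete review"
        )
    return "\n".join(
        item.get("text", "")
        for block in result.get("output", [])
        if block.get("type") == "message"
        for item in block.get("content", [])
        if item.get("type") == "output_text"
    )
```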

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No separate tech-debt finding. The issues above are not already tracked in TODO.md.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the migrated script still advertises “OpenAI Chat Completions API” in CLI help, so --help now contradicts both the endpoint constant and the updated command markdown. Concrete fix: update the argparse description string and add a small help-output assertion so future API migrations cannot drift silently. Location: .claude/scripts/openai_review.py:L1245-L1248.

Path to Approval

  1. Change call_openai() so non-completed Responses API statuses, especially incomplete with incomplete_details, are not treated as a normal successful review, and update tests/test_openai_review.py to assert that behavior.

P1: Treat status='incomplete' as a hard error even when content exists,
since truncated reviews may silently suppress findings. Print
incomplete_details and suggest remediation (reduce diff, lower context).
P2: Fix gpt-5.4-pro pricing from (15,60) to (30,180) per official rates.
P3: Fix argparse description to say "Responses API" not "Chat Completions".

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
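The P2 pricing fix in this commit reduces to a per-million-token rate table. A minimal sketch using the corrected rates quoted in the review ($30/M input, $180/M output for gpt-5.4-pro); the real table and function live in the script:

```python
# USD per 1M tokens (input, output); rates taken from the review comment above.
PRICING = {"gpt-5.4-pro": (30.0, 180.0)}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough user-facing cost estimate (sketch, not the script's actual code)."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```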
igerber (Owner, Author) commented Mar 23, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 5af76f194818b0308614061ccd1f112a9989b0bf


Overall Assessment

✅ Looks good

Executive Summary

  • Prior P1 and P2 findings are resolved; only a P3 test-coverage note on the CLI help text remains.

Methodology

  • No findings. Affected methods: none. The patch does not touch any estimator, weighting, variance/SE, identification assumption, or default estimator behavior.

Code Quality

  • No findings.

Performance

  • No findings in the changed lines. The new timeout plumbing and reasoning-model token cap are reasonable tool-level implementation choices.

Maintainability

  • No findings.

Tech Debt

  • No findings. No matching deferred item in TODO.md changes the review outcome for this PR.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the source/help text is corrected, but the changed test block still does not exercise main() / --help, so the stale-help regression flagged in the earlier review could reappear without a failing test. Concrete fix: add a small CLI smoke test that invokes main() with --help (or runs python3 .claude/scripts/openai_review.py --help) and asserts that both "Responses API" and --timeout appear in the output. Location: .claude/scripts/openai_review.py:L1264, .claude/scripts/openai_review.py:L1344, tests/test_openai_review.py:L1523.
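The suggested smoke test can be prototyped without invoking the real script by rendering a stand-in parser's help text. Everything below, including the parser contents, is a hypothetical sketch of the assertion pattern, not the repository's actual parser:

```python
import argparse
import io

def build_parser() -> argparse.ArgumentParser:
    """Stand-in for the script's real argument parser (hypothetical)."""
    p = argparse.ArgumentParser(
        prog="openai_review.py",
        description="Local AI code review via the OpenAI Responses API",
    )
    p.add_argument("--timeout", type=int, default=600,
                   help="request timeout in seconds")
    return p

def test_help_mentions_responses_api_and_timeout():
    buf = io.StringIO()
    build_parser().print_help(buf)
    help_text = buf.getvalue()
    # Pins the API name and the new flag so future migrations can't drift.
    assert "Responses API" in help_text
    assert "--timeout" in help_text
```

In the real suite, the same assertions would run against the script's own `main()` (or a `subprocess` call to `python3 .claude/scripts/openai_review.py --help`).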

@igerber igerber merged commit c9914bc into main Mar 23, 2026
13 checks passed
@igerber igerber deleted the ai-local-response branch March 23, 2026 23:18