Skip to content

fix: handle nested JSON in parse_judge_output via raw_decode#875

Open
sjoerdvink99 wants to merge 1 commit intogenerative-computing:mainfrom
sjoerdvink99:fix/865-nested-json-judge-output
Open

fix: handle nested JSON in parse_judge_output via raw_decode#875
sjoerdvink99 wants to merge 1 commit intogenerative-computing:mainfrom
sjoerdvink99:fix/865-nested-json-judge-output

Conversation

@sjoerdvink99
Copy link
Copy Markdown

@sjoerdvink99 sjoerdvink99 commented Apr 16, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

  • Link to Issue: Fixes

Fixes #865

Previously, parse_judge_output returned (score, None) when the judge output contained a JSON object with a score key but no justification key (or an explicit null justification). It now returns (score, judge_output) falling back to the full raw output string as the justification. This avoids silently discarding context when the judge omits a justification field.

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code as added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

Attribution

  • AI coding assistants used

@sjoerdvink99 sjoerdvink99 requested a review from a team as a code owner April 16, 2026 15:23
@github-actions github-actions bot added the bug Something isn't working label Apr 16, 2026
@github-actions
Copy link
Copy Markdown
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

Copy link
Copy Markdown
Member

@psschwei psschwei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! A few comments below.

Comment thread cli/eval/runner.py Outdated
if data is not None:
score = data.get("score")
justification = data.get("justification")
return score, justification if isinstance(justification, str) else judge_output
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
return score, justification if isinstance(justification, str) else judge_output
return score, (justification if isinstance(justification, str) else judge_output)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing to note: if there is no justification, this will now return (score, judge_output) whereas before it would've returned (score, None). Probably should call that out explicitly in the PR description.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed @psschwei ! Thanks for the feedback!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be worth adding some additional test cases:

  • JSON with score but no justification key
  • "justification": null (explicit null)
  • Multiple JSON objects where the first lacks score
  • Direct unit tests for _extract_first_json

@sjoerdvink99 sjoerdvink99 force-pushed the fix/865-nested-json-judge-output branch from 2aa1be7 to 9643264 Compare April 17, 2026 13:06
@psschwei
Copy link
Copy Markdown
Member

There's a ruff check that's failing @sjoerdvink99

Signed-off-by: sjoerdvink99 <sjoerdvink@icloud.com>
@sjoerdvink99 sjoerdvink99 force-pushed the fix/865-nested-json-judge-output branch from 9643264 to 79f3a8a Compare April 18, 2026 21:32
@sjoerdvink99
Copy link
Copy Markdown
Author

Fixed now @psschwei !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: nested JSON in judge output silently discards justification in parse_judge_output

2 participants