fix(routing): bound tokenizer json scanner selection#1638
Conversation
Performance BenchmarksCompared Top regressions:
|
…t06-tokenizer-json-routing-20260610
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7c18dd4561
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b9659fd854
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Additional pinned QA: |
|
Additional tokenizer boundary QA: |
|
Pinned QA input: |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3c1e5bb294
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Cross-PR validated QA: current main This PR independently overlaps the ordinary-JSON routing fix, but #1609 is the primary remediation and has already passed the full fixture matrix on its current branch. Please ensure this branch either remains compatible with #1609 or narrows duplicate behavior so merging both cannot reintroduce ambiguous ownership. Preserve benign generic JSON, tokenizer JSON, and manifest controls. Full coordinator report: @codex compare the current branch against #1609’s ownership changes, add only non-duplicative coverage needed here, rebase after the primary fix lands if necessary, and request exact-head review. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3c1e5bb294
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Additional pinned tokenizer JSON QA: |
|
Pinned tokenizer QA completed on head 74f6c3b. Exact benign files validated with cache disabled:
Exact-path malicious controls were also exercised: Jinja template evidence reaches |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 74f6c3b0e5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Summary
Testing
|
…er-json-routing-20260610' into mdangelo/codex/takeover-pr1638-20260611 # Conflicts: # modelaudit/utils/file/detection.py
|
@codex add tokenizer routing QA for |
|
@codex add |
|
@codex add |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b37b1fe12f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…er-json-routing-20260610' into mdangelo/codex/takeover-pr1638-20260611 # Conflicts: # modelaudit/scanners/jax_checkpoint_scanner.py # modelaudit/utils/file/detection.py
|
Summary
Testing
|
|
Updated head: Addressed the latest Codex routing thread:
Validation on this head:
Review threads: 22/22 resolved. |
…er-json-routing-20260610' into mdangelo/codex/takeover-pr1638-20260611
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 38859e03cb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Updated head: Addressed the latest Codex routing thread:
Validation on this head:
Review threads: 23/23 resolved. |
…er-json-routing-20260610' into mdangelo/codex/takeover-pr1638-20260611
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e2b23c57af
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Updated head: Addressed the latest Codex routing threads:
Validation on this head:
Review threads: 25/25 resolved. |
|
Codex Review: Didn't find any major issues. Keep them coming! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
…er-json-routing-20260610' into mdangelo/codex/takeover-pr1638-20260611
|
Codex Review: Didn't find any major issues. Delightful! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
Codex Review: Didn't find any major issues. Breezy! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
Independent review of exact head
Validation on the stable exact head: routing/filetype tests had 65 passes, core tests 52 passes, registry/CLI/streaming/archive tests 11 passes, and the Jinja/JAX/XGBoost overlap slice had 654 passes with one skip. Generated Holo-shaped 13 MiB and Qwen-shaped 18 MiB tokenizers stayed benign; late Jinja and MXNet evidence failed closed; a nested ZIP tokenizer Jinja payload was detected. The exact pinned external tokenizer bytes were unavailable, so generated same-shape corpora were used. At the review snapshot all 25 threads were resolved and most CI was green, with four Python/Windows lanes still running. These two false negatives block merge even if CI completes successfully. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7f749d3d3e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
# Conflicts: # modelaudit/scanners/xgboost_scanner.py
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 599ebba8a4
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
# Conflicts: # tests/test_cli.py
Summary
Fixes tokenizer JSON routing false positives by adding a bounded Hugging Face
tokenizer.jsonownership check before generic JSON model scanners claim the file.Root cause:
.jsonrouting could send large tokenizer JSON files through MXNet bounded symbol routing, producingmxnet_symbol_routing_inconclusiveon benign tokenizer files.tokenizer.jsonvia generic JSON/name gates unless content proved otherwise.tokenizer.json/tokenizer_config.jsonas exact Manifest filenames even thoughManifestScanner.can_handle()rejects tokenizer filenames.Fix:
is_huggingface_tokenizer_json_file()with a deterministic 16 MiB read cap.tokenizer.jsononly when the complete root JSON fits inside the cap, contains tokenizer schema evidence (version,added_tokens,model.type,model.vocab), and has no root-level Jinja2/MXNet/XGBoost conflict keys.chat_template/template,learner, or MXNetnodes/arg_nodes/heads.Security tradeoff:
Validation
Focused tests:
Affected suite:
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto --maxfail=1 tests/utils/file/test_filetype.py tests/test_core.py tests/scanners/test_scanner_registry.py tests/scanners/test_jinja2_template_scanner.py tests/scanners/test_xgboost_scanner.py tests/scanners/test_mxnet_scanner.py tests/scanners/test_manifest_scanner.py # 1450 passed, 367 skipped, 3 warningsFull pre-commit/CI-style local checks:
Real pinned Hugging Face reproductions:
Outcome: exit 0,
success: true,files_scanned: 1,scanner_names: [],checks_len: 0, one license-info issue only. Grep over checks/issues found nomxnet,xgboost,jinja,manifest, orsymbol routing.Outcome: exit 0,
success: true,files_scanned: 1,scanner_names: [],checks_len: 0, one license-info issue only. Grep over checks/issues found nomxnet,xgboost,jinja,manifest, orsymbol routing.