fix(pytorch): stream bounded analysis of large legacy shards#1655
Conversation
|
@codex review |
Performance BenchmarksCompared
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f41724ecdc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex address that feedback. Fix both current P2s on exact head f41724e: non-seekable oversized legacy PyTorch streams must get the same bounded control/storage-header probe as seekable/file paths without being truncated by the generic cap, and recognized oversized legacy shards must not stream the entire file merely to compute hashes. Add non-seekable, multi-GB sparse-file, timeout/read-budget, malicious-header, and benign legacy regressions; push a new head and report tests. |
|
Addressed the two P2s on head
Added regressions for non-seekable cap bypass, stream budget exhaustion, sparse multi-GB legacy file prefix hashing, benign/malicious legacy controls, malformed storage, and non-legacy S904 retention. @codex review |
|
@codex review the exact current head after the oversized-shard streaming fixes. Recheck non-seekable legacy probes, bounded hashing, memory/disk limits, and both benign and malicious oversized controls. |
|
Additional pinned QA: |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f1547d9f25
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Summary
Committed changes on the current branch:
Testing
|
|
@codex address that feedback on exact head Resolve all four current P2 threads together:
Add focused regressions for each case, including equal-control/different-storage hashes, multi-storage tensors beyond the probe, truncated non-seekable input, and arbitrary non-legacy oversized suffixes. Preserve bounded memory/read behavior and rerun the pinned |
There was a problem hiding this comment.
💡 Codex Review
modelaudit/modelaudit/scanners/pickle_scanner.py
Line 3925 in f1547d9
When a known-size non-seekable legacy stream is truncated before storage_end but the computed storage_end equals the declared size, this storage_only_omitted branch suppresses _add_stream_truncation_check and allows the scan to finish as bounded-success metadata. In that context the stream did not actually yield the bytes needed to prove the storage extent exists, so truncated input can masquerade as a validated legacy shard and should fail closed instead.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Additional large-shard QA for the next head: |
|
Additional bounded-shard QA: |
|
@codex review |
|
Security review completed. No security issues were found in this pull request. ℹ️ About Codex security reviews in GitHubThis is an experimental Codex feature. Security reviews are triggered when:
Once complete, Codex will leave suggestions, or a comment if no findings are found. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 63b338c63d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Additional large-shard QA on current head Commands: uv run hf download google-bert/bert-large-uncased pytorch_model.bin --revision 6da4b6a26a1877e173fca3225479512db81a5e5b --local-dir /tmp/modelaudit-task24-hf/files/google-bert-bert-large-uncased-6da4b6a26a1877e173fca3225479512db81a5e5b --max-workers 1
uv run modelaudit scan --no-cache --format json --output /tmp/modelaudit-task24-hf/postfix-bert-large-uncased-pytorch_model.json /tmp/modelaudit-task24-hf/files/google-bert-bert-large-uncased-6da4b6a26a1877e173fca3225479512db81a5e5b/pytorch_model.bin
uv run hf download FacebookAI/xlm-roberta-large pytorch_model.bin --revision c23d21b0620b635a76227c604d44e43a9f0ee389 --local-dir /tmp/modelaudit-task24-hf/files/facebookai-xlm-roberta-large-c23d21b0620b635a76227c604d44e43a9f0ee389 --max-workers 1
uv run modelaudit scan --no-cache --format json --output /tmp/modelaudit-task24-hf/postfix-xlm-roberta-large-pytorch_model.json /tmp/modelaudit-task24-hf/files/facebookai-xlm-roberta-large-c23d21b0620b635a76227c604d44e43a9f0ee389/pytorch_model.bin
uv run hf download baidu/NAVA Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth --revision 16c20287c79c0085ebc8213ca7894701948891f0 --local-dir /tmp/modelaudit-task24-hf/files/baidu-nava-16c20287c79c0085ebc8213ca7894701948891f0 --max-workers 1
uv run modelaudit scan --no-cache --format json --output /tmp/modelaudit-task24-hf/postfix-nava-models_t5_umt5_xxl_enc_bf16.json /tmp/modelaudit-task24-hf/files/baidu-nava-16c20287c79c0085ebc8213ca7894701948891f0/Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pthOutcomes:
|
|
@codex review |
mldangelo-oai
left a comment
There was a problem hiding this comment.
Submitted review-thread replies for the bounded PyTorch ZIP hashing fix.
|
@codex add a pinned large-PyTorch coverage case: |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4907562e3b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Additional pinned large-PyTorch coverage QA for Command: HF_HUB_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 DO_NOT_TRACK=1 uv run hf download facebook/w2v-bert-2.0 conformer_shaw.pt --revision da985ba0987f70aaeb84a80f2851cfac8c697a7b --local-dir /tmp/modelaudit-task24-hf/files/facebook-w2v-bert-2.0-da985ba0987f70aaeb84a80f2851cfac8c697a7b --max-workers 1
PROMPTFOO_DISABLE_TELEMETRY=1 HF_HUB_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 DO_NOT_TRACK=1 uv run modelaudit scan --no-cache --format json --output /tmp/modelaudit-task24-hf/postfix-facebook-w2v-bert-2.0-conformer_shaw-final.json /tmp/modelaudit-task24-hf/files/facebook-w2v-bert-2.0-da985ba0987f70aaeb84a80f2851cfac8c697a7b/conformer_shaw.ptOutcome: CLI exit 2 by design; JSON |
|
@codex add this pinned large-PyTorch regression: |
|
Summary
Committed changes:
Testing
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f253a67b54
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Current-head CI on The command emits valid JSON with Run: https://github.com/promptfoo/modelaudit/actions/runs/27343025307 |
|
Independent review of exact head
Generated direct/HF-like/archive controls otherwise preserved malicious evidence and fail-closed behavior. Three current review threads match these findings. Exact-head CI is also red on the separately reported stale JSON/text assertion. These issues block merge. |
|
Takeover update for exact head Additive commits pushed since launch head
Fixed the three fresh Codex P2 threads on
Thread replies were posted and only those fixed threads were resolved. Live review threads now show no unresolved threads. Validation completed with pytest capped at four workers:
Pinned Hunyuan smoke on the final tree:
Earlier full pinned Hunyuan direct-file matrix after the JIT fix had no @codex review |
|
Codex Review: Didn't find any major issues. Keep it up! ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
Current-head CI fix pushed at Fixed the stale CLI regression that asserted human text in default/JSON output:
Validation after the additive fix:
No implementation behavior was changed in this commit; it only aligns the test with JSON-vs-text output contracts. @codex review |
|
Codex Review: Didn't find any major issues. You're on a roll. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
|
Exact-head closeout for
I did not enable squash auto-merge because the human-review gate is not clean yet. |
# Conflicts: # modelaudit/core.py # modelaudit/scanners/pickle_scanner.py
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9bdbd2a30e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2665946b7d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
# Conflicts: # modelaudit/scanners/pytorch_zip_scanner.py
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf15f1d86c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
@codex review |
|
Codex Review: Didn't find any major issues. Can't wait for the next one! Reviewed commit: ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary
.pt/.binfiles on the existing S904 path; preserve fail-closed outcomes for malformed/truncated coveragesha256_prefixand suppress incomplete aggregatecontent_hashRoot cause
Large pinned Mistral and Mixtral shards are ZIP-backed PyTorch archives. The central-directory preflight rejected valid PyTorch local headers because it only tried 64-bit data descriptors when the local ZIP64 extra field had at least 16 bytes; the pinned shards use zero local sizes, an 8-byte ZIP64 extra field, and a 64-bit descriptor. That forced generic ZIP handling, which stopped at the default 512 MiB read cap before PyTorch ZIP bounded metadata scanning could run. Raw legacy PyTorch pickle streams had a similar early
PickleScannersize/hash path before the bounded legacy control-stream parser could classify storage safely.Security tradeoffs
_legacy_pytorch_layout_for_scan()recognizes the bounded legacy control layout.Legacy PyTorch Bounded Analysiswith byte/opcode/key/time limits and also records failedLegacy PyTorch Storage Payload Coveragewithscan_outcome_reason=legacy_pytorch_storage_payload_skipped.legacy_pytorch_storage_layout_incomplete, but core still avoids pre-dispatch full-file hashing so the bounded/inconclusive scanner path is not defeated.max_file_read_size; malformed magic/protocol-only streams return the deferred S904 after the bounded preamble probe.sha256_prefix,hash_complete=false) instead of publishing a complete-looking full-filesha256. Incomplete top-level aggregate hashes are omitted (content_hash=null).REDUCE/global/persistent-ID payloads remain detected.Real-model QA
Environment used:
PROMPTFOO_DISABLE_TELEMETRY=1 HF_HUB_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 DO_NOT_TRACK=1, scratch cache under/tmp/modelaudit-task24-hf, no HF token.Pre-fix reproduction on exact starting main
2f782ba1f18ab5aac4716c8c7ac6f0a16f0c6f60:Outcome: exit 2;
scan_outcome_reasons=["max_file_read_size_exceeded"]; S904File too large: 5064823659 bytes (max: 536870912).Post-fix pinned Mistral full layout on head
2ba973614df7dece1e8deee79586f83f60e0f82f:Revision:
mistralai/Mistral-7B-v0.1@27d67f1b5f57dc0953326b2601d68371d40ea8da; shards 9,943,028,044 bytes and 5,064,823,659 bytes. Outcome: exit 0;success=true;has_errors=false; scannersmanifest,pytorch_zip;bytes_scanned=132005;content_hash=null; no S904/max-read reason; only informational license warning for the index JSON. Shards recordsha256_prefix, no completesha256, and boundeddata.pklcoverage.Post-fix pinned Mixtral selective shard on head
2ba973614df7dece1e8deee79586f83f60e0f82f:Revision:
mistralai/Mixtral-8x7B-Instruct-v0.1@eba92302a2861cdc0098cc54bc9f17cb2c47eb61;consolidated.00.pt12,134,470,792 bytes. Outcome: exit 0;success=true; scannerpytorch_zip;bytes_scanned=13187;content_hash=null;file_hashes.sha256_prefix=0b8d5100...; no archive-levelsha256;pickle_files=["consolidated.00/data.pkl"]; no S904/max-read reason.Post-fix raw legacy PyTorch control on head
2ba973614df7dece1e8deee79586f83f60e0f82f:Outcome: exit 1 due existing S212 persistent-ID warning; scanner
pickle;bytes_scanned=89690;content_hash=null;file_hashes.sha256_prefix=bea17426...; no completesha256; no S904/max-read reason; file metadata hasscan_outcome=inconclusive,scan_outcome_reasons=["legacy_pytorch_storage_payload_skipped"],legacy_pytorch_storage_payload_incomplete=true,legacy_pytorch_storage_start=89690,legacy_pytorch_storage_end=1344997306.Additional requested real shard QA on head
4907562e3be10bab0e48ce76d7022c86be976b78remains valid for the ZIP path:facebook/w2v-bert-2.0@da985ba0987f70aaeb84a80f2851cfac8c697a7bconformer_shaw.pt2,329,131,983 bytes exited 2 with explicitjit_embedded_python_snippet_limit,bytes_scanned=295509,content_hash=null,sha256_prefix=d5225..., no S904.Tests
Head validated locally:
2ba973614df7dece1e8deee79586f83f60e0f82f.Results: focused post-fetch bundle 4 passed; pickle/hash/model suite 291 passed; broad affected sweep 2,069 passed / 5 skipped / 6 warnings in 455.50s; Ruff, format check, full mypy, and diff check clean. Optional-dependency skips were expected for missing
torchandonnxin this environment. Before final validation, fetchedorigin/main; it is already an ancestor of the branch, so no additive merge commit was needed.Residual risk
No model weights, HF tokens, caches, or generated scan payloads are committed. Oversized PyTorch ZIP/raw legacy artifact identity is represented by bounded prefix hash evidence, not a complete content hash; consumers that require complete full-file hashes should treat
content_hash=nullplushash_complete=falseorsha256_prefixas intentionally incomplete identity coverage.