fix(pytorch): stream bounded analysis of large legacy shards #1655

Merged: mldangelo-oai merged 28 commits into main from mdangelo/codex/hf-fp-t24-large-pytorch-streaming-20260610 (Jun 12, 2026)

@mldangelo-oai commented Jun 11, 2026

Contributor

Summary

  • accept PyTorch-style streamed ZIP members whose local header carries zero sizes plus a partial ZIP64 extra field and a 64-bit data descriptor
  • defer generic pickle read-size failure only after bounded bytes prove a legacy PyTorch control layout, then skip tensor storage with explicit inconclusive coverage metadata
  • keep non-legacy oversized files and preamble-only deceptive .pt/.bin files on the existing S904 path; preserve fail-closed outcomes for malformed/truncated coverage
  • avoid full-file archive hashing for oversized recognized PyTorch ZIP and raw legacy streams, including incomplete-but-recognized legacy layouts; record bounded sha256_prefix and suppress incomplete aggregate content_hash

Root cause

Large pinned Mistral and Mixtral shards are ZIP-backed PyTorch archives. The central-directory preflight rejected valid PyTorch local headers because it only tried 64-bit data descriptors when the local ZIP64 extra field had at least 16 bytes; the pinned shards use zero local sizes, an 8-byte ZIP64 extra field, and a 64-bit descriptor. That forced generic ZIP handling, which stopped at the default 512 MiB read cap before PyTorch ZIP bounded metadata scanning could run. Raw legacy PyTorch pickle streams had a similar early PickleScanner size/hash path before the bounded legacy control-stream parser could classify storage safely.
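
The descriptor-width issue can be illustrated with a small sketch. This is not the scanner's code: the function name `descriptor_matches` and its signature are hypothetical. It only assumes the standard ZIP layout, in which a data descriptor has an optional `PK\x07\x08` signature followed by a CRC-32 and two size fields that are 4 bytes each normally and 8 bytes each in ZIP64 form.

```python
import struct

DESCRIPTOR_SIGNATURE = b"PK\x07\x08"  # optional in the ZIP format


def descriptor_matches(buf: bytes, crc: int, csize: int, usize: int,
                       has_local_zip64: bool) -> bool:
    """Return True if buf starts with a data descriptor equal to the expected values.

    Hypothetical helper: the expected values stand in for the central-directory
    entry, and has_local_zip64 for "a ZIP64 extra field of any length is present".
    """
    formats = ["<III"]  # CRC-32 + 32-bit sizes
    if has_local_zip64:
        formats.append("<IQQ")  # CRC-32 + 64-bit sizes
    for fmt in formats:
        for prefix in (DESCRIPTOR_SIGNATURE, b""):
            end = len(prefix) + struct.calcsize(fmt)
            if not buf.startswith(prefix) or len(buf) < end:
                continue
            if struct.unpack(fmt, buf[len(prefix):end]) == (crc, csize, usize):
                return True
    return False
```

The 64-bit width is gated on the presence of the local ZIP64 field rather than on its length, which is the behavior the description above attributes to the fix; a mismatch against the expected CRC or sizes still rejects the member.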

Security tradeoffs

  • ZIP preflight still requires descriptor CRC and compressed/uncompressed sizes to match the central directory; it now tries the 64-bit descriptor width whenever a local ZIP64 field is present.
  • Raw legacy pickle deferral is suffix-scoped to PyTorch containers, but suffix or magic/protocol alone is insufficient: S904 and full-file aggregate hashing are suppressed only after _legacy_pytorch_layout_for_scan() recognizes the bounded legacy control layout.
  • Valid legacy tensor storage is not materialized. The scanner records Legacy PyTorch Bounded Analysis with byte/opcode/key/time limits and also records failed Legacy PyTorch Storage Payload Coverage with scan_outcome_reason=legacy_pytorch_storage_payload_skipped.
  • Incomplete recognized legacy storage layouts fail closed with legacy_pytorch_storage_layout_incomplete, but core still avoids pre-dispatch full-file hashing so the bounded/inconclusive scanner path is not defeated.
  • Oversized non-seekable streams must prove the legacy layout before bypassing max_file_read_size; malformed magic/protocol-only streams return the deferred S904 after the bounded preamble probe.
  • Oversized recognized file/stream integrity uses bounded prefix hashing (sha256_prefix, hash_complete=false) instead of publishing a complete-looking full-file sha256. Incomplete top-level aggregate hashes are omitted (content_hash=null).
  • Truncated storage/control layouts remain inconclusive with S902 coverage checks. Malicious control pickle REDUCE/global/persistent-ID payloads remain detected.
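
A minimal sketch of bounded prefix hashing, for illustration only. The field names `sha256_prefix` and `hash_complete` come from this PR; the function `bounded_prefix_hash`, its parameters, and the returned dictionary shape are assumptions, not the project's API.

```python
import hashlib
import io
from typing import BinaryIO


def bounded_prefix_hash(stream: BinaryIO, limit: int, chunk_size: int = 1 << 20) -> dict:
    """Hash at most `limit` bytes and report whether that covered the whole stream."""
    digest = hashlib.sha256()
    remaining = limit
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # reached EOF before the limit
        digest.update(chunk)
        remaining -= len(chunk)
    # If the limit was reached, probe one byte to see whether more data follows.
    # Note: on a non-seekable stream this consumes that byte.
    hash_complete = remaining > 0 or stream.read(1) == b""
    key = "sha256" if hash_complete else "sha256_prefix"
    return {key: digest.hexdigest(), "hash_complete": hash_complete}


# Usage: a 100-byte input with a 10-byte limit yields only a prefix digest.
partial = bounded_prefix_hash(io.BytesIO(b"a" * 100), limit=10)
```

Publishing the digest under a different key when it is partial keeps a consumer from mistaking it for a full-file `sha256`.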

Real-model QA

Environment used: PROMPTFOO_DISABLE_TELEMETRY=1 HF_HUB_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 DO_NOT_TRACK=1, scratch cache under /tmp/modelaudit-task24-hf, no HF token.

Pre-fix reproduction on exact starting main 2f782ba1f18ab5aac4716c8c7ac6f0a16f0c6f60:

uv run modelaudit scan --no-cache --format json --output /tmp/modelaudit-task24-hf/prefix-main-repro-mistral-bin.json /tmp/modelaudit-task24-hf/files/mistral-7b-v0.1-27d67f1b5f57dc0953326b2601d68371d40ea8da/pytorch_model-00002-of-00002.bin

Outcome: exit 2; scan_outcome_reasons=["max_file_read_size_exceeded"]; S904 File too large: 5064823659 bytes (max: 536870912).

Post-fix pinned Mistral full layout on head 2ba973614df7dece1e8deee79586f83f60e0f82f:

PROMPTFOO_DISABLE_TELEMETRY=1 HF_HUB_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 DO_NOT_TRACK=1 uv run modelaudit scan --no-cache --format json --output /tmp/modelaudit-task24-hf/postfix-mistral-full-legacy-dir-final5.json /tmp/modelaudit-task24-hf/files/mistral-7b-v0.1-27d67f1b5f57dc0953326b2601d68371d40ea8da

Revision: mistralai/Mistral-7B-v0.1@27d67f1b5f57dc0953326b2601d68371d40ea8da; shards 9,943,028,044 bytes and 5,064,823,659 bytes. Outcome: exit 0; success=true; has_errors=false; scanners manifest,pytorch_zip; bytes_scanned=132005; content_hash=null; no S904/max-read reason; only informational license warning for the index JSON. Shards record sha256_prefix, no complete sha256, and bounded data.pkl coverage.

Post-fix pinned Mixtral selective shard on head 2ba973614df7dece1e8deee79586f83f60e0f82f:

PROMPTFOO_DISABLE_TELEMETRY=1 HF_HUB_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 DO_NOT_TRACK=1 uv run modelaudit scan --no-cache --format json --output /tmp/modelaudit-task24-hf/postfix-mixtral-consolidated-00-final5.json /tmp/modelaudit-task24-hf/files/mixtral-8x7b-instruct-v0.1-eba92302a2861cdc0098cc54bc9f17cb2c47eb61/consolidated.00.pt

Revision: mistralai/Mixtral-8x7B-Instruct-v0.1@eba92302a2861cdc0098cc54bc9f17cb2c47eb61; consolidated.00.pt 12,134,470,792 bytes. Outcome: exit 0; success=true; scanner pytorch_zip; bytes_scanned=13187; content_hash=null; file_hashes.sha256_prefix=0b8d5100...; no archive-level sha256; pickle_files=["consolidated.00/data.pkl"]; no S904/max-read reason.

Post-fix raw legacy PyTorch control on head 2ba973614df7dece1e8deee79586f83f60e0f82f:

PROMPTFOO_DISABLE_TELEMETRY=1 HF_HUB_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 DO_NOT_TRACK=1 uv run modelaudit scan --no-cache --format json --output /tmp/modelaudit-task24-hf/postfix-google-bert-large-raw-legacy-final5.json /tmp/modelaudit-task24-hf/files/google-bert-bert-large-uncased-6da4b6a26a1877e173fca3225479512db81a5e5b/pytorch_model.bin

Outcome: exit 1 due to the existing S212 persistent-ID warning; scanner pickle; bytes_scanned=89690; content_hash=null; file_hashes.sha256_prefix=bea17426...; no complete sha256; no S904/max-read reason; file metadata has scan_outcome=inconclusive, scan_outcome_reasons=["legacy_pytorch_storage_payload_skipped"], legacy_pytorch_storage_payload_incomplete=true, legacy_pytorch_storage_start=89690, legacy_pytorch_storage_end=1344997306.

Additional requested real shard QA on head 4907562e3be10bab0e48ce76d7022c86be976b78 remains valid for the ZIP path: facebook/w2v-bert-2.0@da985ba0987f70aaeb84a80f2851cfac8c697a7b conformer_shaw.pt 2,329,131,983 bytes exited 2 with explicit jit_embedded_python_snippet_limit, bytes_scanned=295509, content_hash=null, sha256_prefix=d5225..., no S904.

Tests

Head validated locally: 2ba973614df7dece1e8deee79586f83f60e0f82f.

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pickle_scanner.py::test_large_non_seekable_legacy_preamble_without_layout_keeps_file_size_limit tests/scanners/test_pickle_scanner.py::test_large_non_seekable_legacy_pytorch_stream_uses_stream_budget_not_file_cap tests/scanners/test_pickle_scanner.py::test_large_non_seekable_legacy_pytorch_stream_budget_exhaustion_is_inconclusive tests/test_models.py::TestFileMetadataModel::test_set_file_hashes_preserves_sha256_prefix
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pickle_scanner.py tests/test_regular_scan_hash.py tests/test_models.py
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pickle_scanner.py tests/scanners/test_zip_scanner.py tests/scanners/test_pytorch_zip_scanner.py tests/test_pytorch_zip_detection.py tests/utils/file/test_streaming_analysis.py tests/test_regular_scan_hash.py tests/test_models.py
uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
git diff --check

Results: focused post-fetch bundle 4 passed; pickle/hash/model suite 291 passed; broad affected sweep 2,069 passed / 5 skipped / 6 warnings in 455.50s; Ruff, format check, full mypy, and diff check clean. Optional-dependency skips were expected for missing torch and onnx in this environment. Before final validation, fetched origin/main; it is already an ancestor of the branch, so no additive merge commit was needed.

Residual risk

No model weights, HF tokens, caches, or generated scan payloads are committed. Oversized PyTorch ZIP/raw legacy artifact identity is represented by bounded prefix hash evidence, not a complete content hash; consumers that require complete full-file hashes should treat content_hash=null plus hash_complete=false or sha256_prefix as intentionally incomplete identity coverage.
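
For consumers, the rule above could be applied roughly as follows. This is a sketch under assumptions: the helper `identity_is_complete` is hypothetical, and it assumes `hash_complete` sits alongside `sha256_prefix` in the per-file `file_hashes` mapping, which this PR does not spell out.

```python
from typing import Optional


def identity_is_complete(content_hash: Optional[str], file_hashes: dict) -> bool:
    """Return True only when the report carries a complete full-file identity."""
    if content_hash is None:
        # Incomplete aggregate hashes are omitted (content_hash=null).
        return False
    if file_hashes.get("hash_complete") is False:
        return False
    # A prefix digest without a full sha256 is bounded evidence, not identity.
    return "sha256" in file_hashes and "sha256_prefix" not in file_hashes
```

A pipeline that pins artifacts by hash would reject or re-hash any file for which this returns False instead of storing the prefix digest as if it were a full digest.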

@mldangelo-oai

Contributor Author

@codex review

@github-actions Bot commented Jun 11, 2026

Contributor

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.470s -> 1.444s (-1.8%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| warm-cache-rescan | tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan | release-candidate | 547.3 KiB | 32 | 126.32ms | 115.78ms | -8.3% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] | nested_hex | 130 B | 1 | 608.0us | 629.5us | +3.5% | stable |
| suspicious-pickle-intake | tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake | suspicious-intake | 183.8 KiB | 4 | 147.90ms | 143.89ms | -2.7% | stable |
| direct-malicious-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload | malicious_reduce | 52 B | 1 | 518.5us | 529.0us | +2.0% | stable |
| duplicate-heavy-registry | tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot | registry-snapshot | 915.2 KiB | 13 | 404.61ms | 396.94ms | -1.9% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] | nested_raw | 78 B | 1 | 571.3us | 578.4us | +1.2% | stable |
| single-checkpoint-preflight | tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load | single_checkpoint.pkl | 183.0 KiB | 1 | 72.36ms | 71.47ms | -1.2% | stable |
| mixed-model-repository | tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository | release-candidate | 547.3 KiB | 32 | 493.46ms | 487.98ms | -1.1% | stable |
| clean-training-checkpoint | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint | safe_large | 278.2 KiB | 1 | 109.75ms | 110.66ms | +0.8% | stable |
| chunked-upload-stream | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream | chunked_stream | 278.2 KiB | 1 | 113.16ms | 113.90ms | +0.7% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] | nested_base64 | 98 B | 1 | 574.7us | 577.7us | +0.5% | stable |
| padded-multi-stream-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload | multi_stream_padded | 4.1 KiB | 1 | 633.8us | 631.9us | -0.3% | stable |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f41724ecdc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/pickle_scanner.py
Comment thread modelaudit/scanners/pickle_scanner.py Outdated
@mldangelo-oai

Contributor Author

@codex address that feedback. Fix both current P2s on exact head f41724e: non-seekable oversized legacy PyTorch streams must get the same bounded control/storage-header probe as seekable/file paths without being truncated by the generic cap, and recognized oversized legacy shards must not stream the entire file merely to compute hashes. Add non-seekable, multi-GB sparse-file, timeout/read-budget, malicious-header, and benign legacy regressions; push a new head and report tests.

@mldangelo-oai

Contributor Author

Addressed the two P2s on head f1547d9f:

  • non-seekable oversized legacy PyTorch streams now ignore the generic file-read cap for the bounded control/storage probe while still honoring max_known_stream_read_bytes; budget exhaustion becomes legacy_pytorch_control_layout_incomplete.
  • recognized oversized legacy PyTorch files now record a bounded prefix integrity hash (sha256_prefix, hash_complete=false) instead of streaming the full shard for scanner-level hashing.

Added regressions for non-seekable cap bypass, stream budget exhaustion, sparse multi-GB legacy file prefix hashing, benign/malicious legacy controls, malformed storage, and non-legacy S904 retention.

@codex review

@mldangelo-oai

Contributor Author

@codex review the exact current head after the oversized-shard streaming fixes. Recheck non-seekable legacy probes, bounded hashing, memory/disk limits, and both benign and malicious oversized controls.

@mldangelo-oai

Contributor Author

Additional pinned QA: FacebookAI/xlm-roberta-large@c23d21b0620b635a76227c604d44e43a9f0ee389 has a 2.244 GB pytorch_model.bin; origin/main hashes it but reports S904/max_file_read_size_exceeded instead of completing metadata inspection. Please include this exact artifact in the PR head validation and record peak memory, bytes read, terminal outcome, and malicious oversized controls.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f1547d9f25

Comment thread modelaudit/scanners/pickle_scanner.py
Comment thread modelaudit/scanners/pickle_scanner.py Outdated
Comment thread modelaudit/scanners/pickle_scanner.py
Comment thread modelaudit/scanners/pickle_scanner.py
@chatgpt-codex-connector


Summary

Committed changes on the current branch:

  • 161625f7e8f854d88b95c51ca84b91914107e77f fix: bound oversized legacy PyTorch streams

Testing

  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pickle_scanner.py::test_large_non_seekable_legacy_pytorch_stream_defers_file_size_limit tests/scanners/test_pickle_scanner.py::test_large_non_seekable_legacy_pytorch_malicious_control_still_fails tests/scanners/test_pickle_scanner.py::test_large_non_seekable_legacy_pytorch_header_read_budget_fails_closed tests/scanners/test_pickle_scanner.py::test_large_non_seekable_non_legacy_pytorch_suffix_keeps_file_size_limit tests/scanners/test_pickle_scanner.py::test_large_sparse_legacy_pytorch_shard_uses_bounded_hash tests/scanners/test_pickle_scanner.py::test_large_legacy_pytorch_container_defers_file_size_limit tests/scanners/test_pickle_scanner.py::test_large_seekable_legacy_pytorch_stream_defers_file_size_limit
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pickle_scanner.py
  • uv run ruff check modelaudit/scanners/pickle_scanner.py tests/scanners/test_pickle_scanner.py
  • uv run ruff format --check modelaudit/scanners/pickle_scanner.py tests/scanners/test_pickle_scanner.py
  • uv run mypy modelaudit/scanners/pickle_scanner.py tests/scanners/test_pickle_scanner.py
  • git diff --check
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1: failed on tests/test_cli.py::test_cli_report_writers_reject_symlinked_parent_directory (expected exit 2, observed 0); a targeted rerun of the same test failed the same way.

@mldangelo-oai

Contributor Author

@codex address that feedback on exact head f1547d9f25c079b050c0c154712848d5d1bc4122.

Resolve all four current P2 threads together:

  • Record oversized seekable stream integrity as explicit prefix/partial hash metadata, never as a complete sha256.
  • Keep the random-access layout handle open while validating storage records beyond the control probe.
  • Fail closed when a known-size non-seekable legacy stream returns fewer bytes than required to prove storage_end.
  • Do not bypass max_file_read_size based only on a .pt/.bin suffix; establish bounded legacy framing evidence before deferring the size limit.

Add focused regressions for each case, including equal-control/different-storage hashes, multi-storage tensors beyond the probe, truncated non-seekable input, and arbitrary non-legacy oversized suffixes. Preserve bounded memory/read behavior and rerun the pinned FacebookAI/xlm-roberta-large artifact QA. Push a new commit, reply to and resolve every addressed thread, then request a fresh exact-head review.
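
The fail-closed requirement for truncated non-seekable input can be sketched as follows. The names `consume_exact` and `TruncatedStreamError` are hypothetical, and the sketch makes no claim about how the scanner tracks `storage_end` or its read budgets; it only shows the general technique of consuming a declared extent in bounded chunks and refusing a short read.

```python
import io
from typing import BinaryIO


class TruncatedStreamError(Exception):
    """Raised when a stream ends before a declared extent is fully consumed."""


def consume_exact(stream: BinaryIO, size: int, chunk_size: int = 1 << 20) -> None:
    """Read and discard exactly `size` bytes, holding at most one chunk in memory."""
    remaining = size
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            # The declared extent was never delivered: fail closed, do not report success.
            raise TruncatedStreamError(
                f"stream ended {remaining} bytes before the declared extent"
            )
        remaining -= len(chunk)


# Usage: a 10-byte stream satisfies a 10-byte extent; asking for 11 would raise.
consume_exact(io.BytesIO(b"x" * 10), 10)
```

Discarding each chunk after counting it keeps memory bounded while still proving that the bytes exist, which a size comparison against a declared length cannot do on its own.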

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

if storage_only_omitted:

P2: Fail closed on truncated non-seekable storage

When a known-size non-seekable legacy stream is truncated before storage_end but the computed storage_end equals the declared size, this storage_only_omitted branch suppresses _add_stream_truncation_check and allows the scan to finish as bounded-success metadata. In that context the stream did not actually yield the bytes needed to prove the storage extent exists, so truncated input can masquerade as a validated legacy shard and should fail closed instead.

Comment thread modelaudit/scanners/pickle_scanner.py
Comment thread modelaudit/scanners/pickle_scanner.py Outdated
Comment thread modelaudit/scanners/pickle_scanner.py Outdated
@mldangelo-oai

Contributor Author

Additional large-shard QA for the next head: baidu/NAVA@16c20287c79c0085ebc8213ca7894701948891f0 contains Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth at 11,361,920,418 bytes; exact main stops at the generic 512 MiB read limit. Include it with the xlm-roberta and Mistral/Mixtral bounded-read controls after resolving the four current P2s.

@mldangelo-oai

Contributor Author

Additional bounded-shard QA: google-bert/bert-large-uncased@6da4b6a26a1877e173fca3225479512db81a5e5b has a 1,344,997,306-byte pytorch_model.bin that exact main stops at generic S904. Add it to the takeover real-model matrix with xlm-roberta, NAVA, Mistral, and Mixtral.

@mldangelo-oai

Contributor Author

@codex review

@chatgpt-codex-connector


Security review completed. No security issues were found in this pull request.

ℹ️ About Codex security reviews in GitHub

This is an experimental Codex feature. Security reviews are triggered when:

  • You comment "@codex security review"
  • A regular code review gets triggered (for example, "@codex review" or when a PR is opened), and you’re opted in so security review runs alongside code review

Once complete, Codex will leave suggestions, or a comment if no findings are found.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63b338c63d

Comment thread modelaudit/scanners/pickle_scanner.py
@mldangelo-oai

Contributor Author

Additional large-shard QA on current head 63b338c63d90a59f5c12024f848d13da81d9ef02 with telemetry disabled (PROMPTFOO_DISABLE_TELEMETRY=1 HF_HUB_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 DO_NOT_TRACK=1). Artifacts were downloaded under /tmp/modelaudit-task24-hf/files; no tokens, weights, caches, or generated scan payloads are committed.

Commands:

uv run hf download google-bert/bert-large-uncased pytorch_model.bin --revision 6da4b6a26a1877e173fca3225479512db81a5e5b --local-dir /tmp/modelaudit-task24-hf/files/google-bert-bert-large-uncased-6da4b6a26a1877e173fca3225479512db81a5e5b --max-workers 1
uv run modelaudit scan --no-cache --format json --output /tmp/modelaudit-task24-hf/postfix-bert-large-uncased-pytorch_model.json /tmp/modelaudit-task24-hf/files/google-bert-bert-large-uncased-6da4b6a26a1877e173fca3225479512db81a5e5b/pytorch_model.bin

uv run hf download FacebookAI/xlm-roberta-large pytorch_model.bin --revision c23d21b0620b635a76227c604d44e43a9f0ee389 --local-dir /tmp/modelaudit-task24-hf/files/facebookai-xlm-roberta-large-c23d21b0620b635a76227c604d44e43a9f0ee389 --max-workers 1
uv run modelaudit scan --no-cache --format json --output /tmp/modelaudit-task24-hf/postfix-xlm-roberta-large-pytorch_model.json /tmp/modelaudit-task24-hf/files/facebookai-xlm-roberta-large-c23d21b0620b635a76227c604d44e43a9f0ee389/pytorch_model.bin

uv run hf download baidu/NAVA Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth --revision 16c20287c79c0085ebc8213ca7894701948891f0 --local-dir /tmp/modelaudit-task24-hf/files/baidu-nava-16c20287c79c0085ebc8213ca7894701948891f0 --max-workers 1
uv run modelaudit scan --no-cache --format json --output /tmp/modelaudit-task24-hf/postfix-nava-models_t5_umt5_xxl_enc_bf16.json /tmp/modelaudit-task24-hf/files/baidu-nava-16c20287c79c0085ebc8213ca7894701948891f0/Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth

Outcomes:

  • google-bert/bert-large-uncased@6da4b6a26a1877e173fca3225479512db81a5e5b, pytorch_model.bin 1,344,997,306 bytes: CLI exit 1 due to the S212 BINPERSID warning; JSON success=true, scanner pickle, legacy_pytorch_bounded_analysis=true, bytes_scanned=89690, no max_file_read_size_exceeded; peak RSS via resource.getrusage(RUSAGE_CHILDREN).ru_maxrss 112,580 KB.
  • FacebookAI/xlm-roberta-large@c23d21b0620b635a76227c604d44e43a9f0ee389, pytorch_model.bin 2,244,861,551 bytes: CLI exit 1 due to the S212 BINPERSID warning; JSON success=true, scanner pickle, legacy_pytorch_bounded_analysis=true, bytes_scanned=90055, no max_file_read_size_exceeded; peak RSS 112,180 KB.
  • baidu/NAVA@16c20287c79c0085ebc8213ca7894701948891f0, Wan2.2-TI2V-5B/models_t5_umt5-xxl-enc-bf16.pth 11,361,920,418 bytes: CLI exit 0; JSON success=true, scanner pytorch_zip, pickle_files=["umt5-xxl-enc-bf16/data.pkl"], bytes_scanned=145350, no max_file_read_size_exceeded; peak RSS 87,232 KB.
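
For reproducing the peak-RSS figures, a measurement along these lines should work on Unix. This is a sketch, not the harness used for the numbers above; `run_and_measure` is a hypothetical helper. `ru_maxrss` is reported in kilobytes on Linux and in bytes on macOS, and `RUSAGE_CHILDREN` covers all terminated, waited-for children, so the sketch is meant to measure one command per interpreter process.

```python
import resource
import subprocess
import sys


def run_and_measure(cmd: list[str]) -> tuple[int, int]:
    """Run cmd to completion and return (exit_code, peak RSS of waited-for children)."""
    completed = subprocess.run(cmd, check=False)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return completed.returncode, peak


# Usage with a trivial child; substitute the scan command to measure a real run.
exit_code, peak_rss = run_and_measure([sys.executable, "-c", "pass"])
```

Because `check=False` is used, a non-zero scanner exit (for example exit 1 or 2 above) is returned instead of raising, so the outcome and the memory figure can be recorded together.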

@mldangelo-oai

Contributor Author

@codex review

@mldangelo-oai mldangelo-oai left a comment

Contributor Author

Submitted review-thread replies for the bounded PyTorch ZIP hashing fix.

Comment thread modelaudit/scanners/pickle_scanner.py
Comment thread modelaudit/scanners/pickle_scanner.py
@mldangelo-oai

Contributor Author

@codex add a pinned large-PyTorch coverage case: facebook/w2v-bert-2.0@da985ba0987f70aaeb84a80f2851cfac8c697a7b has a valid 2,329,131,983-byte conformer_shaw.pt that ends inconclusive after the embedded JIT/Python snippet budget. Final QA should truthfully inspect bounded metadata/code evidence without whole-file materialization or silently exhausting the snippet budget.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4907562e3b

Comment thread modelaudit/core.py Outdated
Comment thread modelaudit/scanners/pickle_scanner.py
Comment thread modelaudit/core.py Outdated
@mldangelo-oai

Contributor Author

Additional pinned large-PyTorch coverage QA for facebook/w2v-bert-2.0@da985ba0987f70aaeb84a80f2851cfac8c697a7b completed on head 4907562e3be10bab0e48ce76d7022c86be976b78 with telemetry disabled and no HF token. No files are committed.

Command:

HF_HUB_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 DO_NOT_TRACK=1 uv run hf download facebook/w2v-bert-2.0 conformer_shaw.pt --revision da985ba0987f70aaeb84a80f2851cfac8c697a7b --local-dir /tmp/modelaudit-task24-hf/files/facebook-w2v-bert-2.0-da985ba0987f70aaeb84a80f2851cfac8c697a7b --max-workers 1
PROMPTFOO_DISABLE_TELEMETRY=1 HF_HUB_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 DO_NOT_TRACK=1 uv run modelaudit scan --no-cache --format json --output /tmp/modelaudit-task24-hf/postfix-facebook-w2v-bert-2.0-conformer_shaw-final.json /tmp/modelaudit-task24-hf/files/facebook-w2v-bert-2.0-da985ba0987f70aaeb84a80f2851cfac8c697a7b/conformer_shaw.pt

Outcome: CLI exit 2 by design; JSON success=false, has_errors=false, scanner pytorch_zip, bytes_scanned=295509, content_hash=null, peak child RSS 87944 KB. File conformer_shaw.pt is 2,329,131,983 bytes and records archive file_hashes.sha256_prefix=d5225fa664bf1bb9660488bfe7ad0f1a59d8252dd778adf4d2f0d57f19690585 with hash_complete=false; no archive-level full sha256 is published. Embedded conformer_shaw/data.pkl coverage is complete (98,487 bytes, 24,174 opcodes, raw_scan_complete=true, opcode_scan_complete=true). The inconclusive result is explicit, not silent: issue/check Embedded Python/JIT analysis incomplete: candidate snippet budget was exceeded, scan_outcome_reasons=["jit_embedded_python_snippet_limit"], candidate_snippets=25, omitted_snippets=15.

@mldangelo-oai

Contributor Author

@codex add this pinned large-PyTorch regression: h94/IP-Adapter-FaceID@43907e6f44d079bf1a9102d9a6e56aef7a219bae has valid 1.49 GB and 1.07 GB .bin PyTorch ZIPs. Their inner data.pkl bytes are fully read, but current main reports raw_scan_complete=false, opcode_scan_complete=false, and pickle_analysis_incomplete because raw secret/JIT detectors are skipped. Final QA should provide truthful bounded completion, or an explicit precise residual reason, without materializing tensor storage.

@chatgpt-codex-connector


Summary

  • Added pinned facebook/w2v-bert-2.0@da985ba0987f70aaeb84a80f2851cfac8c697a7b / conformer_shaw.pt constants, including the exact 2,329,131,983-byte size used by the regression. (tests/scanners/test_pickle_scanner.py L34-L38)
  • Extended the legacy PyTorch test fixture builder so synthetic legacy containers can carry embedded control-stream metadata/code evidence without materializing sparse tensor storage, plus a helper that converges the declared storage size to an exact total file size. (tests/scanners/test_pickle_scanner.py L212-L264)
  • Added the pinned w2v-bert regression: it creates a sparse exact-size conformer_shaw.pt, scans bounded control/JIT evidence, verifies fail-closed inconclusive outcome on jit_embedded_python_snippet_limit, avoids max_file_read_size_exceeded, records prefix hashing only, and proves tensor storage was not materialized. (tests/scanners/test_pickle_scanner.py L2025-L2080)

Committed changes:

  • bada439 test: cover pinned w2v-bert JIT budget case

Testing

  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pickle_scanner.py::test_pinned_w2v_bert_conformer_shaw_large_legacy_jit_budget_is_inconclusive tests/scanners/test_pickle_scanner.py::test_large_legacy_pytorch_container_defers_file_size_limit tests/scanners/test_pickle_scanner.py::test_large_legacy_pytorch_malicious_control_still_fails tests/scanners/test_pickle_scanner.py::test_sparse_multigib_legacy_pytorch_file_uses_prefix_hash -q
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pickle_scanner.py -q
  • uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • git diff --check
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1: I started the full non-slow suite, observed a failure marker partway through, and stopped the long-running job before completion; focused pickle coverage and static checks passed.

@mldangelo-oai mldangelo-oai requested a review from mldangelo June 11, 2026 10:09

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f253a67b54

Comment thread modelaudit/scanners/pickle_scanner.py
Comment thread modelaudit/core.py Outdated
Comment thread modelaudit/core.py Outdated
@mldangelo-oai

Contributor Author

Current-head CI on 301a4233 fails consistently on Python 3.12, Python 3.13, and Windows in:

tests/test_cli.py::test_scan_inconclusive_without_issues_reports_inconclusive
assert "Inconclusive" in result.output

The command emits valid JSON with success: false, zero issues/checks, and one scanned file; the test is asserting human text against JSON output. Update the JSON-mode regression to parse and assert the structured inconclusive contract, and retain/add a separate text-mode assertion for the word Inconclusive if desired. Do not add prose to JSON output. Rerun the focused CLI test and supported platform lanes, push additively, and request fresh current-head Codex review.

Run: https://github.com/promptfoo/modelaudit/actions/runs/27343025307

@mldangelo-oai

Contributor Author

Independent review of exact head 301a42330fb68dc352da9faf4fe53e1998648c76 found three resource/configuration blockers in addition to the already-posted JSON-output CI regression:

  1. P2: non-seekable legacy PyTorch streams can exceed max_file_read_size before layout validation. Once a 128-byte preamble resembles a partial legacy control stream, deferred reading uses ignore_file_read_size=True. With a configured 64-byte cap, a malformed stream read 4,224 bytes before failing closed with S904. It is bounded by max_known_stream_read_bytes, but violates the caller's file cap.
  2. P2: invalid max_file_read_size values are interpreted differently by hash deferral and scanners. Core directly coerces int(... or 0), while scanners use their validated fallback. With scanner default 1,024, None, "bad", and -7 disable deferral while the scanner still limits at 1,024; True becomes a helper limit of 1 and forces deferral for a 384-byte file. Normalize once through the same validation path.
  3. P2: regular scan cache still full-hashes large PyTorch ZIPs before bounded dispatch. The aggregate deferral path reaches cache-decorated scan_file; cache-key generation has no PyTorch read-limit bypass. A valid 12 MiB PyTorch ZIP incurred two full-hash attempts before returning a bounded scanner result with bytes_scanned=113.
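
Finding 2 comes down to validating the limit once and sharing the result. A minimal sketch, assuming a hypothetical helper `normalize_read_limit` and the scanner default of 1,024 used in the example above. It is not the project's actual validation code, and passing 0 through unchanged is an assumption:

```python
from typing import Any

DEFAULT_READ_LIMIT = 1024  # assumed scanner default, taken from the example above


def normalize_read_limit(value: Any, default: int = DEFAULT_READ_LIMIT) -> int:
    """Return a usable byte limit, falling back to `default` for invalid input.

    Hypothetical helper: both the hash-deferral path and the scanners would
    call this, so they cannot disagree about what a bad value means.
    """
    # bool is a subclass of int, so True would otherwise become a limit of 1.
    if isinstance(value, bool):
        return default
    # None, strings, floats and other non-int types are rejected outright.
    if not isinstance(value, int):
        return default
    # Negative limits are meaningless; 0 is passed through (assumption).
    if value < 0:
        return default
    return value


for candidate in (None, "bad", -7, True, 2048):
    print(repr(candidate), "->", normalize_read_limit(candidate))
```

With one shared helper, `None`, `"bad"`, `-7`, and `True` all resolve to the same fallback on both paths instead of disabling deferral on one side and forcing it on the other.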

Generated direct/HF-like/archive controls otherwise preserved malicious evidence and fail-closed behavior. Three current review threads match these findings. Exact-head CI is also red on the separately reported stale JSON/text assertion. These issues block merge.

@mldangelo-oai

Contributor Author

Takeover update for exact head b135be68d6bbb37eea4c3c75b9750f2841e60e40.

Additive commits pushed since launch head 3b8af102b7e9619b89639694a664a0007be9cc63:

  • f253a67b5 test: cover sparse PyTorch ZIP64 shard scanning
  • 301a42330 fix: ignore binary tensor metadata in JIT probes
  • b135be68 fix: bound PyTorch read-limit cache paths

Fixed the three fresh Codex P2 threads on f253a67b5:

  • non-seekable legacy PyTorch candidates now keep the generic read cap through preamble/capped layout validation, and only lift to the bounded stream budget after _legacy_pytorch_layout_for_scan() recognizes a real layout.
  • PyTorch read-limit hash deferral now mirrors BaseScanner invalid config defaults for None, strings, negative values, and booleans.
  • bounded PyTorch ZIP/legacy read-limit scans now bypass cache identity hashing in the cache decorator, aggregate dispatch, and scanner-level scan_with_cache().
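
The first fix is a two-phase read: stay under the caller's cap until the layout is recognized, then allow the larger bounded budget. A rough sketch under stated assumptions follows. `read_at_most`, `read_with_deferred_budget`, and the `recognizes_layout` callback are hypothetical names for illustration, and the real check is whatever `_legacy_pytorch_layout_for_scan()` does:

```python
import io
from typing import BinaryIO, Callable


def read_at_most(stream: BinaryIO, budget: int, chunk_size: int = 4096) -> bytes:
    """Read up to `budget` bytes, never requesting more than what remains."""
    chunks = []
    remaining = max(budget, 0)
    while remaining > 0:
        # Clamp each request so a small cap cannot be overshot by one chunk.
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_with_deferred_budget(
    stream: BinaryIO,
    file_cap: int,
    stream_budget: int,
    recognizes_layout: Callable[[bytes], bool],
) -> bytes:
    """Validate under the caller's cap first; lift the limit only on success."""
    head = read_at_most(stream, file_cap)
    if not recognizes_layout(head):
        # Unrecognized input: the caller fails closed and the cap was honored.
        return head
    # Recognized layout: continue, but only up to the bounded stream budget.
    return head + read_at_most(stream, stream_budget - len(head))


data = io.BytesIO(b"\x00" * 5000)
print(len(read_with_deferred_budget(data, 64, 4224, lambda head: False)))
```

Clamping each request to the remaining budget is what keeps a 64-byte cap from turning into a multi-kilobyte read on an unrecognized stream.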

Thread replies were posted and only those fixed threads were resolved. Live review threads now show no unresolved threads.

Validation completed with pytest capped at four workers:

  • new P2 focused regressions -> 9 passed
  • affected PyTorch/pickle/source/core/archive/CLI/hash bundle -> 3647 passed, 28 skipped
  • uv run --no-sync ruff format ... / uv run --no-sync ruff check --fix ... / final ruff check and ruff format --check -> clean
  • uv run --no-sync mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> success on 474 source files
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run --no-sync pytest -n 4 -m "not slow and not integration" --maxfail=1 -> 17399 passed, 1298 skipped
  • git diff --check -> clean

Pinned Hunyuan smoke on the final tree:

  • tencent/Hunyuan3D-2@9cd649ba6913f7a852e3286bad86bfa9a2d83dcf hunyuan3d-paint-v2-0-turbo/unet/diffusion_pytorch_model.bin with --stream --no-cache --max-size 8GB exited 2 in 26.1s.
  • JSON: success=false, has_errors=false, 0 issues, 0 errors, bytes_scanned=761538, archive size 7325432923, scan_outcome=inconclusive, scan_outcome_reasons=["pickle_analysis_incomplete"], pickle_files=["diffusion_pytorch_model/data.pkl"], and only file_hashes.sha256_prefix=ced3b00dc8360e4c9bb3b93dd9c1d66181114c14325b11292796fe1f167892f9.
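
For readers unfamiliar with the `sha256_prefix` field, a bounded prefix hash can be sketched as below. The function name and the prefix length are illustrative assumptions; the PR does not state how many bytes the project hashes:

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_prefix(stream: BinaryIO, prefix_bytes: int, chunk_size: int = 1 << 20) -> str:
    """Hash at most `prefix_bytes` from the stream and return the hex digest."""
    digest = hashlib.sha256()
    remaining = max(prefix_bytes, 0)
    while remaining > 0:
        # Never read past the prefix, so the cost is independent of file size.
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break
        digest.update(chunk)
        remaining -= len(chunk)
    return digest.hexdigest()


payload = io.BytesIO(b"abcdef" * 1000)
print(sha256_of_prefix(payload, prefix_bytes=3))
```

A prefix digest identifies only the bytes that were actually read, which is why it is reported separately from a full-content hash.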

Earlier full pinned Hunyuan direct-file matrix after the JIT fix had no jit_embedded_python_snippet_limit and no max_file_read_size_exceeded; the only remaining non-clean result was the same explicit 7.33 GB UNet pickle_analysis_incomplete budget outcome.

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Keep it up!


@mldangelo-oai

Contributor Author

Current-head CI fix pushed at 03fc36b19.

Fixed the stale CLI regression that asserted human text in default/JSON output:

  • JSON-mode scan now explicitly uses --format json and asserts the structured inconclusive contract: success=false, has_errors=false, no issues/checks, and file_metadata[*].scan_outcome="inconclusive" with pickle_analysis_incomplete.
  • Text-mode scan now separately uses --format text and asserts the human Inconclusive label without reporting Clean.
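
The JSON-mode half of that split might look roughly like the sketch below. The payload shape is an assumption inferred from the field names in this thread (in particular, `file_metadata` is taken to be a mapping from path to per-file metadata), and `assert_inconclusive_contract` is a hypothetical helper, not the test in the repository:

```python
import json


def assert_inconclusive_contract(raw_output: str) -> dict:
    """Parse JSON scan output and assert the structured inconclusive contract."""
    report = json.loads(raw_output)
    assert report["success"] is False
    assert report["has_errors"] is False
    # No issues or checks: the result is inconclusive, not a finding.
    assert not report.get("issues")
    assert not report.get("checks")
    metadata = report["file_metadata"]  # assumed: {path: {...}}
    assert metadata, "expected at least one scanned file"
    for entry in metadata.values():
        assert entry["scan_outcome"] == "inconclusive"
        assert "pickle_analysis_incomplete" in entry["scan_outcome_reasons"]
    return report


# Illustrative payload only; real output carries more fields.
sample = json.dumps(
    {
        "success": False,
        "has_errors": False,
        "issues": [],
        "checks": [],
        "file_metadata": {
            "model.bin": {
                "scan_outcome": "inconclusive",
                "scan_outcome_reasons": ["pickle_analysis_incomplete"],
            }
        },
    }
)
assert_inconclusive_contract(sample)
```

Asserting on parsed fields keeps the JSON contract free of human-readable prose, while the text-mode test remains the place to check the word Inconclusive.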

Validation after the additive fix:

  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run --no-sync pytest -n 4 -m "not slow and not integration" --maxfail=1 tests/test_cli.py::test_scan_inconclusive_without_issues_reports_inconclusive -> 1 passed
  • uv run --no-sync ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> clean
  • uv run --no-sync ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> clean
  • uv run --no-sync mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> success on 474 source files
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run --no-sync pytest -n 4 -m "not slow and not integration" --maxfail=1 -> 17399 passed, 1298 skipped
  • git diff --check -> clean

No implementation behavior was changed in this commit; it only aligns the test with JSON-vs-text output contracts.

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. You're on a roll.


@mldangelo-oai

Contributor Author

Exact-head closeout for 03fc36b19c74ec60ecd73a8eaadaaaf867d2837d:

  • Fresh Codex review completed with no major issues.
  • All review threads remain resolved.
  • Exact-head CI is green, including Python 3.10, Python 3.12 quick feedback, Python 3.13, Windows, Ruff/format, mypy/type check, package build, Docker, CodeQL, docs, and benchmarks.
  • mldangelo remains requested for human review; branch protection still reports REVIEW_REQUIRED / BLOCKED.

I did not enable squash auto-merge because the human-review gate is not clean yet.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9bdbd2a30e


Comment thread modelaudit/utils/helpers/cache_decorator.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2665946b7d


Comment thread tests/integrations/test_sarif_formatter.py Outdated
# Conflicts:
#	modelaudit/scanners/pytorch_zip_scanner.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bf15f1d86c


Comment thread modelaudit/scanners/pickle_scanner.py
@mldangelo-oai

Contributor Author

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Can't wait for the next one!

Reviewed commit: 5bcc43e1fc


@mldangelo-oai mldangelo-oai merged commit 93c0e5b into main Jun 12, 2026
29 checks passed
@mldangelo-oai mldangelo-oai deleted the mdangelo/codex/hf-fp-t24-large-pytorch-streaming-20260610 branch June 12, 2026 02:36