Skip to content

fix: calibrate legacy PyTorch storage persistent IDs#1652

Merged
mldangelo-oai merged 1 commit into
mainfrom
mdangelo/codex/hf-fp-t20-legacy-pytorch-persistent-ids-20260610
Jun 11, 2026
Merged

fix: calibrate legacy PyTorch storage persistent IDs#1652
mldangelo-oai merged 1 commit into
mainfrom
mdangelo/codex/hf-fp-t20-legacy-pytorch-persistent-ids-20260610

Conversation

@mldangelo-oai

Copy link
Copy Markdown
Contributor

Summary

  • Downgrade canonical PyTorch storage BINPERSID findings to informational only after ModelAudit has validated legacy PyTorch framing, the canonical storage persistent-ID tuple, and the referenced storage payload.
  • Keep arbitrary/custom PERSID and non-storage BINPERSID records fail-closed by marking the legacy storage layout incomplete instead of applying contextual trust.
  • Add focused regression coverage for trusted canonical storage IDs and malformed/malicious controls, plus a changelog entry.

Root cause and security tradeoff

Legacy PyTorch binary files store tensor payload references as pickle persistent IDs. The standalone pickle engine correctly reports PERSISTENT_ID/BINPERSID as S212 because, outside context, persistent loaders can be attacker-controlled. ModelAudit already validates the broader legacy PyTorch container and storage payload layout, but it did not use that validated context to calibrate the canonical storage BINPERSID warning. That left canonical torch.LongStorage references in a fully validated legacy PyTorch binary as actionable false positives.

This PR applies trust only in the Python ModelAudit scanner, after all of these hold:

  • legacy PyTorch control streams and storage records validate
  • the persistent ID is canonical storage BINPERSID
  • the storage key is present in the validated storage-key list
  • the referenced storage payload is present and sized correctly

The standalone modelaudit-picklescan package remains conservative. Ambiguous evidence still fails closed.

Real-model QA

Pinned model:

  • ProsusAI/finbert @ 4556d13015211d73dccd3fdd39d39232506f3e43
  • Verified current hf://ProsusAI/finbert SHA: 4556d13015211d73dccd3fdd39d39232506f3e43

Before the fix on the required baseline, the pinned pytorch_model.bin reported actionable S212 for canonical torch.LongStorage BINPERSID key 94591118788240.

Final local pinned artifact:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan /tmp/modelaudit-t20-pytorch_model.bin --format json --output /tmp/modelaudit-t20-final-local-bin.json --max-size 1GB --timeout 900 --no-whitelist --no-cache

Outcome: exit 0, success=true, issues=0, failed_checks=0, pickle_verdict=clean, pickle_report_status=complete, legacy_pytorch_storage_key_count=202, legacy_pytorch_trusted_storage_persistent_id_count=1. The S212 BINPERSID for key 94591118788240 is present as passed/info with trusted_legacy_pytorch_context=true.

Final pinned HF file:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan https://huggingface.co/ProsusAI/finbert/resolve/4556d13015211d73dccd3fdd39d39232506f3e43/pytorch_model.bin --format json --output /tmp/modelaudit-t20-final-hf-file.json --cache-dir /tmp/modelaudit-t20-final-hf-file-cache --max-size 1GB --timeout 900 --no-whitelist

Outcome: exit 0, success=true, issues=0, failed_checks=0, pickle_verdict=clean, pickle_report_status=complete, legacy_pytorch_storage_key_count=202, legacy_pytorch_trusted_storage_persistent_id_count=1.

Final HF streaming scan:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan hf://ProsusAI/finbert --stream --format json --output /tmp/modelaudit-t20-final-hf-stream.json --cache-dir /tmp/modelaudit-t20-final-hf-stream-cache --max-size 2GB --timeout 900 --no-whitelist

Outcome: exit 0, success=true, has_errors=false, files_scanned=7, bytes_scanned=876193055, failed_checks=0; the PyTorch file has pickle_verdict=clean, pickle_report_status=complete, legacy_pytorch_storage_key_count=202, legacy_pytorch_trusted_storage_persistent_id_count=1. The remaining 7 issues are informational README S309 URL/domain findings.

Tests

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pickle_scanner.py -k "legacy_pytorch" --maxfail=1 -q
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pickle_scanner.py -q
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests/test_api.py -k "persistent_id or pytorch_storage" -q
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_picklescan_adapter.py tests/scanners/test_pytorch_zip_scanner.py -k "persistent_id or PERSISTENT_ID or storage_persistent" -q
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_core.py -k "persistent or pytorch or pickle" -q
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_streaming_scan.py -q
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto --maxschedchunk=1 -m "not slow and not integration" --maxfail=1
git diff --check

Broad suite outcome: 18549 passed, 793 skipped, 40 warnings in 635.72s.

Malicious and malformed controls

Regression tests cover:

  • .bin extension alone does not grant trust
  • custom PERSID
  • extra non-storage BINPERSID
  • noncanonical storage tuple shapes and types
  • missing storage-key records
  • unlisted storage keys
  • truncated storage payloads
  • malformed legacy storage headers
  • truncated control streams

All ambiguous cases retain warning or inconclusive behavior and do not set trusted_legacy_pytorch_context.

@mldangelo-oai

Copy link
Copy Markdown
Contributor Author

@codex review

@github-actions

Copy link
Copy Markdown
Contributor

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.422s -> 1.413s (-0.6%).

Workload Benchmark Target Size Files Baseline Current Change Status
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] nested_base64 98 B 1 586.5us 598.2us +2.0% stable
padded-multi-stream-upload tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload multi_stream_padded 4.1 KiB 1 653.4us 640.7us -1.9% stable
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] nested_hex 130 B 1 622.7us 611.3us -1.8% stable
direct-malicious-upload tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload malicious_reduce 52 B 1 541.0us 533.1us -1.5% stable
nested-payload-review tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] nested_raw 78 B 1 585.6us 578.2us -1.3% stable
mixed-model-repository tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository release-candidate 547.3 KiB 32 484.72ms 479.40ms -1.1% stable
single-checkpoint-preflight tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load single_checkpoint.pkl 183.0 KiB 1 71.59ms 70.82ms -1.1% stable
chunked-upload-stream tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream chunked_stream 278.2 KiB 1 112.92ms 112.37ms -0.5% stable
duplicate-heavy-registry tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot registry-snapshot 915.2 KiB 13 390.26ms 388.70ms -0.4% stable
warm-cache-rescan tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan release-candidate 547.3 KiB 32 107.09ms 106.67ms -0.4% stable
clean-training-checkpoint tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint safe_large 278.2 KiB 1 109.05ms 109.46ms +0.4% stable
suspicious-pickle-intake tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake suspicious-intake 183.8 KiB 4 142.89ms 142.66ms -0.2% stable

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mldangelo-oai mldangelo-oai requested a review from mldangelo June 11, 2026 02:00
@mldangelo-oai mldangelo-oai enabled auto-merge (squash) June 11, 2026 02:00

@ianw-oai ianw-oai left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved as a focused legacy PyTorch storage persistent-ID calibration with fail-closed regressions and green checks.

@mldangelo-oai mldangelo-oai merged commit 4a28b7f into main Jun 11, 2026
29 checks passed
@mldangelo-oai mldangelo-oai deleted the mdangelo/codex/hf-fp-t20-legacy-pytorch-persistent-ids-20260610 branch June 11, 2026 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants