fix: calibrate legacy PyTorch storage persistent IDs #1652
Merged
mldangelo-oai merged 1 commit on Jun 11, 2026
@codex review
Codex Review: Didn't find any major issues. 🎉
ianw-oai approved these changes on Jun 11, 2026
ianw-oai left a comment:
Approved as a focused legacy PyTorch storage persistent-ID calibration with fail-closed regressions and green checks.
Summary
- Calibrate `BINPERSID` findings to informational only after ModelAudit has validated the legacy PyTorch framing, the canonical storage persistent-ID tuple, and the referenced storage payload.
- `PERSID` and non-storage `BINPERSID` records fail closed: they mark the legacy storage layout incomplete instead of receiving contextual trust.

Root cause and security tradeoff
Legacy PyTorch binary files store tensor payload references as pickle persistent IDs. The standalone pickle engine correctly reports `PERSISTENT_ID`/`BINPERSID` as S212 because, without context, a persistent loader can be attacker-controlled. ModelAudit already validates the broader legacy PyTorch container and storage payload layout, but it did not use that validated context to calibrate the canonical storage `BINPERSID` warning. As a result, canonical `torch.LongStorage` references in a fully validated legacy PyTorch binary surfaced as actionable false positives.

This PR applies trust only in the Python ModelAudit scanner, and only after it has validated the legacy PyTorch framing, the canonical storage persistent-ID tuple, and the referenced storage payload.
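To make the root cause concrete, here is a minimal standard-library sketch of how a pickler that emits persistent IDs produces the `BINPERSID` opcode at protocol 1 and above (and `PERSID` at protocol 0), and how a scanner can spot those opcodes with `pickletools`. The `StorageRef` class and the tuple returned by `persistent_id` are hypothetical stand-ins that only loosely mirror a legacy storage reference; this is not ModelAudit's or PyTorch's code.

```python
import io
import pickle
import pickletools


class StorageRef:
    """Hypothetical stand-in for a tensor storage object."""

    def __init__(self, key: str) -> None:
        self.key = key


class PersistentIdPickler(pickle.Pickler):
    """Pickler that writes StorageRef objects as persistent IDs."""

    def persistent_id(self, obj):
        if isinstance(obj, StorageRef):
            # Illustrative tuple loosely modeled on a storage reference;
            # not PyTorch's exact layout.
            return ("storage", "LongStorage", obj.key, "cpu", 4)
        return None  # pickle everything else normally


def dump_with_persistent_ids(obj, protocol: int) -> bytes:
    buf = io.BytesIO()
    PersistentIdPickler(buf, protocol=protocol).dump(obj)
    return buf.getvalue()


def persistent_id_opcodes(data: bytes) -> list:
    """List the PERSID/BINPERSID opcodes that appear in a pickle stream."""
    return [
        op.name
        for op, _arg, _pos in pickletools.genops(data)
        if op.name in ("PERSID", "BINPERSID")
    ]


weights = {"weight": StorageRef("94591118788240")}
print(persistent_id_opcodes(dump_with_persistent_ids(weights, protocol=2)))  # ['BINPERSID']
print(persistent_id_opcodes(dump_with_persistent_ids(weights, protocol=0)))  # ['PERSID']
```

At this level a scanner sees only the opcode and the tuple, not whether the key points at a validated storage payload, which is why the standalone engine keeps reporting S212 and the calibration has to live in the container-aware scanner.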
The standalone `modelaudit-picklescan` package remains conservative: ambiguous evidence still fails closed.

Real-model QA
Pinned model: `ProsusAI/finbert` @ `4556d13015211d73dccd3fdd39d39232506f3e43`

- `hf://ProsusAI/finbert`
- SHA: `4556d13015211d73dccd3fdd39d39232506f3e43`

Before the fix, on the required baseline, the pinned `pytorch_model.bin` reported an actionable S212 for the canonical `torch.LongStorage` `BINPERSID` key `94591118788240`.

Final local pinned artifact:
Outcome: exit 0, `success=true`, `issues=0`, `failed_checks=0`, `pickle_verdict=clean`, `pickle_report_status=complete`, `legacy_pytorch_storage_key_count=202`, `legacy_pytorch_trusted_storage_persistent_id_count=1`. The S212 `BINPERSID` finding for key `94591118788240` is present as `passed`/`info` with `trusted_legacy_pytorch_context=true`.

Final pinned HF file:
Outcome: exit 0, `success=true`, `issues=0`, `failed_checks=0`, `pickle_verdict=clean`, `pickle_report_status=complete`, `legacy_pytorch_storage_key_count=202`, `legacy_pytorch_trusted_storage_persistent_id_count=1`.

Final HF streaming scan:
Outcome: exit 0, `success=true`, `has_errors=false`, `files_scanned=7`, `bytes_scanned=876193055`, `failed_checks=0`; the PyTorch file has `pickle_verdict=clean`, `pickle_report_status=complete`, `legacy_pytorch_storage_key_count=202`, `legacy_pytorch_trusted_storage_persistent_id_count=1`. The remaining 7 issues are informational README S309 URL/domain findings.

Tests
Broad suite outcome: 18549 passed, 793 skipped, 40 warnings in 635.72s.

Malicious and malformed controls
Regression tests cover:

- `.bin` extension alone does not grant trust
- `PERSID`
- `BINPERSID`

All ambiguous cases retain warning or inconclusive behavior and do not set `trusted_legacy_pytorch_context`.
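The fail-closed rule can be sketched as a single gate. In this hypothetical Python version, `calibrate_persistent_id_finding`, `LegacyContext`, `PersistentIdFinding`, the severity strings, and the assumed tuple shape (tag `"storage"` first, storage key third) are illustrative assumptions, not ModelAudit's API. The point it shows is that trust requires every check to pass, and the file name or extension is never an input.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class PersistentIdFinding:
    """Hypothetical S212-style finding for one persistent-ID opcode."""
    opcode: str   # "PERSID" or "BINPERSID"
    pid: object   # decoded persistent-ID payload


@dataclass(frozen=True)
class LegacyContext:
    """Hypothetical result of validating a legacy PyTorch container."""
    framing_valid: bool            # legacy PyTorch framing checked out
    storage_keys: FrozenSet[str]   # keys whose storage payloads were validated


def calibrate_persistent_id_finding(
    finding: PersistentIdFinding, context: Optional[LegacyContext]
) -> Tuple[str, bool]:
    """Return (severity, trusted_legacy_pytorch_context).

    Any failed or missing check keeps the warning. The file path is not an
    input, so a ".bin" name alone cannot grant trust.
    """
    untrusted = ("warning", False)
    # No validated legacy container: no contextual trust.
    if context is None or not context.framing_valid:
        return untrusted
    # Text-mode PERSID never qualifies.
    if finding.opcode != "BINPERSID":
        return untrusted
    pid = finding.pid
    # Assumed canonical shape: ("storage", storage_type, key, ...).
    if not (isinstance(pid, tuple) and len(pid) >= 3 and pid[0] == "storage"):
        return untrusted
    key = pid[2]
    # The key must be a string naming a storage payload that was validated.
    if not isinstance(key, str) or key not in context.storage_keys:
        return untrusted
    return ("info", True)
```

With a validated context containing key `94591118788240`, a canonical `BINPERSID` tuple yields `("info", True)`, while the same tuple under `PERSID`, a non-storage tuple, an unvalidated key, or a missing context stays `("warning", False)`.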