
fix(pytorch): distinguish producer metadata from runtime CVEs #1643

Open
mldangelo-oai wants to merge 9 commits into main from mdangelo/codex/hf-fp-t23-pytorch-cve-applicability-20260610
Conversation

@mldangelo-oai

Contributor

Summary

  • treat checkpoint PyTorch producer/archive versions as provenance, not active runtime CVE evidence
  • keep runtime/package CVE reporting when the local scanner/loading runtime version is known
  • preserve artifact-structure findings, including malicious pickle and tensor metadata mismatch detections

Root Cause

The PyTorch ZIP scanner selected embedded checkpoint metadata, such as pickled producer PyTorch version strings, as the version input for runtime CVE gates when local PyTorch was absent. A benign pinned Hugging Face checkpoint produced by PyTorch 2.0.1 therefore emitted CRITICAL runtime CVEs even though the scanner had not established the consumer runtime version.

Security Tradeoff

Producer metadata remains recorded in scan metadata and in a new INFO provenance check. Runtime CVE checks now require local environment evidence. Artifact-level exploit evidence still fails independently, so malicious pickle payloads and structural CVE evidence are not suppressed.
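The tradeoff described above can be sketched roughly as follows. This is a minimal illustration, not the scanner's actual code: the `Check` dataclass, the `runtime_cve_checks` function, the check name `pytorch_producer_provenance`, and the one-entry `FIXED_IN` table are assumptions made for the example. Only the metadata keys (`pytorch_framework_version`, `pytorch_version_source`, `runtime_cve_version_gate`) come from the scan output quoted in this description.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative table only: CVE id -> first fixed torch release (assumed for this sketch).
FIXED_IN = {"CVE-2025-32434": (2, 6, 0)}


def parse_version(text: str) -> tuple[int, ...]:
    """Parse '2.0.1+cu118' into (2, 0, 1); local build suffixes are ignored."""
    core = text.split("+", 1)[0]
    return tuple(int(part) for part in core.split(".")[:3] if part.isdigit())


@dataclass(frozen=True)
class Check:
    name: str
    severity: str
    details: dict


def runtime_cve_checks(
    producer_version: str | None, local_runtime_version: str | None
) -> list[Check]:
    """Gate runtime CVEs on the local runtime only; producer data is provenance."""
    checks: list[Check] = []
    if producer_version is not None:
        # Producer metadata is recorded, but never used as the gate input.
        checks.append(
            Check(
                "pytorch_producer_provenance",  # hypothetical check name
                "INFO",
                {
                    "pytorch_framework_version": producer_version,
                    "runtime_cve_version_gate": "local_environment_only",
                },
            )
        )
    if local_runtime_version is None:
        return checks  # no consumer runtime evidence, so no runtime CVE findings
    runtime = parse_version(local_runtime_version)
    for cve, fixed in FIXED_IN.items():
        if runtime < fixed:
            checks.append(
                Check(
                    cve,
                    "CRITICAL",
                    {
                        "pytorch_version_source": "local_environment",
                        "version": local_runtime_version,
                    },
                )
            )
    return checks
```

Artifact-level findings (malicious pickle, tensor metadata mismatch) would be produced by separate checks and are deliberately not touched by this gate.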

Pinned Hugging Face QA

Model: pyannote/wespeaker-voxceleb-resnet34-LM @ 837717ddb9ff5507820346191109dc79c958d614

Baseline on origin/main 2f782ba:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan --format json --output /tmp/modelaudit-t23-baseline-direct.json --max-size 50MB --no-cache --scanners pytorch_zip https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM/resolve/837717ddb9ff5507820346191109dc79c958d614/pytorch_model.bin

Outcome: exit 1 with CRITICAL CVE-2025-32434, CVE-2026-24747, CVE-2024-5480, and CVE-2024-48063 from producer metadata source pickle:pytorch_model.310/data.pkl.

After this fix:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan --format json --output /tmp/modelaudit-t23-final-direct.json --max-size 50MB --no-cache --scanners pytorch_zip https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM/resolve/837717ddb9ff5507820346191109dc79c958d614/pytorch_model.bin

Outcome: exit 0 / Clean; critical CVE list []; producer metadata still reports pytorch_framework_version=2.0.1 and pytorch_version_source=pickle:pytorch_model.310/data.pkl; provenance check reports runtime_cve_version_gate=local_environment_only.

Validation

  • uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> 419 files left unchanged
  • uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> All checks passed
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> Success: no issues found in 474 source files
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 17369 passed, 1292 skipped
  • PROMPTFOO_DISABLE_TELEMETRY=1 MODELAUDIT_RUN_HF_E2E=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py::test_pinned_huggingface_pyannote_checkpoint_does_not_emit_metadata_only_runtime_cves -q -> 1 passed
  • git diff --check -> clean

Controls

  • Vulnerable local runtime synthetic control still emits the four task runtime CVEs with pytorch_version_source=local_environment.
  • Fixed local runtime with old producer metadata stays clean for the task CVEs.
  • Producer JSON and pickled producer metadata controls no longer emit runtime CVEs when local runtime is unknown.
  • Malicious pickle with old producer metadata still emits a CRITICAL malicious-pickle finding.
  • Tensor metadata mismatch with old producer metadata still emits CVE-2026-24747 structural detection.

@mldangelo-oai

Contributor Author

@codex review

@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.443s -> 1.450s (+0.5%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] | nested_raw | 78 B | 1 | 531.0us | 509.2us | -4.1% | stable |
| warm-cache-rescan | tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan | release-candidate | 547.4 KiB | 32 | 98.43ms | 102.44ms | +4.1% | stable |
| direct-malicious-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload | malicious_reduce | 52 B | 1 | 466.2us | 449.2us | -3.7% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] | nested_hex | 130 B | 1 | 541.6us | 559.8us | +3.4% | stable |
| single-checkpoint-preflight | tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load | single_checkpoint.pkl | 183.0 KiB | 1 | 76.69ms | 75.47ms | -1.6% | stable |
| padded-multi-stream-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload | multi_stream_padded | 4.1 KiB | 1 | 574.4us | 565.6us | -1.5% | stable |
| mixed-model-repository | tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository | release-candidate | 547.4 KiB | 32 | 479.73ms | 486.62ms | +1.4% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] | nested_base64 | 98 B | 1 | 517.3us | 511.6us | -1.1% | stable |
| clean-training-checkpoint | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint | safe_large | 278.2 KiB | 1 | 114.40ms | 113.15ms | -1.1% | stable |
| chunked-upload-stream | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream | chunked_stream | 278.2 KiB | 1 | 116.86ms | 116.11ms | -0.6% | stable |
| suspicious-pickle-intake | tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake | suspicious-intake | 183.8 KiB | 4 | 143.10ms | 143.95ms | +0.6% | stable |
| duplicate-heavy-registry | tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot | registry-snapshot | 915.2 KiB | 13 | 410.74ms | 409.66ms | -0.3% | stable |

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a34c3cd99

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/scanners/pytorch_zip_scanner.py
@mldangelo-oai

Contributor Author

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Breezy!


@mldangelo-oai mldangelo-oai requested a review from mldangelo June 11, 2026 01:56
@mldangelo-oai mldangelo-oai enabled auto-merge (squash) June 11, 2026 01:56
@mldangelo-oai

Contributor Author

Pinned applicability QA input: both sentence-transformers/all-MiniLM-L12-v2@a50ef00143b4d5391434df20ae11632588ac25be and intfloat/multilingual-e5-base@d128750597153bb5987e10b1c3493a34e5a4502a report safetensors_available=false in PyTorch CVE checks even though model.safetensors is present in the same pinned inventory. Please make repository-level applicability context truthful without suppressing dangerous pickle evidence.

@mldangelo-oai mldangelo-oai disabled auto-merge June 11, 2026 03:39

Contributor Author

Addressed the pinned SafeTensors applicability QA in 4878ad61b7fd782dd7fab5da4dc0907df2be3333.

Summary:

  • Carry trusted Hugging Face repository inventory into PyTorch ZIP scan config for direct, directory, and streaming acquisition paths.
  • Report safetensors_available=true when a same-directory model.safetensors sibling is present in repository metadata, without downloading that SafeTensors weight file.
  • Keep dangerous pickle/archive evidence independent: malicious pickle remains CRITICAL, embedded/deceptive .safetensors names do not count as alternatives, malformed/missing-alternative cases stay explicit.
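A rough sketch of the sibling check summarized above, assuming the repository inventory arrives as a flat list of POSIX-style relative paths. The function names `build_inventory_index` and `safetensors_available` are hypothetical and chosen for this example; the real scanner config plumbing is not shown. The index is built once and then queried per checkpoint, and only an exact same-directory `model.safetensors` entry counts.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def build_inventory_index(paths: list[str]) -> frozenset[str]:
    """Normalize repository-relative paths once; drop absolute or escaping entries."""
    index: set[str] = set()
    for raw in paths:
        candidate = PurePosixPath(raw)
        if candidate.is_absolute() or ".." in candidate.parts:
            continue  # never trust entries that point outside the repository
        index.add(str(candidate))
    return frozenset(index)


def safetensors_available(index: frozenset[str], checkpoint_path: str) -> bool:
    """True only when an exact same-directory model.safetensors is listed."""
    sibling = PurePosixPath(checkpoint_path).parent / "model.safetensors"
    return str(sibling) in index
```

Because the lookup is an exact path match, names that merely contain `.safetensors` (for example inside a ZIP member name or with an extra suffix) are not treated as alternatives, and no weight file has to be downloaded to answer the question.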

Validation:

  • Focused new controls: 9 passed.
  • Exact pinned metadata-only QA for sentence-transformers/all-MiniLM-L12-v2@a50ef00143b4d5391434df20ae11632588ac25be and intfloat/multilingual-e5-base@d128750597153bb5987e10b1c3493a34e5a4502a: 2 passed; confirmed model.safetensors in both pinned inventories without downloading weight blobs.
  • Existing PyTorch runtime/producer CVE controls: 7 passed.
  • Broad affected suites (test_pytorch_zip_scanner.py, Hugging Face source tests, streaming scan tests): 1199 passed, 4 skipped.
  • ruff format, ruff check, mypy, and git diff --check: clean.
  • Full pytest -n auto -m "not slow and not integration" --maxfail=1 was attempted twice; both xdist runs stopped on unrelated cache release-call assertions, and each failing cache node passed when rerun directly.

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4878ad61b7


Comment thread modelaudit/cli.py
Comment thread modelaudit/utils/sources/huggingface.py Outdated
Comment thread modelaudit/utils/sources/huggingface.py Outdated
@mldangelo-oai

Contributor Author

@codex address these exact-head blockers before merge:

  • High: untrusted repository torch-*.dist-info can spoof importlib.metadata.version("torch"). With a real/simulated vulnerable torch==2.5.1, adding torch-2.10.0.dist-info/METADATA under the current untrusted model directory changed the same scan from CVE findings/exit 1 to no CVEs/exit 0. Resolve package metadata only from trusted installation roots or explicit caller input, and record the validated path.
  • Medium: unknown consumer runtime is represented as passed/clean and silently omits version-gated checks. Emit explicit machine-readable unknown/inconclusive applicability.
  • Medium: direct HF file scanning now depends on unbounded, untimed repo_info(); failure blocks an otherwise downloadable pinned file, and mutable refs can race inventory vs download. Make it bounded, deadline-aware, same-SHA, and best effort.
  • Medium: repository inventory is normalized/copied once per embedded pickle, producing O(files x pickle-members) work. Bound and index it once per scan.
  • Low: any same-directory .safetensors filename is claimed as a usable alternative without validating framing, relation, size, or exact path.
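For the High item, one way to keep repository-local `torch-*.dist-info` directories out of the lookup is to search only an explicit list of trusted roots instead of the ambient `sys.path`. The sketch below is an assumption about how that could look, not the fix that landed: `resolve_trusted_torch_version` is a hypothetical name, and the caller is assumed to derive `trusted_roots` from the interpreter's own install locations (for example via `sysconfig.get_paths()`).

```python
from __future__ import annotations

from importlib import metadata
from pathlib import Path


def resolve_trusted_torch_version(trusted_roots: list[Path]) -> tuple[str, Path] | None:
    """Return (version, validated_root) for torch found under trusted roots only.

    The current working directory and the scanned model directory are never
    searched, so a torch-*.dist-info planted there cannot influence the result.
    """
    for root in trusted_roots:
        # Passing path=[...] restricts discovery to this directory.
        for dist in metadata.distributions(name="torch", path=[str(root)]):
            try:
                version = dist.version
            except Exception:
                continue  # unreadable metadata: treat as no evidence
            return version, root.resolve()
    return None
```

Returning the root alongside the version lets the caller record which validated location the runtime evidence came from; a `None` result means the runtime is unknown, not that it is safe.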

Independent report: /Users/mdangelo/modelaudit-pr-swarm/outputs/pr-1643-independent-review-20260611.md.

@mldangelo-oai

Contributor Author

Addressed the fresh exact-head blockers in 0b32f548e0795a1e20ba0eeb89d7837aa189ef9a.

Summary:

  • Resolve torch runtime metadata only from trusted Python install roots and record the validated metadata path; repository-local torch-*.dist-info no longer spoofs runtime CVE gates.
  • Emit explicit skipped, machine-readable unknown runtime applicability for PyTorch version-gated CVEs instead of silently omitting those checks.
  • Bound and index repository inventory once per scan, require related SafeTensors alternative names, and validate local SafeTensors framing before claiming availability.
  • Make direct HF inventory best-effort unless size enforcement needs metadata; consume inventory only when tied to an immutable revision and pin the download to that revision.
  • Pass cache-safe HF streaming roots while preserving repository-member context for nested inventory siblings.
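The best-effort inventory behavior in the summary above can be illustrated with a small, library-agnostic sketch. Everything here is hypothetical: `best_effort_inventory` and the injected `fetch` callable stand in for whatever Hugging Face client call the scanner really uses, and the deadline handling via a worker thread is just one possible approach.

```python
from __future__ import annotations

import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

FULL_SHA = re.compile(r"[0-9a-f]{40}")

# Hypothetical fetcher: (repo_id, revision) -> (resolved_commit_sha, file_paths).
InventoryFetcher = Callable[[str, str], tuple[str, list[str]]]


def best_effort_inventory(
    fetch: InventoryFetcher, repo_id: str, revision: str, deadline_seconds: float
) -> tuple[str, list[str]] | None:
    """Fetch repository inventory under a deadline; never block the file scan.

    Returns (immutable_sha, files) so the caller can pin the download to the
    same commit, or None when inventory is unavailable or not tied to a commit.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(fetch, repo_id, revision)
        resolved_sha, files = future.result(timeout=deadline_seconds)
    except Exception:
        return None  # timeout or lookup failure: scan proceeds without inventory
    finally:
        pool.shutdown(wait=False)  # do not wait for a slow lookup to finish
    if not FULL_SHA.fullmatch(resolved_sha):
        return None  # a mutable ref could race inventory against the download
    return resolved_sha, files
```

A caller would then download the target file at the returned SHA rather than at the original ref, so the inventory and the scanned bytes describe the same commit.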

Validation:

  • Focused regression nodes: 11 passed, then post-fix focused nodes: 5 passed.
  • Broad affected suites: tests/scanners/test_pytorch_zip_scanner.py tests/utils/sources/test_huggingface.py tests/test_streaming_scan.py -> 1206 passed, 4 skipped.
  • Full HF source file plus CLI streaming regressions after final tightening: 274 passed, 1 skipped.
  • Exact pinned HF metadata-only QA with telemetry disabled: 2 passed for sentence-transformers/all-MiniLM-L12-v2@a50ef00143b4d5391434df20ae11632588ac25be and intfloat/multilingual-e5-base@d128750597153bb5987e10b1c3493a34e5a4502a.
  • uv run ruff format ... stable; uv run ruff check ... passed.
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 17391 passed, 1292 skipped.
  • git diff --check and git diff --cached --check passed before commit.

@codex review

Comment thread modelaudit/scanners/pytorch_zip_scanner.py Fixed
@mldangelo-oai

Contributor Author

Exact-head follow-up pushed on 0250eb3626330e8ff702b2dde0fd2d2d5b080470.

Additional repair after CI: CodeQL flagged clear-text logging in the trusted torch metadata lookup fallback. I removed exception/path interpolation from that debug path without changing CVE applicability behavior.

Validation after this last patch:

  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py::test_pytorch_zip_ignores_untrusted_repository_torch_dist_info tests/scanners/test_pytorch_zip_scanner.py::test_pytorch_zip_unknown_runtime_emits_skipped_cve_applicability_checks -q -> 2 passed
  • uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> 420 files left unchanged
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> all checks passed
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> success, 475 files
  • git diff --check -> clean

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Delightful!


Mark runtime-version-gated PyTorch CVE applicability inconclusive when producer metadata is present but no trusted local runtime version is available. Preserve producer-only false-positive suppression by keeping the CVE checks skipped instead of critical.
@mldangelo-oai

Contributor Author

@codex review

Exact head: be21a86

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Already looking forward to the next diff.


@mldangelo-oai

Contributor Author

Exact-head CI blocker for be21a8645631...: Performance Benchmarks run 27338512745, job 80768912470 has two deterministic functional failures, not a timing regression.

  • test_scan_release_candidate_repository
  • test_scan_warm_cached_repository_rescan

Both expect result.success is True, but the scan returns false with File extension indicates pickle but header indicate... in the result. Reproduce the benchmark fixture locally and determine why the current producer/runtime-CVE applicability path changes its aggregate outcome. Fix the branch or fixture contract at the root cause; do not simply relax success assertions or suppress a real format mismatch. Rerun both exact benchmark nodes before the full benchmark workflow.

Avoid embedding a PyTorch-looking producer version string in the clean benchmark ZIP payload so the performance lane continues to benchmark a successful release-candidate repository under unknown local runtime conditions.
@mldangelo-oai

Contributor Author

@codex review

Exact head: 1e88eeb

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Chef's kiss.


@mldangelo-oai

Contributor Author

@codex review

Exact head: 1367166

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 136716630c


Comment thread modelaudit/cli.py Outdated
@mldangelo-oai

Contributor Author

@codex review

@mldangelo-oai

Contributor Author

@codex review

Exact head: 3decfed

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Bravo.


@mldangelo-oai

Contributor Author

Independent review of exact head 3decfed4ab64dfb8e15796ddac9a4e465099f52f found four issues:

  1. P1: generic pickle semvers control whether unknown runtime is clean or operationally inconclusive. Any 1.x.y/2.x.y string is treated as producer metadata. With no local Torch, unrelated {"version":"2.6.0"} changes a safe real torch.save() artifact from exit 0 to exit 2; replacing it with benchmark-release is clean, and base is clean. The benchmark fixture masks this by removing semver rather than fixing attribution.
  2. P1: runtime provenance can select a different Torch installation than Python would load. Scanner root ordering chose system Torch 2.10.0 while Python import precedence selected vulnerable user-site 2.5.1, suppressing runtime CVEs. A fake preloaded Torch 2.10.0 from /untrusted-repo/torch.py likewise suppressed CVE-2025-32434 and CVE-2026-24747 from trusted 2.5.1 metadata. Resolve import provenance without importing and bind distribution metadata to the resolved origin.
  3. P2: repository inventory is re-indexed for every PyTorch file. Three scanner constructions over 100,000 entries took 3.687 seconds and built three distinct indexes; normalize once per repository scan and pass the inventory object onward.
  4. P3: inventory filenames bypass SafeTensors validity checks. A zero-byte model.safetensors listed in inventory is advertised as an available alternative; without inventory it is rejected. A local nine-byte length-plus-{ file also passes the weak framing check. This misleads applicability/remediation output but does not suppress pickle evidence.
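For the P3 item, a stricter framing check than "length prefix followed by `{`" might look like the sketch below. It relies on the SafeTensors layout (an 8-byte little-endian header length, a JSON header, then tensor data) and is an illustration only: `has_plausible_safetensors_framing` and `MAX_HEADER_BYTES` are names and limits assumed for this example, not the scanner's.

```python
from __future__ import annotations

import json
import struct
from pathlib import Path

MAX_HEADER_BYTES = 100 * 1024 * 1024  # assumed sanity cap for this sketch


def has_plausible_safetensors_framing(path: Path) -> bool:
    """Check length prefix, JSON header, and tensor offsets against file size."""
    size = path.stat().st_size
    if size < 8 + 2:  # rejects zero-byte files and a bare length-plus-"{" stub
        return False
    with path.open("rb") as handle:
        (header_len,) = struct.unpack("<Q", handle.read(8))
        if header_len < 2 or header_len > min(size - 8, MAX_HEADER_BYTES):
            return False
        raw_header = handle.read(header_len)
    try:
        header = json.loads(raw_header)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return False
    if not isinstance(header, dict):
        return False
    data_len = size - 8 - header_len
    tensors = {k: v for k, v in header.items() if k != "__metadata__"}
    if not tensors:
        return False  # sketch choice: a file with no tensors is not a usable alternative
    for entry in tensors.values():
        if not isinstance(entry, dict):
            return False
        offsets = entry.get("data_offsets")
        if not (isinstance(offsets, list) and len(offsets) == 2):
            return False
        if not all(isinstance(value, int) for value in offsets):
            return False
        start, end = offsets
        if not 0 <= start <= end <= data_len:
            return False
    return True
```

For inventory-only entries, where the file is not downloaded, the equivalent guard would have to rely on listed metadata such as a non-zero size, which is weaker than this local check.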

Validation: 40 focused PyTorch ZIP controls, 272 HF source tests, and 7 CLI streaming controls passed. Release-candidate cold/warm scans pass only after removing generic semver. Real Torch 2.11 happy/malicious paths behaved as expected. All six threads are resolved; static/benchmark/Docker checks are green, with four Python/Windows lanes still running. The P1/P2 issues block merge.
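The runtime-provenance P1 asks for import resolution without importing, with distribution metadata bound to the resolved origin. A minimal sketch of that idea, under stated assumptions: `resolve_bound_version` is a hypothetical helper, the caller supplies `search_path` in the same order Python would use, and `PathFinder.find_spec` is used because it walks the given path entries without consulting `sys.modules` and without executing the module.

```python
from __future__ import annotations

from importlib import metadata
from importlib.machinery import PathFinder
from pathlib import Path


def resolve_bound_version(
    module_name: str, dist_name: str, search_path: list[str]
) -> tuple[str, Path] | None:
    """Find where `module_name` would load from, then read metadata from that root.

    Returns (version, origin) only when distribution metadata sits next to the
    module that import precedence selects; otherwise None (runtime unknown).
    """
    spec = PathFinder.find_spec(module_name, search_path)
    if spec is None or not spec.origin:
        return None
    origin = Path(spec.origin)
    # Package: <root>/<name>/__init__.py; single-file module: <root>/<name>.py.
    root = origin.parent.parent if origin.name == "__init__.py" else origin.parent
    for dist in metadata.distributions(name=dist_name, path=[str(root)]):
        return dist.version, origin
    return None
```

With this shape, a user-site install that shadows a newer system install is the one whose version is reported, and a stray `torch.py` in a repository directory resolves to an origin with no bound metadata, so it yields an unknown runtime instead of a spoofed version.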

@mldangelo-oai mldangelo-oai enabled auto-merge (squash) June 11, 2026 12:16
@mldangelo-oai mldangelo-oai disabled auto-merge June 11, 2026 12:21
2 participants