fix(pytorch): distinguish producer metadata from runtime CVEs by mldangelo-oai · Pull Request #1643 · promptfoo/modelaudit

mldangelo-oai · 2026-06-11T01:14:27Z

Summary

treat checkpoint PyTorch producer/archive versions as provenance, not active runtime CVE evidence
keep runtime/package CVE reporting when the local scanner/loading runtime version is known
preserve artifact-structure findings, including malicious pickle and tensor metadata mismatch detections

Root Cause

The PyTorch ZIP scanner selected embedded checkpoint metadata, such as pickled producer PyTorch version strings, as the version input for runtime CVE gates when local PyTorch was absent. A benign pinned Hugging Face checkpoint produced by PyTorch 2.0.1 therefore emitted CRITICAL runtime CVEs even though the scanner had not established the consumer runtime version.

Security Tradeoff

Producer metadata remains recorded in scan metadata and in a new INFO provenance check. Runtime CVE checks now require local environment evidence. Artifact-level exploit evidence still fails independently, so malicious pickle payloads and structural CVE evidence are not suppressed.
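The gating rule above can be sketched roughly as follows. This is a minimal illustration, not modelaudit's actual API: `RuntimeEvidence`, `runtime_cve_status`, and the status labels are hypothetical names. Only the local runtime version decides applicability, producer metadata is carried along as provenance, and a missing local version produces an explicit unknown status rather than clean or critical. The version parser ignores pre-release tags, which is a simplification.

```python
import re
from dataclasses import dataclass
from typing import Optional

_VERSION_RE = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?")


@dataclass(frozen=True)
class RuntimeEvidence:
    """Hypothetical container for the version inputs a scan may hold."""

    local_runtime_version: Optional[str]  # from a trusted local torch install, if any
    producer_version: Optional[str]  # e.g. parsed from the checkpoint's data.pkl


def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a leading major.minor[.patch]; suffixes like '+cu121' or 'a0' are ignored."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)


def runtime_cve_status(evidence: RuntimeEvidence, fixed_in: str) -> dict[str, str]:
    """Decide applicability of one runtime-version-gated CVE.

    Producer metadata is recorded as provenance only and never drives the gate.
    """
    result = {"runtime_cve_version_gate": "local_environment_only"}
    if evidence.producer_version is not None:
        result["producer_version"] = evidence.producer_version
    if evidence.local_runtime_version is None:
        # No trusted consumer runtime: inconclusive, neither clean nor critical.
        result["status"] = "unknown"
        return result
    vulnerable = parse_version(evidence.local_runtime_version) < parse_version(fixed_in)
    result["status"] = "applicable" if vulnerable else "not_applicable"
    return result
```

For example, with a fix version of 2.6.0 chosen purely for illustration, `runtime_cve_status(RuntimeEvidence(None, "2.0.1"), "2.6.0")` reports `unknown` and keeps `producer_version="2.0.1"`. A local runtime of 2.5.1 reports `applicable`.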

Pinned Hugging Face QA

Model: pyannote/wespeaker-voxceleb-resnet34-LM @ 837717ddb9ff5507820346191109dc79c958d614

Baseline on origin/main 2f782ba:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan --format json --output /tmp/modelaudit-t23-baseline-direct.json --max-size 50MB --no-cache --scanners pytorch_zip https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM/resolve/837717ddb9ff5507820346191109dc79c958d614/pytorch_model.bin

Outcome: exit 1 with CRITICAL CVE-2025-32434, CVE-2026-24747, CVE-2024-5480, and CVE-2024-48063 from producer metadata source pickle:pytorch_model.310/data.pkl.

After this fix:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan --format json --output /tmp/modelaudit-t23-final-direct.json --max-size 50MB --no-cache --scanners pytorch_zip https://huggingface.co/pyannote/wespeaker-voxceleb-resnet34-LM/resolve/837717ddb9ff5507820346191109dc79c958d614/pytorch_model.bin

Outcome: exit 0 / Clean; critical CVE list []; producer metadata still reports pytorch_framework_version=2.0.1 and pytorch_version_source=pickle:pytorch_model.310/data.pkl; provenance check reports runtime_cve_version_gate=local_environment_only.

Validation

uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> 419 files left unchanged
uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> All checks passed
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> Success: no issues found in 474 source files
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 17369 passed, 1292 skipped
PROMPTFOO_DISABLE_TELEMETRY=1 MODELAUDIT_RUN_HF_E2E=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py::test_pinned_huggingface_pyannote_checkpoint_does_not_emit_metadata_only_runtime_cves -q -> 1 passed
git diff --check -> clean

Controls

Vulnerable local runtime synthetic control still emits the four task runtime CVEs with pytorch_version_source=local_environment.
Fixed local runtime with old producer metadata stays clean for the task CVEs.
Producer JSON and pickled producer metadata controls no longer emit runtime CVEs when local runtime is unknown.
Malicious pickle with old producer metadata still emits a CRITICAL malicious-pickle finding.
Tensor metadata mismatch with old producer metadata still emits CVE-2026-24747 structural detection.

mldangelo-oai · 2026-06-11T01:15:01Z

@codex review

github-actions · 2026-06-11T01:16:38Z

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.443s -> 1.450s (+0.5%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
|---|---|---|---|---|---|---|---|---|
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]` | `nested_raw` | 78 B | 1 | 531.0us | 509.2us | -4.1% | stable |
| `warm-cache-rescan` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan` | `release-candidate` | 547.4 KiB | 32 | 98.43ms | 102.44ms | +4.1% | stable |
| `direct-malicious-upload` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload` | `malicious_reduce` | 52 B | 1 | 466.2us | 449.2us | -3.7% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]` | `nested_hex` | 130 B | 1 | 541.6us | 559.8us | +3.4% | stable |
| `single-checkpoint-preflight` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load` | `single_checkpoint.pkl` | 183.0 KiB | 1 | 76.69ms | 75.47ms | -1.6% | stable |
| `padded-multi-stream-upload` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload` | `multi_stream_padded` | 4.1 KiB | 1 | 574.4us | 565.6us | -1.5% | stable |
| `mixed-model-repository` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository` | `release-candidate` | 547.4 KiB | 32 | 479.73ms | 486.62ms | +1.4% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]` | `nested_base64` | 98 B | 1 | 517.3us | 511.6us | -1.1% | stable |
| `clean-training-checkpoint` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint` | `safe_large` | 278.2 KiB | 1 | 114.40ms | 113.15ms | -1.1% | stable |
| `chunked-upload-stream` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream` | `chunked_stream` | 278.2 KiB | 1 | 116.86ms | 116.11ms | -0.6% | stable |
| `suspicious-pickle-intake` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake` | `suspicious-intake` | 183.8 KiB | 4 | 143.10ms | 143.95ms | +0.6% | stable |
| `duplicate-heavy-registry` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot` | `registry-snapshot` | 915.2 KiB | 13 | 410.74ms | 409.66ms | -0.3% | stable |

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4a34c3cd99

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T01:50:24Z

@codex review

chatgpt-codex-connector · 2026-06-11T01:53:06Z

Codex Review: Didn't find any major issues. Breezy!


mldangelo-oai · 2026-06-11T03:37:44Z

Pinned applicability QA input: both sentence-transformers/all-MiniLM-L12-v2@a50ef00143b4d5391434df20ae11632588ac25be and intfloat/multilingual-e5-base@d128750597153bb5987e10b1c3493a34e5a4502a report safetensors_available=false in PyTorch CVE checks even though model.safetensors is present in the same pinned inventory. Please make repository-level applicability context truthful without suppressing dangerous pickle evidence.

mldangelo-oai · 2026-06-11T04:36:34Z

Addressed the pinned SafeTensors applicability QA in 4878ad61b7fd782dd7fab5da4dc0907df2be3333.

Summary:

Carry trusted Hugging Face repository inventory into PyTorch ZIP scan config for direct, directory, and streaming acquisition paths.
Report safetensors_available=true when a same-directory model.safetensors sibling is present in repository metadata, without downloading that SafeTensors weight file.
Keep dangerous pickle/archive evidence independent: malicious pickle remains CRITICAL, embedded/deceptive .safetensors names do not count as alternatives, malformed/missing-alternative cases stay explicit.
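A sketch of the same-directory sibling rule, assuming the repository inventory is a flat list of repository-relative POSIX paths. The function name is hypothetical. Comparing normalized `PurePosixPath` values means a `model.safetensors` in another directory, or a name that only appears inside the pickle archive (which is never part of the inventory), does not count as an alternative.

```python
from pathlib import PurePosixPath
from typing import Iterable


def safetensors_sibling_available(pickle_path: str, inventory: Iterable[str]) -> bool:
    """Return True if the inventory lists model.safetensors next to pickle_path.

    pickle_path and inventory entries are repository-relative POSIX paths such
    as "pytorch_model.bin" or "encoder/pytorch_model.bin". Names found inside
    an archive are never passed here, so an embedded "model.safetensors"
    member cannot count as an alternative.
    """
    expected = PurePosixPath(pickle_path).parent / "model.safetensors"
    # PurePosixPath collapses "./" prefixes, so "./model.safetensors" still matches.
    return any(PurePosixPath(entry) == expected for entry in inventory)
```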

Validation:

Focused new controls: 9 passed.
Exact pinned metadata-only QA for sentence-transformers/all-MiniLM-L12-v2@a50ef00143b4d5391434df20ae11632588ac25be and intfloat/multilingual-e5-base@d128750597153bb5987e10b1c3493a34e5a4502a: 2 passed; confirmed model.safetensors in both pinned inventories without downloading weight blobs.
Existing PyTorch runtime/producer CVE controls: 7 passed.
Broad affected suites (test_pytorch_zip_scanner.py, Hugging Face source tests, streaming scan tests): 1199 passed, 4 skipped.
ruff format, ruff check, mypy, and git diff --check: clean.
Full pytest -n auto -m "not slow and not integration" --maxfail=1 was attempted twice; both xdist runs stopped on unrelated cache release-call assertions, and each failing cache node passed when rerun directly.

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4878ad61b7


mldangelo-oai · 2026-06-11T04:50:11Z

@codex address these exact-head blockers before merge:

High: untrusted repository torch-*.dist-info can spoof importlib.metadata.version("torch"). With a real/simulated vulnerable torch==2.5.1, adding torch-2.10.0.dist-info/METADATA under the current untrusted model directory changed the same scan from CVE findings/exit 1 to no CVEs/exit 0. Resolve package metadata only from trusted installation roots or explicit caller input, and record the validated path.
Medium: unknown consumer runtime is represented as passed/clean and silently omits version-gated checks. Emit explicit machine-readable unknown/inconclusive applicability.
Medium: direct HF file scanning now depends on unbounded, untimed repo_info(); failure blocks an otherwise downloadable pinned file, and mutable refs can race inventory vs download. Make it bounded, deadline-aware, same-SHA, and best effort.
Medium: repository inventory is normalized/copied once per embedded pickle, producing O(files x pickle-members) work. Bound and index it once per scan.
Low: any same-directory .safetensors filename is claimed as a usable alternative without validating framing, relation, size, or exact path.
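For the O(files x pickle-members) finding, one option is to normalize the inventory once per scan into a directory-keyed index and pass that object to every scanner. The class below is a hypothetical sketch, not the fix that landed; the 100,000-entry bound is an arbitrary assumption, and callers should treat a truncated index as partial rather than as evidence that no sibling exists.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Iterable


@dataclass(frozen=True)
class RepositoryInventory:
    """Hypothetical per-scan index: directory -> file names in that directory.

    Built once per repository scan and shared by every scanner, so each
    embedded pickle does an O(1) lookup instead of re-normalizing the listing.
    """

    by_directory: dict[PurePosixPath, frozenset[str]]
    truncated: bool = False

    @classmethod
    def build(cls, paths: Iterable[str], max_entries: int = 100_000) -> "RepositoryInventory":
        grouped: dict[PurePosixPath, set[str]] = {}
        truncated = False
        for count, raw in enumerate(paths):
            if count >= max_entries:
                truncated = True  # stop early; callers should treat the index as partial
                break
            path = PurePosixPath(raw)
            grouped.setdefault(path.parent, set()).add(path.name)
        return cls({d: frozenset(names) for d, names in grouped.items()}, truncated)

    def has_sibling(self, member: str, name: str) -> bool:
        """True if `name` is listed in the same directory as `member`."""
        return name in self.by_directory.get(PurePosixPath(member).parent, frozenset())
```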

Independent report: /Users/mdangelo/modelaudit-pr-swarm/outputs/pr-1643-independent-review-20260611.md.

mldangelo-oai · 2026-06-11T06:18:21Z

Addressed the fresh exact-head blockers in 0b32f548e0795a1e20ba0eeb89d7837aa189ef9a.

Summary:

Resolve torch runtime metadata only from trusted Python install roots and record the validated metadata path; repository-local torch-*.dist-info no longer spoofs runtime CVE gates.
Emit explicit skipped, machine-readable unknown runtime applicability for PyTorch version-gated CVEs instead of silently omitting those checks.
Bound and index repository inventory once per scan, require related SafeTensors alternative names, and validate local SafeTensors framing before claiming availability.
Make direct HF inventory best-effort unless size enforcement needs metadata; consume inventory only when tied to an immutable revision and pin the download to that revision.
Pass cache-safe HF streaming roots while preserving repository-member context for nested inventory siblings.
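The best-effort, revision-pinned inventory rule can be sketched as below. `fetch_pinned_inventory` and its `fetch` argument are hypothetical: `fetch` is a caller-supplied callable returning `(resolved_sha, file_paths)`, and the 40-character hex check assumes Hugging Face commit SHAs are full Git SHA-1 hashes. Any failure, timeout, or SHA mismatch returns `None`, so the scan continues without inventory instead of being blocked.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

_COMMIT_SHA = re.compile(r"[0-9a-f]{40}")


def fetch_pinned_inventory(
    fetch: Callable[[str], tuple[str, list[str]]],
    revision: str,
    deadline_s: float,
) -> Optional[list[str]]:
    """Best-effort repository inventory tied to one immutable revision.

    Returns None, and the scan proceeds without inventory, when the revision
    is mutable, the lookup fails or exceeds the deadline, or it resolves to
    a different commit than the one being downloaded.
    """
    if not _COMMIT_SHA.fullmatch(revision):
        return None  # branch or tag names can move between inventory and download
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        resolved_sha, files = pool.submit(fetch, revision).result(timeout=deadline_s)
    except Exception:  # includes the TimeoutError raised by result()
        return None
    finally:
        # Do not wait for a slow lookup; the deadline bounds only the scan's wait.
        pool.shutdown(wait=False, cancel_futures=True)
    if resolved_sha != revision:
        return None  # inventory must describe the exact commit being scanned
    return files
```

With `huggingface_hub`, `fetch` could plausibly wrap `HfApi().model_info(repo_id, revision=rev, timeout=...)` and return `(info.sha, [s.rfilename for s in info.siblings])`; that wiring is an untested assumption. A timed-out worker thread keeps running in the background, so the deadline limits how long the scan waits, not how long the request lives.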

Validation:

Focused regression nodes: 11 passed, then post-fix focused nodes: 5 passed.
Broad affected suites: tests/scanners/test_pytorch_zip_scanner.py tests/utils/sources/test_huggingface.py tests/test_streaming_scan.py -> 1206 passed, 4 skipped.
Full HF source file plus CLI streaming regressions after final tightening: 274 passed, 1 skipped.
Exact pinned HF metadata-only QA with telemetry disabled: 2 passed for sentence-transformers/all-MiniLM-L12-v2@a50ef00143b4d5391434df20ae11632588ac25be and intfloat/multilingual-e5-base@d128750597153bb5987e10b1c3493a34e5a4502a.
uv run ruff format ... stable; uv run ruff check ... passed.
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 17391 passed, 1292 skipped.
git diff --check and git diff --cached --check passed before commit.

@codex review

mldangelo-oai · 2026-06-11T06:28:36Z

Exact-head follow-up pushed on 0250eb3626330e8ff702b2dde0fd2d2d5b080470.

Additional repair after CI: CodeQL flagged clear-text logging in the trusted torch metadata lookup fallback. I removed exception/path interpolation from that debug path without changing CVE applicability behavior.

Validation after this last patch:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py::test_pytorch_zip_ignores_untrusted_repository_torch_dist_info tests/scanners/test_pytorch_zip_scanner.py::test_pytorch_zip_unknown_runtime_emits_skipped_cve_applicability_checks -q -> 2 passed
uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> 420 files left unchanged
uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> all checks passed
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> success, 475 files
git diff --check -> clean

@codex review

chatgpt-codex-connector · 2026-06-11T06:31:38Z

Codex Review: Didn't find any major issues. Delightful!


Mark runtime-version-gated PyTorch CVE applicability inconclusive when producer metadata is present but no trusted local runtime version is available. Preserve producer-only false-positive suppression by keeping the CVE checks skipped instead of critical.

mldangelo-oai · 2026-06-11T09:50:02Z

@codex review

Exact head: be21a86

chatgpt-codex-connector · 2026-06-11T09:53:18Z

Codex Review: Didn't find any major issues. Already looking forward to the next diff.


mldangelo-oai · 2026-06-11T09:53:35Z

Exact-head CI blocker for be21a8645631...: Performance Benchmarks run 27338512745, job 80768912470 has two deterministic functional failures, not a timing regression.

test_scan_release_candidate_repository
test_scan_warm_cached_repository_rescan

Both assert that result.success is True, but the scan returns False with "File extension indicates pickle but header indicate..." in the result. Reproduce the benchmark fixture locally and determine why the current producer/runtime-CVE applicability path changes its aggregate outcome. Fix the branch or fixture contract at the root cause; do not simply relax success assertions or suppress a real format mismatch. Rerun both exact benchmark nodes before the full benchmark workflow.

Avoid embedding a PyTorch-looking producer version string in the clean benchmark ZIP payload so the performance lane continues to benchmark a successful release-candidate repository under unknown local runtime conditions.

mldangelo-oai · 2026-06-11T10:07:50Z

@codex review

Exact head: 1e88eeb

chatgpt-codex-connector · 2026-06-11T10:10:44Z

Codex Review: Didn't find any major issues. Chef's kiss.


mldangelo-oai · 2026-06-11T10:52:28Z

@codex review

Exact head: 1367166

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 136716630c


mldangelo-oai · 2026-06-11T11:42:09Z

@codex review

mldangelo-oai · 2026-06-11T11:44:12Z

@codex review

Exact head: 3decfed

chatgpt-codex-connector · 2026-06-11T11:48:12Z

Codex Review: Didn't find any major issues. Bravo.


mldangelo-oai · 2026-06-11T12:14:20Z

Independent review of exact head 3decfed4ab64dfb8e15796ddac9a4e465099f52f found four issues:

P1: generic pickle semvers control whether unknown runtime is clean or operationally inconclusive. Any 1.x.y/2.x.y string is treated as producer metadata. With no local Torch, unrelated {"version":"2.6.0"} changes a safe real torch.save() artifact from exit 0 to exit 2; replacing it with benchmark-release is clean, and base is clean. The benchmark fixture masks this by removing semver rather than fixing attribution.
P1: runtime provenance can select a different Torch installation than Python would load. Scanner root ordering chose system Torch 2.10.0 while Python import precedence selected vulnerable user-site 2.5.1, suppressing runtime CVEs. A fake preloaded Torch 2.10.0 from /untrusted-repo/torch.py likewise suppressed CVE-2025-32434 and CVE-2026-24747 from trusted 2.5.1 metadata. Resolve import provenance without importing and bind distribution metadata to the resolved origin.
P2: repository inventory is re-indexed for every PyTorch file. Three scanner constructions over 100,000 entries took 3.687 seconds and built three distinct indexes; normalize once per repository scan and pass the inventory object onward.
P3: inventory filenames bypass SafeTensors validity checks. A zero-byte model.safetensors listed in inventory is advertised as an available alternative; without inventory it is rejected. A local nine-byte length-plus-{ file also passes the weak framing check. This misleads applicability/remediation output but does not suppress pickle evidence.
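A stricter local framing check than "length prefix plus `{`" might look like the sketch below; `looks_like_safetensors` is a hypothetical name. It follows the published SafeTensors layout: an 8-byte little-endian header length, a JSON header, then a data buffer that the tensors' `data_offsets` must cover through its last byte. It rejects the zero-byte and nine-byte cases from P3, but it does not check dtype/shape sizes or gaps between tensors, and the header-size cap is an assumed sanity limit.

```python
import json
import os
import struct
from typing import BinaryIO

_MAX_HEADER_LEN = 100_000_000  # assumed sanity cap on the JSON header size


def looks_like_safetensors(handle: BinaryIO) -> bool:
    """Stricter-than-prefix SafeTensors framing check (not a full validator).

    Layout: 8-byte little-endian u64 header length N, N bytes of JSON header,
    then a data buffer that the header's data_offsets must index completely.
    """
    size = handle.seek(0, os.SEEK_END)
    if size < 8:
        return False
    handle.seek(0)
    (header_len,) = struct.unpack("<Q", handle.read(8))
    if not 2 <= header_len <= _MAX_HEADER_LEN or 8 + header_len > size:
        return False
    try:
        header = json.loads(handle.read(header_len).decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False
    if not isinstance(header, dict):
        return False
    data_len = size - 8 - header_len
    ends = []
    for name, info in header.items():
        if name == "__metadata__":
            continue  # optional string map, carries no tensor data
        offsets = info.get("data_offsets") if isinstance(info, dict) else None
        if not (isinstance(offsets, list) and len(offsets) == 2
                and all(isinstance(o, int) for o in offsets)):
            return False
        begin, end = offsets
        if not 0 <= begin <= end <= data_len:
            return False
        ends.append(end)
    # At least one tensor, and the data must end exactly at EOF (no trailing bytes).
    return bool(ends) and max(ends) == data_len
```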

Validation: 40 focused PyTorch ZIP controls, 272 HF source tests, and 7 CLI streaming controls passed. Release-candidate cold/warm scans pass only after removing generic semver. Real Torch 2.11 happy/malicious paths behaved as expected. All six threads are resolved; static/benchmark/Docker checks are green, with four Python/Windows lanes still running. The P1/P2 issues block merge.
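For the second P1 finding, one way to resolve Torch provenance without importing it is sketched below; `resolve_torch_runtime` is a hypothetical name. `importlib.machinery.PathFinder.find_spec` walks a given search path in order and ignores `sys.modules`, so a preloaded fake `torch` cannot answer, and the version is accepted only from a distribution whose RECORD lists the exact resolved origin file. The caller is assumed to pass the interpreter's import path with untrusted entries (such as the scanned repository) removed. Installs without a RECORD file resolve to `None`, which would surface as an unknown runtime.

```python
import importlib.machinery
import importlib.metadata
from pathlib import Path
from typing import Optional, Sequence


def resolve_torch_runtime(search_path: Sequence[str]) -> Optional[tuple[str, str]]:
    """Return (version, install_root) for the torch that search_path would import.

    PathFinder walks search_path in order without executing code and without
    consulting sys.modules, so a preloaded fake ``torch`` cannot answer here.
    """
    spec = importlib.machinery.PathFinder.find_spec("torch", list(search_path))
    if spec is None or spec.origin is None:
        return None  # not found, or a namespace package with no origin file
    origin = Path(spec.origin).resolve()
    # A package resolves to .../torch/__init__.py; a lone module to .../torch.py.
    root = origin.parent.parent if origin.name == "__init__.py" else origin.parent
    for dist in importlib.metadata.distributions(name="torch", path=[str(root)]):
        # Bind metadata to the import origin: RECORD must list that exact file.
        if any(Path(dist.locate_file(f)).resolve() == origin for f in dist.files or ()):
            return dist.version, str(root)
    return None  # no distribution vouches for the file Python would load
```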

fix(pytorch): distinguish producer metadata from runtime CVEs

4a34c3c

chatgpt-codex-connector Bot reviewed Jun 11, 2026


Comment thread modelaudit/scanners/pytorch_zip_scanner.py

fix: detect installed torch runtime from package metadata

81a2037

mldangelo-oai requested a review from mldangelo June 11, 2026 01:56

mldangelo-oai enabled auto-merge (squash) June 11, 2026 01:56

mldangelo-oai disabled auto-merge June 11, 2026 03:39

fix: carry HF safetensors inventory into PyTorch CVE checks

4878ad6

chatgpt-codex-connector Bot reviewed Jun 11, 2026


Comment thread modelaudit/cli.py

Comment thread modelaudit/utils/sources/huggingface.py Outdated

Comment thread modelaudit/utils/sources/huggingface.py Outdated

fix: harden pytorch cve applicability context

0b32f54

github-advanced-security AI found potential problems Jun 11, 2026


Comment thread modelaudit/scanners/pytorch_zip_scanner.py Fixed

fix: redact trusted package lookup failures

0250eb3

test(benchmarks): keep release candidate workload clean

1e88eeb


test(pytorch): make unknown runtime regression deterministic

1367166

chatgpt-codex-connector Bot reviewed Jun 11, 2026


Comment thread modelaudit/cli.py Outdated

fix(hf): preserve no-cache streaming scan root

3decfed

mldangelo-oai enabled auto-merge (squash) June 11, 2026 12:16

mldangelo-oai disabled auto-merge June 11, 2026 12:21

2 participants
