fix: fail gated Hugging Face acquisitions honestly by mldangelo-oai · Pull Request #1646 · promptfoo/modelaudit

mldangelo-oai · 2026-06-11T01:20:37Z

Summary

Report gated or unauthorized Hugging Face acquisition failures as terminal success=false operational outcomes instead of claiming a successful scan with zero artifacts.
Add stable acquisition metadata/issue details for blocked vs transient acquisition errors, and keep text/JSON/exit-code semantics aligned.
Propagate ?revision= through Hugging Face listing, sizing, info, and worker paths so pinned acquisitions are evaluated at the requested revision.
Add focused CLI/source regression coverage plus a changelog entry.
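The revision propagation in the third item can be pictured with a small parser. This is only a sketch under stated assumptions: `extract_revision` and the `_SAFE_REVISION` allow-list are hypothetical names, not ModelAudit's actual API, and the exact rejection rules in the PR may differ.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical allow-list: commit hashes, branch names, or refs such as refs/pr/1.
_SAFE_REVISION = re.compile(r"[A-Za-z0-9][A-Za-z0-9._/-]*")


def extract_revision(url: str) -> Optional[str]:
    """Return the pinned revision from a Hugging Face URL, or None if unpinned."""
    # keep_blank_values=True so that "?revision=" is seen and rejected, not dropped.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    values = query.get("revision")
    if values is None:
        return None
    if len(values) != 1:
        raise ValueError("ambiguous revision: parameter given more than once")
    revision = values[0]
    if not revision or ".." in revision or not _SAFE_REVISION.fullmatch(revision):
        raise ValueError("empty or unsafe revision")
    return revision
```

Once extracted, the same value would be passed to every listing, sizing, info, and worker call so that none of them silently falls back to the default branch.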

Root Cause And Security Tradeoff

The CLI previously caught Hugging Face acquisition exceptions and only marked has_errors; the JSON payload could still report success=true, no issues, no file metadata, and no scanned assets. That made a gated acquisition look like a completed clean scan. The fix records acquisition failures as inconclusive operational source outcomes, sets success=false, preserves exit code 2, and avoids any completed-artifact claim.

The security tradeoff is deliberately fail-closed: if acquisition evidence is ambiguous or blocked, ModelAudit reports an operational acquisition error rather than silently treating the model as clean. The acquisition issue is informational, not a security finding, so successful acquisitions still scan normally and malicious findings remain visible.
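A minimal sketch of the fail-closed classification described above. The names `AcquisitionOutcome`, `classify_acquisition_failure`, and `BLOCKED_STATUS_CODES` are hypothetical; the reason string `huggingface_acquisition_failed` for transient errors and the use of HTTP 401/403 as the blocked signal are assumptions, while `huggingface_acquisition_blocked`, `inconclusive`, and exit code 2 come from the outcomes reported in this PR.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: gated or unauthorized access surfaces as HTTP 401 or 403.
BLOCKED_STATUS_CODES = frozenset({401, 403})


@dataclass(frozen=True)
class AcquisitionOutcome:
    success: bool
    exit_code: int
    scan_outcome: str
    scan_outcome_reason: str
    blocked: bool


def classify_acquisition_failure(status_code: Optional[int]) -> AcquisitionOutcome:
    """Map a failed acquisition to an operational outcome; never to a clean scan."""
    blocked = status_code in BLOCKED_STATUS_CODES
    reason = (
        "huggingface_acquisition_blocked"
        if blocked
        else "huggingface_acquisition_failed"  # assumed name for transient errors
    )
    # Both branches fail closed: success is False and the exit code stays 2.
    return AcquisitionOutcome(
        success=False,
        exit_code=2,
        scan_outcome="inconclusive",
        scan_outcome_reason=reason,
        blocked=blocked,
    )
```

The point of the design is that no branch can return `success=True`: an acquisition that produced no evidence is reported as inconclusive rather than clean.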

Validation

uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_cli.py -k "huggingface and (acquisition or gated or unauthorized or transient or malicious or benign or blocked or late_failure or encoded_revision)" --maxfail=1 -> 10 passed, 257 deselected
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_cli.py tests/utils/sources/test_huggingface.py tests/test_streaming_scan.py --maxfail=1 -> 624 passed, 4 skipped
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 17389 passed, 1292 skipped
git diff --check

Real Hugging Face QA

All real-model commands were run with token variables scrubbed and telemetry/caches isolated:

env -u HF_TOKEN -u HUGGING_FACE_HUB_TOKEN -u HUGGINGFACE_HUB_TOKEN -u HUGGINGFACE_TOKEN -u HF_API_TOKEN HF_HOME=/tmp/modelaudit-task15-hfhome-final3-rank6 HF_HUB_DISABLE_IMPLICIT_TOKEN=1 HF_HUB_DISABLE_TELEMETRY=1 PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run modelaudit scan --quiet --no-cache --format json --timeout 30 "https://huggingface.co/black-forest-labs/FLUX.1-dev?revision=3de623fc3c33e44ffbe2bad470d0f45bccf2eb21"

Outcome: exit 2, success=false, has_errors=true, files_scanned=0, bytes_scanned=0, assets=0, issue_types=['huggingface_acquisition_error'], blocked=true, scan_outcome=inconclusive, scan_outcome_reason=huggingface_acquisition_blocked, requested_revision=3de623fc3c33e44ffbe2bad470d0f45bccf2eb21.

env -u HF_TOKEN -u HUGGING_FACE_HUB_TOKEN -u HUGGINGFACE_HUB_TOKEN -u HUGGINGFACE_TOKEN -u HF_API_TOKEN HF_HOME=/tmp/modelaudit-task15-hfhome-final3-rank13 HF_HUB_DISABLE_IMPLICIT_TOKEN=1 HF_HUB_DISABLE_TELEMETRY=1 PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run modelaudit scan --quiet --no-cache --format json --timeout 30 "https://huggingface.co/ideogram-ai/ideogram-4-fp8?revision=ee79a7237b519f1402ceacf952f30c8a31ec5073"

Outcome: exit 2, success=false, has_errors=true, files_scanned=0, bytes_scanned=0, assets=0, issue_types=['huggingface_acquisition_error'], blocked=true, scan_outcome=inconclusive, scan_outcome_reason=huggingface_acquisition_blocked, requested_revision=ee79a7237b519f1402ceacf952f30c8a31ec5073.

env -u HF_TOKEN -u HUGGING_FACE_HUB_TOKEN -u HUGGINGFACE_HUB_TOKEN -u HUGGINGFACE_TOKEN -u HF_API_TOKEN HF_HOME=/tmp/modelaudit-task15-hfhome-final3-rank15 HF_HUB_DISABLE_IMPLICIT_TOKEN=1 HF_HUB_DISABLE_TELEMETRY=1 PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run modelaudit scan --quiet --no-cache --format json --timeout 30 "https://huggingface.co/meta-llama/Meta-Llama-3-8B?revision=8cde5ca8380496c9a6cc7ef3a8b46a0372a1d920"

Outcome: exit 2, success=false, has_errors=true, files_scanned=0, bytes_scanned=0, assets=0, issue_types=['huggingface_acquisition_error'], blocked=true, scan_outcome=inconclusive, scan_outcome_reason=huggingface_acquisition_blocked, requested_revision=8cde5ca8380496c9a6cc7ef3a8b46a0372a1d920.

env -u HF_TOKEN -u HUGGING_FACE_HUB_TOKEN -u HUGGINGFACE_HUB_TOKEN -u HUGGINGFACE_TOKEN -u HF_API_TOKEN HF_HOME=/tmp/modelaudit-task15-hfhome-final3-rank21 HF_HUB_DISABLE_IMPLICIT_TOKEN=1 HF_HUB_DISABLE_TELEMETRY=1 PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run modelaudit scan --quiet --no-cache --format json --timeout 30 "https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct?revision=0e9e39f249a16976918f6564b8830bc894c89659"

Outcome: exit 2, success=false, has_errors=true, files_scanned=0, bytes_scanned=0, assets=0, issue_types=['huggingface_acquisition_error'], blocked=true, scan_outcome=inconclusive, scan_outcome_reason=huggingface_acquisition_blocked, requested_revision=0e9e39f249a16976918f6564b8830bc894c89659.
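For anyone repeating these runs from a script, the environment scrubbing in the commands above could be reproduced roughly as follows. `scrubbed_env` is a hypothetical helper; the variable names and values are copied from the commands in this comment.

```python
from typing import Dict, Mapping

# Token variables removed with `env -u ...` in the commands above.
TOKEN_VARS = (
    "HF_TOKEN",
    "HUGGING_FACE_HUB_TOKEN",
    "HUGGINGFACE_HUB_TOKEN",
    "HUGGINGFACE_TOKEN",
    "HF_API_TOKEN",
)


def scrubbed_env(base: Mapping[str, str], hf_home: str) -> Dict[str, str]:
    """Copy `base` without Hugging Face tokens, with telemetry off and an isolated cache."""
    env = {key: value for key, value in base.items() if key not in TOKEN_VARS}
    env.update(
        {
            "HF_HOME": hf_home,
            "HF_HUB_DISABLE_IMPLICIT_TOKEN": "1",
            "HF_HUB_DISABLE_TELEMETRY": "1",
            "PROMPTFOO_DISABLE_TELEMETRY": "1",
            "NO_ANALYTICS": "1",
        }
    )
    return env


# Usage sketch (not executed here):
#   subprocess.run(["uv", "run", "modelaudit", "scan", "--format", "json", url],
#                  env=scrubbed_env(os.environ, "/tmp/some-isolated-hf-home"))
```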

Public ungated control:

env -u HF_TOKEN -u HUGGING_FACE_HUB_TOKEN -u HUGGINGFACE_HUB_TOKEN -u HUGGINGFACE_TOKEN -u HF_API_TOKEN HF_HOME=/tmp/modelaudit-task15-hfhome-final3-public_file HF_HUB_DISABLE_IMPLICIT_TOKEN=1 HF_HUB_DISABLE_TELEMETRY=1 PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run modelaudit scan --quiet --no-cache --format json --timeout 60 --max-size 1MB "https://huggingface.co/sshleifer/tiny-gpt2/resolve/main/config.json"

Outcome: exit 0, success=true, has_errors=false, files_scanned=1, bytes_scanned=662, assets=1, issue_types=[], no acquisition-error metadata.
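The outcomes above share a few invariants that a reviewer could check mechanically. The sketch below is an illustration only: `truthfulness_violations` is a hypothetical helper, and it operates on a flat summary dict with the same keys as the outcome lines in this comment, not on ModelAudit's actual JSON schema.

```python
from typing import Any, Dict, List


def truthfulness_violations(exit_code: int, summary: Dict[str, Any]) -> List[str]:
    """Return the ways a summarized scan outcome contradicts itself."""
    violations = []
    success = bool(summary.get("success"))
    if exit_code == 2 and success:
        violations.append("exit code 2 but success=true")
    if success and summary.get("has_errors"):
        violations.append("success=true but has_errors=true")
    if summary.get("blocked") and "huggingface_acquisition_error" not in summary.get(
        "issue_types", []
    ):
        violations.append("blocked=true without a structured acquisition issue")
    return violations
```

Applied to a summary with exit 2, `success=true`, and `has_errors=true`, the helper flags the contradiction; the blocked and public-control outcomes listed above produce no violations.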

Malicious And Malformed Controls

Successful Hugging Face acquisition still reports security findings for malicious content in test_scan_huggingface_malicious_successful_acquisition_still_reports_security_findings.
A blocked Hugging Face source does not hide a separate local malicious finding in test_scan_huggingface_blocked_source_does_not_hide_local_malicious_findings.
Transient acquisition errors are not mislabeled as gated/blocked in test_scan_huggingface_transient_json_reports_acquisition_failed_not_blocked.
Stream failures after at least one artifact has been yielded are not rewritten as zero-artifact acquisition failures in test_scan_huggingface_streaming_late_failure_is_not_acquisition_error.
Ambiguous, empty, or unsafe Hugging Face revision query strings are rejected by parser tests.
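The late-failure control in the list above rests on one distinction: whether any artifact was yielded before the stream broke. A minimal sketch of that rule, with the hypothetical helper `drain_stream` and hypothetical status labels standing in for whatever the streaming scanner actually uses:

```python
from typing import Iterable, List, Tuple


def drain_stream(artifacts: Iterable[str]) -> Tuple[List[str], str]:
    """Consume a stream of artifact names and label how it ended."""
    scanned: List[str] = []
    try:
        for artifact in artifacts:
            scanned.append(artifact)  # stand-in for scanning the artifact
    except Exception:
        if not scanned:
            # Nothing was ever yielded: this is an acquisition failure.
            return scanned, "acquisition_error"
        # Something was already scanned: keep those results, report an interruption.
        return scanned, "streaming_interrupted"
    return scanned, "completed"
```

Keeping the already-scanned artifacts in the interrupted case is what prevents a late network error from erasing findings that were produced earlier in the stream.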

No model weights, credentials, Hugging Face caches, or generated scan payloads are committed.

mldangelo-oai · 2026-06-11T01:20:57Z

@codex review

github-actions · 2026-06-11T01:22:38Z

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.469s -> 1.455s (-0.9%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `warm-cache-rescan` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan` | `release-candidate` | 547.3 KiB | 32 | 114.01ms | 102.58ms | -10.0% | stable |
| `direct-malicious-upload` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload` | `malicious_reduce` | 52 B | 1 | 450.8us | 440.7us | -2.2% | stable |
| `suspicious-pickle-intake` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake` | `suspicious-intake` | 183.8 KiB | 4 | 146.24ms | 144.23ms | -1.4% | stable |
| `padded-multi-stream-upload` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload` | `multi_stream_padded` | 4.1 KiB | 1 | 571.5us | 564.3us | -1.3% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]` | `nested_base64` | 98 B | 1 | 520.8us | 525.4us | +0.9% | stable |
| `duplicate-heavy-registry` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot` | `registry-snapshot` | 915.2 KiB | 13 | 415.80ms | 412.48ms | -0.8% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]` | `nested_raw` | 78 B | 1 | 502.2us | 505.7us | +0.7% | stable |
| `mixed-model-repository` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository` | `release-candidate` | 547.3 KiB | 32 | 484.29ms | 486.73ms | +0.5% | stable |
| `chunked-upload-stream` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream` | `chunked_stream` | 278.2 KiB | 1 | 116.49ms | 116.99ms | +0.4% | stable |
| `single-checkpoint-preflight` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load` | `single_checkpoint.pkl` | 183.0 KiB | 1 | 75.98ms | 76.26ms | +0.4% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]` | `nested_hex` | 130 B | 1 | 549.4us | 547.7us | -0.3% | stable |
| `clean-training-checkpoint` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint` | `safe_large` | 278.2 KiB | 1 | 113.34ms | 113.40ms | +0.1% | stable |
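The Change and Status columns can be recomputed from Baseline and Current. The sketch below assumes the threshold is applied symmetrically (slower by more than 15% is a regression, faster by more than 15% is improved); the benchmark workflow's actual rule is not shown in this PR, and `classify_change` is a hypothetical name.

```python
from typing import Tuple


def classify_change(
    baseline: float, current: float, threshold: float = 0.15
) -> Tuple[float, str]:
    """Return the relative change and a status label for one benchmark."""
    change = (current - baseline) / baseline
    if change > threshold:
        status = "regression"
    elif change < -threshold:  # assumption: symmetric threshold for improvements
        status = "improved"
    else:
        status = "stable"
    return change, status
```

For the `warm-cache-rescan` row, `classify_change(114.01, 102.58)` gives a change of about -10.0%, which stays inside the 15% band and is therefore reported as stable.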

chatgpt-codex-connector · 2026-06-11T01:24:04Z

Codex Review: Didn't find any major issues. 🎉

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T02:06:23Z

@codex review

chatgpt-codex-connector · 2026-06-11T02:10:58Z

Codex Review: Didn't find any major issues. Keep them coming!

mldangelo-oai · 2026-06-11T02:16:38Z

Additional real-model QA case: meta-llama/Llama-3.2-1B is publicly enumerable but gated at artifact acquisition. On origin/main 8d6c4864, the CLI exits 2 while JSON reports success=true, has_errors=true, files_scanned=0, and no structured acquisition issue. Please include this exact case in the PR validation and confirm the new head returns a truthful stable blocked/acquisition outcome. @codex review

chatgpt-codex-connector · 2026-06-11T02:20:51Z

Codex Review: Didn't find any major issues. 🚀

mldangelo-oai · 2026-06-11T02:40:26Z

Additional validation for the requested enumerable-but-gated case:

env -u HF_TOKEN -u HUGGING_FACE_HUB_TOKEN -u HUGGINGFACE_HUB_TOKEN -u HUGGINGFACE_TOKEN -u HF_API_TOKEN HF_HOME=/tmp/modelaudit-task15-hfhome-llama32-final HF_HUB_DISABLE_IMPLICIT_TOKEN=1 HF_HUB_DISABLE_TELEMETRY=1 PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run modelaudit scan --quiet --no-cache --format json --timeout 60 "https://huggingface.co/meta-llama/Llama-3.2-1B"

Outcome on head 34cbf523153064fb26ef94b714524ea9ddc7e952: exit 2, success=false, has_errors=true, files_scanned=0, bytes_scanned=0, assets=0, issue_types=["huggingface_acquisition_error"], blocked=true, scan_outcome=inconclusive, scan_outcome_reason=huggingface_acquisition_blocked. No artifacts were scanned and no credentials/caches/payloads were committed.

mldangelo-oai · 2026-06-11T02:54:39Z

Additional gated-repository QA: sesame/csm-1b@c92a71e1c419772e25be7dc14d952c2521a740ab exposes a 20-file/19.59 GB public inventory but returns 401 on paths-info content access. Exact main exits 2 with an explicit GatedRepoError and no findings. Add this mixed-public-metadata/private-content case to the truthful-acquisition regression matrix.

# Conflicts: # modelaudit/core.py

fix: fail gated Hugging Face acquisitions honestly

a15e81f

mldangelo-oai requested a review from mldangelo June 11, 2026 01:41

mldangelo-oai enabled auto-merge (squash) June 11, 2026 01:41

test: request text output for Hugging Face failure regression

34cbf52

mldangelo-oai disabled auto-merge June 11, 2026 02:16

mldangelo-oai enabled auto-merge (squash) June 11, 2026 02:23

mldangelo-oai disabled auto-merge June 11, 2026 02:26

mldangelo-oai enabled auto-merge (squash) June 11, 2026 03:04

mldangelo-oai added 6 commits June 11, 2026 16:21

Merge remote-tracking branch 'origin/main' into campaign/pr-1646

972fee6

fix: preserve interrupted streaming findings

b073658

Merge remote-tracking branch 'origin/main' into campaign/pr-1646

df52060

# Conflicts: # modelaudit/core.py

Merge remote-tracking branch 'origin/main' into campaign/pr-1646

aee9a14

test: pin streaming progress assertions to text output

59f9115

Merge remote-tracking branch 'origin/main' into campaign/pr-1646

1789e4e

mldangelo-oai merged commit 8192304 into main Jun 11, 2026
29 checks passed

mldangelo-oai deleted the mdangelo/codex/hf-fp-t15-hf-gated-result-semantics-20260610 branch June 11, 2026 18:28

mldangelo-oai merged 8 commits into main from mdangelo/codex/hf-fp-t15-hf-gated-result-semantics-20260610
