fix(cli): make streaming dry runs non-executing by mldangelo-oai · Pull Request #1649 · promptfoo/modelaudit

mldangelo-oai · 2026-06-11T01:22:12Z

Summary

Adds a Hugging Face dry-run preview path for --dry-run --stream that performs metadata planning only.
Prevents artifact downloads, scanner execution, scan-result JSON formatting, and issue exit codes for successful Hugging Face streaming dry runs.
Fails closed on ambiguous or missing Hugging Face metadata, and rejects invocations that mix Hugging Face preview inputs with local scan inputs, so a dry run never triggers unexpected scanner execution.

Root cause

scan_command passed dry_run=True through runtime config, but Hugging Face source resolution still entered the streaming acquisition path. For direct Hugging Face file URLs this downloaded the artifact, then invoked the normal scanner path and emitted full scan JSON. That made --dry-run --stream behave like a real scan for Hugging Face inputs.
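
A minimal sketch of the ordering change described above, assuming a hypothetical `plan_hf_streaming_input` helper and `Plan` type (neither name comes from the modelaudit codebase). The point is only that the dry-run decision is made before any acquisition step can run, rather than carrying `dry_run` in config while the download path executes anyway:

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical outcome of resolving one Hugging Face streaming input."""

    action: str              # "preview" or "scan"
    artifact_downloads: int  # downloads this plan would perform
    scanner_execution: bool  # whether scanners would run


def plan_hf_streaming_input(dry_run: bool) -> Plan:
    # Decide on the preview path first; acquisition is only reachable
    # when this is not a dry run.
    if dry_run:
        return Plan("preview", artifact_downloads=0, scanner_execution=False)
    return Plan("scan", artifact_downloads=1, scanner_execution=True)
```

With this ordering, a dry run cannot fall through to the download branch by accident; the non-dry branch is unchanged.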

Security tradeoffs

Successful Hugging Face dry runs now emit a preview-only payload with artifact_downloads: 0 and scanner_execution: false; they do not emit scan-result fields like bytes_scanned, files_scanned, issues, or checks.
Metadata lookup failures remain fail-closed with exit code 2 and do not fall through to acquisition or scanner execution.
Invocations that mix Hugging Face preview inputs with local or other non-Hugging Face inputs are rejected with exit code 2 instead of partially previewing and partially scanning.
Non-dry streaming scans still use the existing download and scanner execution paths, including issue exit code behavior.
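
The tradeoffs above can be sketched as a small decision function. This is an illustration under stated assumptions, not the actual implementation: `is_hf_input`, `decide_dry_run_stream`, and the `metadata_ok` callback are hypothetical names, and the Hugging Face check is deliberately simplified to an `https://huggingface.co/` URL test.

```python
from urllib.parse import urlparse

# Exit codes as reported in this PR: 0 for a successful preview,
# 2 for fail-closed rejections.
EXIT_CODES = {"preview": 0, "reject": 2}


def is_hf_input(path: str) -> bool:
    # Simplified, hypothetical check for a Hugging Face URL.
    parsed = urlparse(path)
    return parsed.scheme == "https" and parsed.netloc == "huggingface.co"


def decide_dry_run_stream(inputs, metadata_ok) -> str:
    """Return "preview", "reject", or "existing-path" for --dry-run --stream."""
    hf_inputs = [p for p in inputs if is_hf_input(p)]
    if not hf_inputs:
        return "existing-path"  # no Hugging Face input: outside this sketch
    if len(hf_inputs) != len(inputs):
        return "reject"  # mixed inputs: never preview some and scan others
    if not all(metadata_ok(p) for p in hf_inputs):
        return "reject"  # ambiguous or missing metadata: fail closed
    return "preview"
```

Rejecting the whole invocation keeps the guarantee simple: a dry run that includes any Hugging Face input either previews everything or does nothing.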

Real-model QA

Pinned model/revision:

nvidia/nemotron-3.5-asr-streaming-0.6b @ 24b151a851dd15909e1fc611b11bb2da52b9fc81

Baseline on ab6e58ebe587197b2b46866c69f7d41d75e08a55 reproduced the bug with the bounded README artifact:

PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 HF_HUB_DISABLE_PROGRESS_BARS=1 uv run modelaudit scan --dry-run --stream --format json --quiet --no-cache --scanners metadata --max-size 100KB https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b/resolve/24b151a851dd15909e1fc611b11bb2da52b9fc81/README.md

Observed baseline behavior: the command exited 0 but issued an artifact GET .../resolve-cache/.../README.md, invoked the metadata scanner, and emitted full scan JSON with files_scanned: 1, bytes_scanned: 47395, and scanner_names: ["metadata"].

Fixed dry-run command:

PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 HF_HUB_DISABLE_PROGRESS_BARS=1 uv run modelaudit scan --dry-run --stream --format json --verbose --no-cache --scanners metadata --max-size 100KB --output /tmp/modelaudit-t16-preview.json https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b/resolve/24b151a851dd15909e1fc611b11bb2da52b9fc81/README.md

Outcome: exit 0. Verbose output showed only:

GET https://huggingface.co/api/models/nvidia/nemotron-3.5-asr-streaming-0.6b/revision/24b151a851dd15909e1fc611b11bb2da52b9fc81
POST https://huggingface.co/api/models/nvidia/nemotron-3.5-asr-streaming-0.6b/paths-info/24b151a851dd15909e1fc611b11bb2da52b9fc81
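
For reference, the two metadata requests above can be derived from the direct file URL. The sketch below assumes the `<owner>/<model>/resolve/<revision>/<path>` URL shape used in this QA run; `hf_metadata_endpoints` is a hypothetical helper, not a modelaudit or huggingface_hub function, and it only builds URLs without sending any request.

```python
from urllib.parse import urlparse


def hf_metadata_endpoints(file_url: str) -> tuple[str, str, str]:
    """Return (revision_url, paths_info_url, file_path) for a resolve URL."""
    parts = urlparse(file_url).path.strip("/").split("/")
    # Expected shape: <owner>/<model>/resolve/<revision>/<path...>
    if len(parts) < 5 or parts[2] != "resolve":
        raise ValueError(f"not a direct Hugging Face file URL: {file_url}")
    repo = "/".join(parts[:2])
    revision = parts[3]
    file_path = "/".join(parts[4:])
    base = f"https://huggingface.co/api/models/{repo}"
    return f"{base}/revision/{revision}", f"{base}/paths-info/{revision}", file_path
```

Raising on an unexpected URL shape mirrors the fail-closed stance of the PR: an input that cannot be parsed unambiguously is not previewed.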

Preview JSON contained artifact_downloads: 0, scanner_execution: false, metadata_only: true, size_bytes: 47395, and no scan-result fields.
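
A check like the one described above can be scripted. This sketch assumes the listed fields are top-level keys of the preview JSON (the PR does not show the full payload layout); `check_preview` is a hypothetical helper written for this QA note.

```python
import json

# Scan-result fields that a preview payload must not contain (from this PR).
SCAN_RESULT_FIELDS = {"bytes_scanned", "files_scanned", "issues", "checks"}


def check_preview(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks preview-only."""
    problems = []
    if payload.get("artifact_downloads") != 0:
        problems.append("artifact_downloads is not 0")
    if payload.get("scanner_execution") is not False:
        problems.append("scanner_execution is not false")
    leaked = sorted(SCAN_RESULT_FIELDS.intersection(payload))
    if leaked:
        problems.append("scan-result fields present: " + ", ".join(leaked))
    return problems


if __name__ == "__main__":
    # Path matches the --output argument used in the fixed dry-run command.
    with open("/tmp/modelaudit-t16-preview.json", encoding="utf-8") as fh:
        print(check_preview(json.load(fh)) or "preview-only payload")
```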

Non-dry control:

PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 HF_HUB_DISABLE_PROGRESS_BARS=1 uv run modelaudit scan --stream --format json --quiet --no-cache --scanners metadata --max-size 100KB https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b/resolve/24b151a851dd15909e1fc611b11bb2da52b9fc81/README.md

Outcome: exit 0, normal scan JSON with files_scanned: 1, bytes_scanned: 47395, and scanner_names: ["metadata"].

Tests

uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_cli.py -k "huggingface_streaming_dry_run or huggingface_file_streaming_dry_run or huggingface_streaming_without_dry_run_still_reports_malicious_result" --maxfail=1
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_cli.py --maxfail=1
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_streaming_scan.py --maxfail=1
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/cache/test_cache_correctness.py::test_cached_scan_does_not_serialize_known_uncacheable_scan_result -vv --maxfail=1
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto tests/cache/test_cache_correctness.py::test_cached_scan_does_not_serialize_known_uncacheable_scan_result -vv --maxfail=1
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/cache/test_cache_correctness.py --maxfail=1
git diff --check

Broad suite attempted:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1

Outcome: failed after 10m33s in the unrelated test tests/cache/test_cache_correctness.py::test_cached_scan_does_not_serialize_known_uncacheable_scan_result (release_calls 2 vs 1). That exact test passes when run in isolation, both with and without xdist (-n auto), and the containing cache module also passes locally.

mldangelo-oai · 2026-06-11T01:22:38Z

@codex review

github-actions · 2026-06-11T01:24:22Z

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.425s -> 1.416s (-0.7%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `single-checkpoint-preflight` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load` | `single_checkpoint.pkl` | 183.0 KiB | 1 | 70.92ms | 72.33ms | +2.0% | stable |
| `mixed-model-repository` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository` | `release-candidate` | 547.3 KiB | 32 | 488.07ms | 479.50ms | -1.8% | stable |
| `suspicious-pickle-intake` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake` | `suspicious-intake` | 183.8 KiB | 4 | 145.63ms | 143.25ms | -1.6% | stable |
| `padded-multi-stream-upload` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload` | `multi_stream_padded` | 4.1 KiB | 1 | 643.9us | 651.1us | +1.1% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]` | `nested_raw` | 78 B | 1 | 576.3us | 570.9us | -0.9% | stable |
| `clean-training-checkpoint` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint` | `safe_large` | 278.2 KiB | 1 | 108.92ms | 109.85ms | +0.9% | stable |
| `chunked-upload-stream` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream` | `chunked_stream` | 278.2 KiB | 1 | 112.03ms | 112.73ms | +0.6% | stable |
| `duplicate-heavy-registry` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot` | `registry-snapshot` | 915.2 KiB | 13 | 391.69ms | 389.75ms | -0.5% | stable |
| `direct-malicious-upload` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload` | `malicious_reduce` | 52 B | 1 | 518.6us | 517.3us | -0.3% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]` | `nested_base64` | 98 B | 1 | 575.3us | 576.7us | +0.2% | stable |
| `warm-cache-rescan` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan` | `release-candidate` | 547.3 KiB | 32 | 105.28ms | 105.51ms | +0.2% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]` | `nested_hex` | 130 B | 1 | 608.5us | 608.5us | -0.0% | stable |
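
The Change column is consistent with a plain relative difference against the baseline. The sketch below reproduces that arithmetic and applies the 15% threshold; treating the threshold as symmetric (the same bound for "improved" as for "regression") is an assumption, and `relative_change` and `classify` are hypothetical helpers, not the benchmark workflow's own code.

```python
def relative_change(baseline: float, current: float) -> float:
    """Relative change of current vs. baseline, e.g. 0.02 for +2%."""
    return (current - baseline) / baseline


def classify(baseline: float, current: float, threshold: float = 0.15) -> str:
    # Assumed symmetric rule: slower by more than the threshold is a
    # regression, faster by more than the threshold is an improvement.
    change = relative_change(baseline, current)
    if change > threshold:
        return "regression"
    if change < -threshold:
        return "improved"
    return "stable"


# First two rows of the table (values in ms):
print(f"{relative_change(70.92, 72.33):+.1%}", classify(70.92, 72.33))
print(f"{relative_change(488.07, 479.50):+.1%}", classify(488.07, 479.50))
```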

chatgpt-codex-connector · 2026-06-11T01:26:01Z

Codex Review: Didn't find any major issues. Delightful!


fix(cli): make streaming dry runs non-executing

daf5d85

mldangelo-oai requested a review from mldangelo June 11, 2026 01:41

mldangelo-oai enabled auto-merge (squash) June 11, 2026 01:41

mldangelo-oai wants to merge 1 commit into main from mdangelo/codex/hf-fp-t16-streaming-dry-run-20260610
