fix(cli): make streaming dry runs non-executing #1649

Open
mldangelo-oai wants to merge 1 commit into main from mdangelo/codex/hf-fp-t16-streaming-dry-run-20260610

@mldangelo-oai

Contributor

Summary

  • Adds a Hugging Face dry-run preview path for --dry-run --stream that performs metadata planning only.
  • Prevents artifact downloads, scanner execution, scan-result JSON formatting, and issue exit codes for successful Hugging Face streaming dry runs.
  • Fails closed on ambiguous or missing Hugging Face metadata, and rejects runs that mix Hugging Face preview inputs with local scan inputs, so a dry run never triggers unexpected scanner execution.

Root cause

scan_command passed dry_run=True through runtime config, but Hugging Face source resolution still entered the streaming acquisition path. For direct Hugging Face file URLs this downloaded the artifact, then invoked the normal scanner path and emitted full scan JSON. That made --dry-run --stream behave like a real scan for Hugging Face inputs.
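
The control-flow gap described above can be illustrated with a minimal sketch. Every name here (`ScanRequest`, `CallLog`, `plan_from_metadata`, `acquire_and_scan`, `resolve_source`) is hypothetical and not taken from the modelaudit codebase; the sketch only shows the idea of branching on the dry-run flag before the streaming acquisition path is entered.

```python
from dataclasses import dataclass, field


@dataclass
class ScanRequest:
    """Hypothetical request object; not modelaudit's real config type."""
    url: str
    dry_run: bool = False
    stream: bool = False


@dataclass
class CallLog:
    """Records which side-effecting paths were taken, for illustration."""
    events: list = field(default_factory=list)


def plan_from_metadata(request: ScanRequest, log: CallLog) -> dict:
    # Stand-in for a metadata-only lookup: no artifact bytes are fetched.
    log.events.append("metadata")
    return {"metadata_only": True, "artifact_downloads": 0, "scanner_execution": False}


def acquire_and_scan(request: ScanRequest, log: CallLog) -> dict:
    # Stand-in for the normal streaming path: download, then run scanners.
    log.events.extend(["download", "scan"])
    return {"files_scanned": 1}


def resolve_source(request: ScanRequest, log: CallLog) -> dict:
    # The guard runs before acquisition. Carrying dry_run in config without
    # checking it here is the failure mode the root cause describes.
    if request.dry_run and request.stream:
        return plan_from_metadata(request, log)
    return acquire_and_scan(request, log)
```

With this shape, a dry-run request records only the `metadata` event, while a non-dry streaming request still records `download` and `scan`.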

Security tradeoffs

  • Successful Hugging Face dry runs now emit a preview-only payload with artifact_downloads: 0 and scanner_execution: false; they do not emit scan-result fields like bytes_scanned, files_scanned, issues, or checks.
  • Metadata lookup failures remain fail-closed with exit code 2 and do not fall through to acquisition or scanner execution.
  • Runs that mix Hugging Face preview inputs with local or other non-Hugging Face inputs are rejected with exit code 2 instead of being partially previewed and partially scanned.
  • Non-dry streaming scans still use the existing download and scanner execution paths, including issue exit code behavior.
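
The fail-closed and mixed-input rules in the list above might be sketched as follows. The function names, the URL-prefix check, and the `metadata_lookup` callback are assumptions made for illustration; only exit code 2, the `artifact_downloads` and `scanner_execution` fields, and the described behaviors come from this PR description.

```python
from typing import Callable, Optional

EXIT_OK = 0
EXIT_ERROR = 2  # exit code the PR description reports for rejected dry runs

HF_URL_PREFIX = "https://huggingface.co/"  # assumed way to recognise HF inputs


def is_huggingface_input(path: str) -> bool:
    return path.startswith(HF_URL_PREFIX)


def plan_dry_run(paths: list, metadata_lookup: Callable[[str], Optional[dict]]):
    """Return (exit_code, previews). This sketch never downloads or scans."""
    hf_inputs = [p for p in paths if is_huggingface_input(p)]
    if not hf_inputs:
        # Nothing for the preview path to do; local dry runs are out of scope here.
        return EXIT_OK, []
    if len(hf_inputs) != len(paths):
        # Mixed inputs: reject instead of previewing some and scanning others.
        return EXIT_ERROR, []

    previews = []
    for path in hf_inputs:
        try:
            meta = metadata_lookup(path)
        except Exception:
            meta = None
        if not meta or "size_bytes" not in meta:
            # Missing or ambiguous metadata: fail closed, never fall through
            # to acquisition or scanner execution.
            return EXIT_ERROR, []
        previews.append({
            "path": path,
            "size_bytes": meta["size_bytes"],
            "artifact_downloads": 0,
            "scanner_execution": False,
        })
    return EXIT_OK, previews
```

Returning an empty preview list together with the error code keeps a partially planned run from being mistaken for a successful preview.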

Real-model QA

Pinned model/revision:

nvidia/nemotron-3.5-asr-streaming-0.6b @ 24b151a851dd15909e1fc611b11bb2da52b9fc81

Baseline on ab6e58ebe587197b2b46866c69f7d41d75e08a55 reproduced the bug with the bounded README artifact:

PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 HF_HUB_DISABLE_PROGRESS_BARS=1 uv run modelaudit scan --dry-run --stream --format json --quiet --no-cache --scanners metadata --max-size 100KB https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b/resolve/24b151a851dd15909e1fc611b11bb2da52b9fc81/README.md

Observed baseline behavior: exited 0 but issued artifact GET .../resolve-cache/.../README.md, invoked the metadata scanner, and emitted full scan JSON with files_scanned: 1, bytes_scanned: 47395, and scanner_names: ["metadata"].

Fixed dry-run command:

PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 HF_HUB_DISABLE_PROGRESS_BARS=1 uv run modelaudit scan --dry-run --stream --format json --verbose --no-cache --scanners metadata --max-size 100KB --output /tmp/modelaudit-t16-preview.json https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b/resolve/24b151a851dd15909e1fc611b11bb2da52b9fc81/README.md

Outcome: exit 0. Verbose output showed only:

  • GET https://huggingface.co/api/models/nvidia/nemotron-3.5-asr-streaming-0.6b/revision/24b151a851dd15909e1fc611b11bb2da52b9fc81
  • POST https://huggingface.co/api/models/nvidia/nemotron-3.5-asr-streaming-0.6b/paths-info/24b151a851dd15909e1fc611b11bb2da52b9fc81

Preview JSON contained artifact_downloads: 0, scanner_execution: false, metadata_only: true, size_bytes: 47395, and no scan-result fields.
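
A reviewer who wants to re-check a preview file could use something like the sketch below. The field names come from this description, but the nesting of the preview payload is not shown here, so the helper searches every level; `walk`, `check_preview`, and the sample layout are hypothetical.

```python
EXPECTED_FLAGS = {
    "artifact_downloads": 0,
    "scanner_execution": False,
    "metadata_only": True,
}
SCAN_RESULT_FIELDS = ("bytes_scanned", "files_scanned", "issues", "checks")


def walk(node):
    """Yield (key, value) pairs from dicts at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key, value
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def check_preview(payload):
    """Return a list of problems; an empty list means the payload looks preview-only."""
    pairs = list(walk(payload))
    keys = {key for key, _ in pairs}
    problems = [
        f"unexpected scan-result field: {name}"
        for name in SCAN_RESULT_FIELDS
        if name in keys
    ]
    for name, expected in EXPECTED_FLAGS.items():
        values = [value for key, value in pairs if key == name]
        if not values:
            problems.append(f"missing field: {name}")
        elif any(type(v) is not type(expected) or v != expected for v in values):
            # The type check keeps False from passing as 0 and vice versa.
            problems.append(f"unexpected value for {name}")
    return problems


# Assumed layout for illustration only; the real payload may be shaped differently.
sample = {
    "preview": {
        "artifact_downloads": 0,
        "scanner_execution": False,
        "metadata_only": True,
        "size_bytes": 47395,
    }
}
problems = check_preview(sample)
```

To apply it to the file written by the fixed dry-run command, load `/tmp/modelaudit-t16-preview.json` with `json.load` and pass the result to `check_preview`.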

Non-dry control:

PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 HF_HUB_DISABLE_PROGRESS_BARS=1 uv run modelaudit scan --stream --format json --quiet --no-cache --scanners metadata --max-size 100KB https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b/resolve/24b151a851dd15909e1fc611b11bb2da52b9fc81/README.md

Outcome: exit 0, normal scan JSON with files_scanned: 1, bytes_scanned: 47395, and scanner_names: ["metadata"].

Tests

  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_cli.py -k "huggingface_streaming_dry_run or huggingface_file_streaming_dry_run or huggingface_streaming_without_dry_run_still_reports_malicious_result" --maxfail=1
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_cli.py --maxfail=1
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_streaming_scan.py --maxfail=1
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/cache/test_cache_correctness.py::test_cached_scan_does_not_serialize_known_uncacheable_scan_result -vv --maxfail=1
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto tests/cache/test_cache_correctness.py::test_cached_scan_does_not_serialize_known_uncacheable_scan_result -vv --maxfail=1
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/cache/test_cache_correctness.py --maxfail=1
  • git diff --check

Broad suite attempted:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1

Outcome: failed after 10m33s in the unrelated test tests/cache/test_cache_correctness.py::test_cached_scan_does_not_serialize_known_uncacheable_scan_result (release_calls 2 vs 1). The exact test passes in isolation and under xdist, and the containing cache module passes locally.

@mldangelo-oai

Contributor Author

@codex review

@github-actions

Contributor

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.425s -> 1.416s (-0.7%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| single-checkpoint-preflight | tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load | single_checkpoint.pkl | 183.0 KiB | 1 | 70.92ms | 72.33ms | +2.0% | stable |
| mixed-model-repository | tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository | release-candidate | 547.3 KiB | 32 | 488.07ms | 479.50ms | -1.8% | stable |
| suspicious-pickle-intake | tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake | suspicious-intake | 183.8 KiB | 4 | 145.63ms | 143.25ms | -1.6% | stable |
| padded-multi-stream-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload | multi_stream_padded | 4.1 KiB | 1 | 643.9us | 651.1us | +1.1% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] | nested_raw | 78 B | 1 | 576.3us | 570.9us | -0.9% | stable |
| clean-training-checkpoint | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint | safe_large | 278.2 KiB | 1 | 108.92ms | 109.85ms | +0.9% | stable |
| chunked-upload-stream | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream | chunked_stream | 278.2 KiB | 1 | 112.03ms | 112.73ms | +0.6% | stable |
| duplicate-heavy-registry | tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot | registry-snapshot | 915.2 KiB | 13 | 391.69ms | 389.75ms | -0.5% | stable |
| direct-malicious-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload | malicious_reduce | 52 B | 1 | 518.6us | 517.3us | -0.3% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] | nested_base64 | 98 B | 1 | 575.3us | 576.7us | +0.2% | stable |
| warm-cache-rescan | tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan | release-candidate | 547.3 KiB | 32 | 105.28ms | 105.51ms | +0.2% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] | nested_hex | 130 B | 1 | 608.5us | 608.5us | -0.0% | stable |

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Delightful!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mldangelo-oai mldangelo-oai requested a review from mldangelo June 11, 2026 01:41
@mldangelo-oai mldangelo-oai enabled auto-merge (squash) June 11, 2026 01:41