fix: preview Hugging Face dry-run scans by mldangelo-oai · Pull Request #1645 · promptfoo/modelaudit

mldangelo-oai · 2026-06-11T01:16:35Z

Summary

add Hugging Face direct-file and repository --dry-run previews that skip downloads and scans
add an opt-in pinned Hugging Face false-positive regression harness with synthetic malicious controls
document the user-visible dry-run behavior in the root [Unreleased] changelog

Review Notes

Independent security/regression review found no security regression in the diff.
Fixed one broad-suite regression found during review: modelaudit.cli.get_model_info now delegates through the Hugging Face source module so existing source-module patch points still affect metadata preview tests.
Subagent CSV files were inspected as orchestration artifacts and intentionally left untracked.

Validation

uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_huggingface_fp_regression_harness.py tests/test_cli.py::test_scan_huggingface_metadata_preview_escapes_model_id tests/test_cli.py::test_scan_huggingface_metadata_preflight_verbose_log_is_sanitized -q
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_cli.py -k "huggingface and (streaming or dry_run or scanner)" --maxfail=1 -q
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/utils/sources/test_huggingface.py --maxfail=1 -q
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 (17,373 passed, 1,292 skipped)
git diff --check

Live QA

PROMPTFOO_DISABLE_TELEMETRY=1 MODELAUDIT_RUN_HF_FP_LIVE=1 uv run pytest tests/test_huggingface_fp_regression_harness.py::test_live_pinned_hf_manifest_case_end_to_end -q --maxfail=1 (rank-2 live case passed, 6 full-matrix cases skipped)
PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan --dry-run --format json --scanners safetensors hf://sentence-transformers/all-MiniLM-L6-v2 (exit 0; 0 files scanned; download/scan skipped)

Add an opt-in Hugging Face false-positive regression harness with pinned live cases and synthetic malicious controls.

github-actions · 2026-06-11T01:18:40Z

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.462s -> 1.457s (-0.4%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `warm-cache-rescan` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan` | `release-candidate` | 547.3 KiB | 32 | 112.73ms | 116.22ms | +3.1% | stable |
| `padded-multi-stream-upload` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload` | `multi_stream_padded` | 4.1 KiB | 1 | 568.1us | 575.9us | +1.4% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]` | `nested_hex` | 130 B | 1 | 543.6us | 550.5us | +1.3% | stable |
| `mixed-model-repository` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository` | `release-candidate` | 547.3 KiB | 32 | 490.57ms | 484.84ms | -1.2% | stable |
| `direct-malicious-upload` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload` | `malicious_reduce` | 52 B | 1 | 452.3us | 456.2us | +0.9% | stable |
| `chunked-upload-stream` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream` | `chunked_stream` | 278.2 KiB | 1 | 115.57ms | 114.62ms | -0.8% | stable |
| `clean-training-checkpoint` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint` | `safe_large` | 278.2 KiB | 1 | 112.74ms | 111.85ms | -0.8% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]` | `nested_raw` | 78 B | 1 | 513.0us | 509.6us | -0.7% | stable |
| `single-checkpoint-preflight` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load` | `single_checkpoint.pkl` | 183.0 KiB | 1 | 74.09ms | 74.47ms | +0.5% | stable |
| `duplicate-heavy-registry` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot` | `registry-snapshot` | 915.2 KiB | 13 | 410.53ms | 409.12ms | -0.3% | stable |
| `nested-payload-review` | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]` | `nested_base64` | 98 B | 1 | 512.1us | 511.1us | -0.2% | stable |
| `suspicious-pickle-intake` | `tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake` | `suspicious-intake` | 183.8 KiB | 4 | 143.14ms | 143.09ms | -0.0% | stable |
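The summary figures can be recomputed from the table. The sketch below assumes that the aggregate "median" is the sum of the per-benchmark medians (the rounded table values do sum to 1.462 s and 1.457 s) and that a benchmark stays "stable" while its change is within ±15% in either direction; neither rule is confirmed by the workflow itself, and `Row`, `status`, and `aggregate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple

THRESHOLD_PCT = 15.0  # regression threshold from the summary line


@dataclass
class Row:
    workload: str
    baseline_s: float
    current_s: float


def change_pct(row: Row) -> float:
    return (row.current_s - row.baseline_s) / row.baseline_s * 100.0


def status(row: Row, threshold: float = THRESHOLD_PCT) -> str:
    # Assumption: symmetric threshold; slower beyond it regresses, faster beyond it improves.
    pct = change_pct(row)
    if pct > threshold:
        return "regression"
    if pct < -threshold:
        return "improved"
    return "stable"


def aggregate(rows: List[Row]) -> Tuple[float, float, float]:
    # Assumption: the aggregate median is the sum of the per-benchmark medians.
    base = sum(r.baseline_s for r in rows)
    cur = sum(r.current_s for r in rows)
    return base, cur, (cur - base) / base * 100.0


# Baseline/current values copied from the table above, converted to seconds.
ROWS = [
    Row("warm-cache-rescan", 0.11273, 0.11622),
    Row("padded-multi-stream-upload", 568.1e-6, 575.9e-6),
    Row("nested-payload-review[nested_hex]", 543.6e-6, 550.5e-6),
    Row("mixed-model-repository", 0.49057, 0.48484),
    Row("direct-malicious-upload", 452.3e-6, 456.2e-6),
    Row("chunked-upload-stream", 0.11557, 0.11462),
    Row("clean-training-checkpoint", 0.11274, 0.11185),
    Row("nested-payload-review[nested_raw]", 513.0e-6, 509.6e-6),
    Row("single-checkpoint-preflight", 0.07409, 0.07447),
    Row("duplicate-heavy-registry", 0.41053, 0.40912),
    Row("nested-payload-review[nested_base64]", 512.1e-6, 511.1e-6),
    Row("suspicious-pickle-intake", 0.14314, 0.14309),
]

if __name__ == "__main__":
    base, cur, pct = aggregate(ROWS)
    print(f"{base:.3f}s -> {cur:.3f}s ({pct:+.1f}%)")
    print({s: sum(status(r) == s for r in ROWS) for s in ("regression", "improved", "stable")})
```

Running it prints `1.462s -> 1.457s (-0.4%)` and 12 stable rows, matching the summary; with the unrounded medians the last digit could differ.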

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05fb6a454c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T01:30:13Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 086e576836

mldangelo-oai · 2026-06-11T01:45:24Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 982f74dfa3

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: f178df9913

mldangelo-oai · 2026-06-11T02:14:42Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: b6b113e1a2

mldangelo-oai · 2026-06-11T02:51:20Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: e2fdd6e7be

mldangelo-oai · 2026-06-11T03:29:59Z

CI diagnosis for exact head e2fdd6e7: Windows-only failure in tests/test_huggingface_fp_regression_harness.py::test_hf_repo_dry_run_preview_does_not_download_or_scan; expected exit 0, got 2. Linux/Python 3.10/3.12/3.13 and all other checks pass. Please reproduce the Windows path handling, fix without weakening dry-run/no-download guarantees, add a focused cross-platform regression, push additively, and request a fresh exact-head review.

mldangelo-oai · 2026-06-11T03:39:28Z

@codex review

chatgpt-codex-connector · 2026-06-11T03:44:00Z

Codex Review: Didn't find any major issues. Hooray!

mldangelo-oai · 2026-06-11T04:15:32Z

Current head f78836c9 has one remaining CI failure on Windows:

tests/test_huggingface_fp_regression_harness.py::test_hf_repo_dry_run_preview_does_not_download_or_scan expects exit 0 but gets SystemExit(2).

Run/job: https://github.com/promptfoo/modelaudit/actions/runs/27322160248/job/80715503773

All other required jobs passed. @codex reproduce under Windows path/URL semantics, fix without weakening dry-run's no-download/no-scan invariant, add/adjust the cross-platform regression, push, and request exact-head review.

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: f78836c99a

mldangelo-oai · 2026-06-11T04:40:37Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 0c3024f18f

mldangelo-oai · 2026-06-11T05:16:24Z

Independent adversarial review: promptfoo/modelaudit PR #1645

Findings

1. High — `--dry-run` still downloads remote model bytes during Hugging Face file selection

Files: modelaudit/cli.py:238-253, modelaudit/cli.py:268-294, modelaudit/utils/sources/huggingface.py:33-35, modelaudit/utils/sources/huggingface.py:160-218, modelaudit/utils/sources/huggingface.py:755-784, modelaudit/utils/sources/huggingface.py:1050-1067, CHANGELOG.md:12

The new preview calls the same content-routing selectors as real acquisition. Those selectors call _read_huggingface_prefix(), which issues requests.get(..., Range=...) calls that can carry authentication and consumes the response bodies. The aggregate probe budget is 64 MiB across up to 256 candidates. This is not metadata-only behavior, and it directly contradicts both the preview's Download: skipped (--dry-run) output and the changelog claim that dry-run previews run "without downloading or scanning."

Exact-head adversarial probe: a repository manifest containing only renamed.payload caused two _read_huggingface_prefix calls before dry-run exited. The unit test named test_hf_repo_dry_run_preview_does_not_download_or_scan does not catch this: its metadata omits revision, so _huggingface_preview_download_files() takes the suffix-only fallback, and it mocks only download_model and the local scanner, not the remote prefix reader. The separate content-routed test mocks the detector itself, which also hides the network read.

Impact: a command explicitly selected to avoid acquisition can read gated or sensitive files, incur up to tens of MiB of transfer, and fail on model-content endpoints. Keep preview metadata-only. When faithful routing would require content bytes, fail closed with an explicit incomplete-preview result rather than silently probing.

Validation: VALIDATED, confidence 99/100, introduced by this PR's dry-run call path.
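The test gap in finding 1 (download_model and the local scanner are mocked, the remote prefix reader is not) can be closed generically by refusing outbound socket connections while the Hugging Face dry-run tests run, so any unmocked content GET fails loudly instead of silently reaching the Hub. This is a minimal sketch, assuming every metadata call in those tests is already mocked; `NetworkBlocked` and `no_network` are hypothetical helpers, and plugins such as pytest-socket provide a maintained equivalent.

```python
import socket
from contextlib import contextmanager
from unittest import mock


class NetworkBlocked(RuntimeError):
    """Raised when code under test tries to open an outbound connection."""


@contextmanager
def no_network():
    """Refuse socket connections for the duration of the block."""

    def refuse(*args, **kwargs):
        raise NetworkBlocked(f"unexpected network access: {args!r}")

    # socket.socket inherits connect/connect_ex from _socket.socket; patching the
    # Python subclass affects every socket created from socket.socket.
    with mock.patch.object(socket.socket, "connect", side_effect=refuse), mock.patch.object(
        socket.socket, "connect_ex", side_effect=refuse
    ):
        yield
```

Wrapping the dry-run test body in `with no_network():` (or entering it from an autouse fixture) would turn hidden prefix reads into test failures. The exception deliberately does not subclass `OSError`, because `socket.create_connection()` and urllib's handlers catch `OSError` and would otherwise retry the next address or rewrap the error.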

2. High — The exact head still fails required Windows CI because another test replaces `modelaudit.cli`, bypassing the new tests' mocks

Files: modelaudit/cli.py:297-323, tests/test_cli_logging_handlers.py:6-20, tests/test_huggingface_fp_regression_harness.py:224-343

Heads e2fdd6e7 and f78836c9 failed Windows at test_hf_repo_dry_run_preview_does_not_download_or_scan with exit 2. Commit 0c3024f1 changes the basename expression from host-native Path to PurePosixPath, adds a separate backslash-path unit test before the failing test, and adds assertion diagnostics.

That production change cannot change the failed fixture's selection: its model.safetensors satisfies the first name_lower.endswith('.safetensors') operand, so Python short-circuits before evaluating either basename expression. A direct comparison returned the same selection under old Windows semantics and new POSIX semantics: ['model.safetensors'].
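The short-circuit argument can be checked directly. The condition below is a hypothetical stand-in shaped like the one described (a `.safetensors` suffix test `or`-ed with a basename test); only the first operand comes from the review. `PureWindowsPath` reproduces what `Path(...).name` does on a Windows host, so both semantics can be compared on any OS.

```python
from pathlib import PurePosixPath, PureWindowsPath
from typing import Callable


def selects(name: str, basename: Callable[[str], str]) -> bool:
    # Hypothetical condition: only the first operand is taken from the review.
    name_lower = name.lower()
    return name_lower.endswith(".safetensors") or basename(name_lower) == "config.json"


def windows_basename(name: str) -> str:
    return PureWindowsPath(name).name  # what Path(name).name does on Windows


def posix_basename(name: str) -> str:
    return PurePosixPath(name).name  # a backslash is an ordinary file-name character here


def never_called(name: str) -> str:
    raise AssertionError("basename expression was evaluated")


if __name__ == "__main__":
    print(selects("model.safetensors", never_called))  # suffix operand short-circuits
    print(windows_basename("weights\\config.json"), posix_basename("weights\\config.json"))
```

`model.safetensors` never reaches the basename expression, so swapping `Path` for `PurePosixPath` cannot change that fixture's selection; the two semantics only diverge for names that contain a backslash.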

The completed exact-head run confirms the failure remains. Windows now fails test_hf_repo_dry_run_preview_rejects_unknown_selected_size_with_max_size: instead of using its mocked metadata, the test performs a real Hub request and receives 401. The job reports 1 failed, 14,578 passed, and 601 skipped; aggregate CI Success fails.

The root test-isolation mechanism is concrete. test_import_cli_does_not_configure_logging() removes modelaudit.cli from sys.modules and imports a replacement module without restoring the original. Tests collected earlier retain the old Click cli object and its old module globals, while later patch("modelaudit.cli.get_model_info") calls patch the replacement module. A local identity probe reproduced exactly this split: the module was replaced, the patch hit the new module, and it did not hit the old CLI globals. Xdist worker scheduling determines which new test receives the stale command object, explaining why successive Windows runs fail different tests.

The superseding POSIX-path commit neither fixes this isolation bug nor changes the original failed fixture's selection. Fix the logging-import test in a subprocess or restore the original module/package attribute, then run the focused harness and full Windows lane without live network access.

Validation: VALIDATED as a required-CI/test-isolation blocker, confidence 98/100. Current exact-head Windows status is recorded below.
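The review prefers running the logging-import check in a subprocess. The other option it names, restoring the original module and package attribute, is sketched below on a throwaway package so the split can be reproduced without modelaudit. `demo_pkg`, `fresh_module`, and `patched_result` are hypothetical; the mechanism relied on is standard `unittest.mock` behavior: a string target resolves to whatever module is currently importable under that name, while an already-captured function keeps its original module globals.

```python
import importlib
import sys
import tempfile
from contextlib import contextmanager
from pathlib import Path
from unittest import mock

# Hypothetical stand-in for modelaudit/cli.py: run() resolves get_info through its
# module globals, the way the collected Click command resolves get_model_info.
CLI_SOURCE = "def get_info():\n    return 'real network call'\n\ndef run():\n    return get_info()\n"


@contextmanager
def fresh_module(name: str):
    """Yield a fresh import of an already-imported submodule, then restore the
    original in sys.modules and on its parent package."""
    parent_name, _, child = name.rpartition(".")
    parent = importlib.import_module(parent_name)
    saved_module = sys.modules.pop(name)
    saved_attr = getattr(parent, child)
    try:
        yield importlib.import_module(name)
    finally:
        sys.modules[name] = saved_module
        setattr(parent, child, saved_attr)


def patched_result(run) -> str:
    # Same shape as patch("modelaudit.cli.get_model_info").
    with mock.patch("demo_pkg.cli.get_info", return_value="mocked"):
        return run()


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "cli.py").write_text(CLI_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            original = importlib.import_module("demo_pkg.cli")
            run = original.run  # captured early, like a command collected before the swap

            # Leaky pattern: drop the module and re-import it without restoring.
            sys.modules.pop("demo_pkg.cli")
            importlib.import_module("demo_pkg.cli")
            leaky = patched_result(run)

            # Undo the leak by hand, then repeat the fresh import with restoration.
            sys.modules["demo_pkg.cli"] = original
            sys.modules["demo_pkg"].cli = original
            with fresh_module("demo_pkg.cli") as replacement:
                assert replacement is not original
            restored = patched_result(run)
            return leaky, restored
        finally:
            sys.path.remove(tmp)
            for mod in ("demo_pkg.cli", "demo_pkg"):
                sys.modules.pop(mod, None)


if __name__ == "__main__":
    print(demo())
```

`demo()` returns `('real network call', 'mocked')`: after the leaky re-import the patch lands on the replacement while the captured `run` still calls the real function, and restoring both `sys.modules` and the parent attribute makes the same patch effective again. A subprocess avoids this bookkeeping entirely because the fresh import never happens in the test process.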

3. Medium — `--timeout` is ignored by all Hugging Face dry-run metadata and content-selection work

Files: modelaudit/cli.py:238-253, modelaudit/cli.py:268-294, modelaudit/cli.py:343-359, modelaudit/utils/sources/huggingface.py:81-88, modelaudit/utils/sources/huggingface.py:1458-1516

_preview_huggingface_model_source() calls get_model_info(path) without runtime.timeout, then calls content selectors without a deadline. The underlying content probe defaults each request to 30 seconds and can inspect up to 256 files. --timeout 1 therefore does not bound the preview and can still take many minutes or longer across metadata, recursive tree listing, and content probes.

Exact-head probe: modelaudit scan --dry-run --timeout 1 --scanners safetensors hf://test/model called get_model_info('hf://test/model') with no timeout argument and exited 0.

Validation: VALIDATED, confidence 99/100. This confirms unresolved thread r3393330909.
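One way to honor `--timeout` across every step of the preview is to create a single deadline when the command starts and derive each request's timeout from what remains, capped by the existing 30-second per-request default. This is a minimal sketch with a hypothetical `Deadline` class; call sites would pass `timeout=deadline.request_timeout()` to `requests.get` and to each metadata call that accepts a timeout, which should be verified call by call.

```python
import time
from typing import Callable, Optional


class DeadlineExceeded(TimeoutError):
    """Raised when the end-to-end --timeout budget is used up."""


class Deadline:
    """One wall-clock budget shared by metadata, tree-listing, and probe requests."""

    def __init__(self, total_seconds: Optional[float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at = None if total_seconds is None else clock() + total_seconds

    def remaining(self) -> Optional[float]:
        """Seconds left, or None when no --timeout was given."""
        if self._expires_at is None:
            return None
        return self._expires_at - self._clock()

    def request_timeout(self, per_request_default: float = 30.0) -> float:
        """Timeout for the next request: the per-request default, capped by the budget."""
        left = self.remaining()
        if left is None:
            return per_request_default
        if left <= 0:
            raise DeadlineExceeded("dry-run preview exceeded --timeout")
        return min(per_request_default, left)
```

With `--timeout 1`, the first request gets at most one second, later probes get only the leftover budget, and the first call after the budget is spent raises `DeadlineExceeded` instead of starting another 30-second request.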

4. Medium — README bytes are excluded from dry-run max-size budgeting but included by full download

Files: modelaudit/utils/sources/huggingface.py:1484-1504, modelaudit/cli.py:355-373, modelaudit/utils/model_extensions.py:19-39, modelaudit/scanner_registry_metadata.py:255-256

get_model_info() removes README.md from both the recursive listing and fallback sibling list. The real downloader's model-extension set includes .md, so _select_huggingface_model_files() selects README files. Dry-run can consequently report an under-budget success while the same non-dry command selects the README and exceeds --max-size.

Exact-head probe: metadata showing only a 20-byte model.safetensors passed --dry-run --max-size 50B; the real full-download selector for the actual repository list returned ['model.safetensors', 'README.md'].

Validation: VALIDATED, confidence 99/100. This confirms unresolved thread r3393330913.
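The README mismatch disappears if dry-run budgets exactly the list the full downloader would fetch. A minimal sketch of one shared selector follows; the extension set is hypothetical apart from `.md`, which the review says the real set contains, and the 40-byte README is invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical extension set; per the review, the real downloader's set includes ".md".
MODEL_EXTENSIONS = (".safetensors", ".bin", ".pt", ".md")


def select_files(manifest: Dict[str, int]) -> List[str]:
    """One selector shared by dry-run budgeting and the real download."""
    return sorted(name for name in manifest if name.lower().endswith(MODEL_EXTENSIONS))


def check_budget(manifest: Dict[str, int], max_size: Optional[int]) -> Tuple[List[str], int, bool]:
    """Return the selected files, their total size, and whether they fit --max-size."""
    selected = select_files(manifest)
    total = sum(manifest[name] for name in selected)
    return selected, total, max_size is None or total <= max_size
```

Feeding the unfiltered manifest (`model.safetensors` at 20 B plus the README) gives 60 B and fails `--max-size 50B`, as the full download would; the README-stripped manifest that dry-run currently sees gives 20 B and passes.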

5. Medium — Scanner-selected dry-runs reject content-routed files that full download selects

Files: modelaudit/cli.py:343-362, modelaudit/utils/sources/huggingface.py:755-789

For a non-streaming dry-run with --scanners, selected_files is suffix/basename-only and the function raises at line 354 before deriving the full downloader's content-routed selection. A renamed PyTorch ZIP can therefore produce preview exit 2 even though full download identifies and scans it.

Exact-head probe with renamed.payload and a PyTorch ZIP content route: scanner-selected dry-run exited 2, while _select_huggingface_model_files() returned ['renamed.payload'] for the non-dry path.

This cannot be corrected by unconditionally calling the existing content selector because that would preserve finding 1. The preview needs an explicit metadata-only/incomplete decision when content routing is required.

Validation: VALIDATED, confidence 98/100. This confirms unresolved thread r3393330907 and shows that resolved threads r3392977222/r3392860440 are only partially addressed.
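A metadata-only preview can make that decision explicit with a third outcome besides selected and skipped: a file whose route could only be known from its bytes is reported as unresolved, and the preview is marked incomplete rather than probed or rejected as unscannable. The suffix table, `Decision`, and `metadata_only_preview` are hypothetical and far smaller than modelaudit's real routing; the point of the sketch is that nothing in it reads file content.

```python
from dataclasses import dataclass, field
from enum import Enum
from pathlib import PurePosixPath
from typing import FrozenSet, Iterable, List, Optional


class Decision(Enum):
    SELECTED = "selected"
    SKIPPED = "skipped"
    NEEDS_CONTENT = "needs-content"  # only the file's bytes could decide its route


# Hypothetical suffix routes; the real registry is larger and content-aware.
SUFFIX_SCANNERS = {".safetensors": "safetensors", ".pt": "pytorch_zip", ".bin": "pytorch_zip", ".pkl": "pickle"}


@dataclass
class Preview:
    selected: List[str] = field(default_factory=list)
    unresolved: List[str] = field(default_factory=list)

    @property
    def complete(self) -> bool:
        return not self.unresolved


def classify(name: str, scanners: Optional[FrozenSet[str]]) -> Decision:
    scanner = SUFFIX_SCANNERS.get(PurePosixPath(name).suffix.lower())
    if scanner is None:
        return Decision.NEEDS_CONTENT  # e.g. a renamed PyTorch ZIP
    if scanners is not None and scanner not in scanners:
        return Decision.SKIPPED
    return Decision.SELECTED


def metadata_only_preview(names: Iterable[str], scanners: Optional[FrozenSet[str]] = None) -> Preview:
    """Classify from names alone; never fetches file content."""
    preview = Preview()
    for name in names:
        decision = classify(name, scanners)
        if decision is Decision.SELECTED:
            preview.selected.append(name)
        elif decision is Decision.NEEDS_CONTENT:
            preview.unresolved.append(name)
    return preview
```

For the review's reproducer, `renamed.payload` under a PyTorch-only scanner selection surfaces as an unresolved file in an incomplete preview, instead of an exit 2 for "no scannable files", and no prefix bytes are fetched.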

6. Medium — Direct-file dry-run can succeed for a file the corresponding real scan rejects as unscannable

Files: modelaudit/cli.py:326-340, modelaudit/cli.py:2011-2028, modelaudit/cli.py:2894-2908

Direct-file preview validates existence and size but never applies the local explicit-file prefilter or active scanner routing. The original review example cover.png is not sufficient because the current real path counts unknown files and exits 0. The defect is nonetheless reproducible for explicit prefilter extensions: a valid direct script.py URL exits 0 in dry-run but, after download, the normal path skips it and exits 2 with no files scanned.

Exact-head comparison with identical mocked Hub metadata/download result:

script.py: normal exit 2, dry-run exit 0
cover.png: normal exit 0, dry-run exit 0
notes.txt: normal exit 0, dry-run exit 0

Validation: VALIDATED with a corrected reproducer, confidence 96/100. Unresolved thread r3393239824 is directionally valid, but its cover.png example and blanket scanner-selection rationale are not.

7. Medium — The pinned "end-to-end" matrix declares four modes but runs only one local-file path

Files: tests/assets/huggingface_false_positive_manifest.json:3-18, tests/test_huggingface_fp_regression_harness.py:99-133, tests/test_huggingface_fp_regression_harness.py:136-195, tests/test_huggingface_fp_regression_harness.py:593-636

Every case declares local-directory, streaming, dry-run, and scanner-selective; the manifest test merely asserts those strings exist. The opt-in live test downloads one trigger with hf_hub_download() and calls scan_model_directory_or_file() on that single local file. It never invokes the CLI dry-run path changed by this PR, never invokes scan_model_streaming(), never validates remote scanner-selection behavior, and never generates or scans the case's malicious control. Only rank 2's synthetic local control and one generic streaming fixture are exercised outside the live matrix.

The stated wall-time and download budgets are also not end-to-end: the timeout is applied only after the download, and max_download_bytes is asserted after the entire file has already been transferred.

Impact: the PR and manifest materially overstate coverage. A passing rank-2 live case provides no evidence for the new remote dry-run guarantees, streaming/full-download parity, or per-case malicious controls.

Validation: VALIDATED, confidence 99/100.
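To make the download budget bind before the bytes arrive, the harness can reject an oversized file from its declared metadata size and otherwise copy the body through a counter that stops one byte past the limit. This is a minimal sketch under those assumptions; `bounded_copy` and `DownloadBudgetExceeded` are hypothetical, and `src` could be any file-like body, for example `response.raw` from a `requests.get(..., stream=True)` call.

```python
import io
from typing import BinaryIO, Optional


class DownloadBudgetExceeded(Exception):
    """Raised before or during transfer when a file exceeds the byte budget."""


def bounded_copy(src: BinaryIO, dst: BinaryIO, max_bytes: int,
                 declared_size: Optional[int] = None, chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst, never reading more than max_bytes + 1 bytes from src."""
    if declared_size is not None and declared_size > max_bytes:
        # Refuse from metadata alone, before any body bytes are requested.
        raise DownloadBudgetExceeded(f"declared size {declared_size} B exceeds {max_bytes} B")
    copied = 0
    while True:
        # Ask for at most one byte beyond the remaining budget to detect overflow.
        chunk = src.read(min(chunk_size, max_bytes - copied + 1))
        if not chunk:
            return copied
        copied += len(chunk)
        if copied > max_bytes:
            raise DownloadBudgetExceeded(f"body exceeded {max_bytes} B")
        dst.write(chunk)


if __name__ == "__main__":
    sink = io.BytesIO()
    print(bounded_copy(io.BytesIO(b"x" * 100), sink, max_bytes=100))
```

An oversized body is cut off after at most `max_bytes + 1` bytes are read, rather than being checked against `max_download_bytes` after the whole file has transferred; the wall-clock budget would likewise need to start before the request is issued.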

8. Medium — The global preview-success override silently changes existing cloud dry-run failures to success

Files: modelaudit/cli.py:2574-2600, modelaudit/cli.py:3261-3306, modelaudit/cli.py:4358-4363

This Hugging Face PR adds mark_dry_run_preview() to the pre-existing cloud preview path and lets any all-preview invocation bypass the normal no-files exit. No cloud validation checks that a directory contains a file, that selective filtering finds a scannable file, or that the target obeys the effective size limit before marking success.

Exact-head probe: an analyzed cloud directory with file_count=0 and no files exits 0. On the base code, the same path was not marked as a successful preview and fell through the documented no-files exit 2 contract. There is no changelog entry or regression test for this cross-provider behavior change.

Validation: VALIDATED, confidence 95/100, introduced by the changed cloud bookkeeping/final override.
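Restoring the no-files contract amounts to granting preview success only when the previewed targets would actually scan something within the effective limits. A hypothetical sketch follows; `CloudPreview`, `preview_exit_code`, and the constants are invented names, exit code 2 stands for the documented no-files/error outcome cited in the finding, and unknown sizes under `--max-size` fail closed as the Hugging Face path already does.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

EXIT_OK = 0
EXIT_FAILURE = 2  # assumption: the documented no-files/error exit code


@dataclass
class CloudPreview:
    selectable_count: int  # files the active scanners would actually scan
    total_bytes: Optional[int]  # None when the provider reported no size


def preview_exit_code(previews: Sequence[CloudPreview], max_size: Optional[int] = None) -> int:
    """Grant preview success only if something would be scanned within limits."""
    for preview in previews:
        if max_size is not None and (preview.total_bytes is None or preview.total_bytes > max_size):
            return EXIT_FAILURE  # fail closed on unknown or oversized targets
    if sum(p.selectable_count for p in previews) == 0:
        return EXIT_FAILURE  # nothing would be scanned
    return EXIT_OK
```

The review's probe (an analyzed cloud directory with `file_count=0`) corresponds to `CloudPreview(0, 0)` and exits 2 again, matching the base behavior.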

9. Medium — Dry-run file names/sizes are not bound to the immutable revision used for content selection

Files: modelaudit/utils/sources/huggingface.py:1481-1513, modelaudit/cli.py:238-253, tests/utils/sources/test_huggingface.py:3102-3123

get_model_info() fetches model_info.sha, then calls list_repo_tree(repo_id, recursive=True) without passing that SHA. It returns the first call's SHA as metadata['revision'] while returning names and sizes from a second request against the mutable default revision. The preview then content-probes those names at the first SHA.

If the repository moves between calls, dry-run can probe a file that does not exist at the returned revision, omit a newly selected file, or enforce --max-size against a mixed manifest. The new unit test explicitly expects the unpinned list_repo_tree call. The real downloader instead obtains names and immutable SHA from one repository-info response.

Validation: VALIDATED, confidence 92/100, introduced as a dry-run correctness issue by combining the pre-existing metadata helper with revision-bound selectors.
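Binding names and sizes to the SHA can be done by resolving the revision once and passing it to the tree listing. The sketch assumes `huggingface_hub`'s `HfApi.model_info()` (whose result exposes `.sha`) and `HfApi.list_repo_tree(..., recursive=True, revision=...)`, whose file entries expose `.path` and `.size` while folder entries have no `size`; an offline fake stands in for the API so the race is reproducible. Requesting `model_info(..., files_metadata=True)` and reading sizes from its `siblings` would instead mirror the downloader's single-response approach; both are worth confirming against the pinned `huggingface_hub` version.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


def pinned_manifest(api, repo_id: str) -> Tuple[str, Dict[str, int]]:
    """Resolve the revision once, then list names and sizes at exactly that SHA."""
    sha = api.model_info(repo_id).sha
    files: Dict[str, int] = {}
    for entry in api.list_repo_tree(repo_id, recursive=True, revision=sha):
        size = getattr(entry, "size", None)
        if size is None:  # folder entries carry no size
            continue
        files[entry.path] = size
    return sha, files


# Offline stand-ins for huggingface_hub objects (hypothetical, for the demo only).
@dataclass
class FakeInfo:
    sha: str


@dataclass
class FakeFile:
    path: str
    size: int


@dataclass
class FakeFolder:
    path: str


class RacingApi:
    """Fake API whose default branch moves right after model_info() is called."""

    def __init__(self) -> None:
        self.head = "aaa"
        self.trees: Dict[str, List[object]] = {
            "aaa": [FakeFile("model.safetensors", 20), FakeFolder("sub")],
            "bbb": [FakeFile("model.safetensors", 20), FakeFile("new.bin", 999)],
        }

    def model_info(self, repo_id: str) -> FakeInfo:
        sha, self.head = self.head, "bbb"  # a push lands between the two calls
        return FakeInfo(sha)

    def list_repo_tree(self, repo_id: str, recursive: bool = False,
                       revision: Optional[str] = None) -> List[object]:
        return list(self.trees[revision or self.head])
```

With the fake, the pinned call returns revision `aaa` and only `model.safetensors`, while an unpinned listing taken after the simulated push already includes `new.bin`.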

Exact target and live state

PR: promptfoo/modelaudit#1645 — fix: preview Hugging Face dry-run scans
State: open, non-draft
Head SHA reviewed: 0c3024f18f10286d4a8649432837aa2cf2e68fac
Base SHA at initial live fetch: 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69
Prior failing heads: e2fdd6e7bee56b63af89fcccf462629fb15edae0, then f78836c99aab6390c6476fcc07759827c896cc2b
GitHub mergeability at initial fetch: MERGEABLE, but mergeStateStatus=BLOCKED, reviewDecision=REVIEW_REQUIRED
Exact-head Python CI run: 27324248770
Final live head recheck: unchanged at 0c3024f18f10286d4a8649432837aa2cf2e68fac; base unchanged at 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69.
Final exact-head CI: Linux/Python 3.10, 3.12, and 3.13, lint, type check, build/package, dependency audit, Docker, docs, benchmarks, and CodeQL passed. Windows Python 3.11 failed, so aggregate CI Success failed. Windows job: 80721846244 in run 27324248770.

Current review-thread reconciliation

Live GraphQL returned 19 threads at exact head: 13 resolved and 6 unresolved. Five unresolved threads still anchor to current code; the Windows path thread is outdated after 0c3024f1.

| Thread | Live state | Independent disposition |
| --- | --- | --- |
| `r3392638132` preserve issue exit | resolved/outdated | Fixed by the preview-only exit guard; malicious local dry-run test passes. |
| `r3392638135` recursive metadata | resolved/outdated | Fixed with recursive tree listing, subject to finding 9. |
| `r3392687672` preserve local no-files exit | resolved/current | Fixed; empty local dry-run exits 2. |
| `r3392687673` skip recursive folders | resolved/current | Fixed for current Hub `RepoFolder` objects, which lack `size`. |
| `r3392687676` structured stdout | resolved/outdated | Fixed; preview text is routed to stderr for JSON/SARIF stdout. |
| `r3392687679` validate direct-file existence | resolved/current | Fixed through repository/path metadata calls. |
| `r3392764356` reject direct folders | resolved/current | Fixed because folder metadata has no valid `size`. |
| `r3392764357` honor repo max-size | resolved/outdated | Partially fixed; finding 4 remains. |
| `r3392801228` use real file set for max-size | resolved/outdated | Partially fixed; findings 1, 4, and 9 remain. |
| `r3392801231` no scannable files | resolved/outdated | Fixed for unselected non-streaming previews. |
| `r3392860440` mirror full-download selection | resolved/outdated | Partially fixed; findings 1, 4, and 5 remain. |
| `r3392977222` content-routed selection | resolved/current | Partially fixed; finding 5 remains with scanner selection. |
| `r3392977223` streaming unfiltered limit | resolved/current | Fixed; the 129-file regression exits 2. |
| `r3393239818` avoid content probing | unresolved/current | Validated; finding 1. |
| `r3393239822` POSIX Hub paths | unresolved/outdated | Fixed by `PurePosixPath` and the new backslash-path regression, but unrelated to the prior failing test; see finding 2. |
| `r3393239824` unsupported direct files | unresolved/current | Partially validated with `script.py`; `cover.png` is not a valid reproducer; see finding 6. |
| `r3393330907` scanner-selected content routes | unresolved/current | Validated; finding 5. |
| `r3393330909` timeout propagation | unresolved/current | Validated; finding 3. |
| `r3393330913` README size budget | unresolved/current | Validated; finding 4. |

Validation performed

Review inputs:

Fetched live PR metadata first and checked out the exact PR ref in an isolated detached worktree; the primary checkout was dirty and was not modified.
Read exact-head root AGENTS.md, the pr-review skill, the full changed production/test surface, surrounding source selectors/downloaders, all live comments/reviews/threads, commit history, and prior/current Windows job evidence.
git diff --check 8d6c4864...0c3024f1: passed.
No scoped AGENTS.md beyond root applies to the changed paths; no unambiguous policy violation was found.

Focused exact-head tests:

tests/test_huggingface_fp_regression_harness.py, two local dry-run exit tests, TestGetModelInfo, and TestGetHuggingFaceFileInfo: 23 passed, 7 skipped (the seven live Hub cases are opt-in).
tests/test_cli.py -k "huggingface and (streaming or dry_run or scanner)": 13 passed, 246 deselected.
tests/utils/sources/test_huggingface.py: 271 passed.
Focused xdist harness/exit subset: 18 passed, 7 skipped.

Adversarial exact-head probes:

Dry-run content route: two remote-prefix-read attempts observed.
Direct script.py: normal exit 2 vs dry-run exit 0.
README budget: dry-run exit 0 under 50 B while full selector includes README.md.
Renamed PyTorch route: scanner-selected dry-run exit 2 while full selector includes the file.
Timeout: --timeout 1 was not passed to metadata lookup.
Empty cloud directory preview: exit 0.
Old Windows basename semantics and new POSIX semantics selected the same failed-test fixture.
Replacing modelaudit.cli as done by test_cli_logging_handlers.py made string patches target the new module while the previously imported Click command retained old globals, reproducing the exact Windows mock bypass.

Broader validation:

A local broad fast-suite attempt using the primary checkout's existing virtualenv was stopped after native-package/cache failures unrelated to this diff (modelaudit-picklescan was not resolvable from the isolated exact-head worktree, causing scanner cascades). It reached 3,658 passes before interruption. These results are not used as PR findings.
The exact-head GitHub matrix is the broad-suite authority. It completed with only Windows and aggregate CI Success failing; the Windows failure is finding 2.
The pinned live Hub matrix was not independently run because huggingface.co is outside this environment's network allowlist. The PR body reports one rank-2 live case passed and six cases skipped, but that self-reported run exercises only the limited path described in finding 7.

Side-effect and malformed-input assessment

No new model-content disk write or cache/temp-directory creation was found before the Hugging Face dry-run branches return. Explicit --output/--sbom writes remain user-requested behavior.
The network non-execution guarantee is not met because content prefix bodies are fetched; metadata API calls and normal telemetry are separate network activity.
Missing direct files, direct folders, credential-bearing errors, recursive folders, unknown selected sizes under --max-size, local findings, and empty local paths have focused passing regressions.
Unsupported direct-file prefiltering, mutable-revision metadata, and malformed/empty cloud preview semantics remain incorrect as described above.

Merge disposition

Request changes / do not merge 0c3024f18f10286d4a8649432837aa2cf2e68fac.

Minimum merge gates:

Make Hugging Face dry-run metadata-only: no model-content GETs, downloads, scans, or hidden cache/temp writes. Fail closed when exact content routing cannot be known from metadata.
Propagate the end-to-end timeout and use one immutable repository manifest for names, sizes, and revision.
Match full-download/streaming size and selection semantics, including README files, scanner-selected renamed artifacts, and direct files rejected by the normal prefilter.
Replace declarative mode strings with actual pinned CLI dry-run, streaming, scanner-selective, local-directory, and malicious-control executions; bound acquisition before transfer.
Stop test_cli_logging_handlers.py from replacing the process-global CLI module (prefer a subprocess), make the new harness independent of worker ordering/live network, and require green exact-head Windows evidence.
Restore the existing cloud dry-run no-files/error contract or explicitly scope, document, and test the intended cross-provider change.

mldangelo-oai · 2026-06-11T05:36:34Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: d0ac58c6e4

…t01-hf-fp-regression-harness-20260610

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: e1baa07330

…t01-hf-fp-regression-harness-20260610 # Conflicts: # modelaudit/cli.py # modelaudit/utils/sources/huggingface.py

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: 4fc88b3363

…t01-hf-fp-regression-harness-20260610 # Conflicts: # modelaudit/cli.py # modelaudit/utils/sources/huggingface.py # tests/utils/sources/test_huggingface.py

chatgpt-codex-connector

💡 Codex Review

Reviewed commit: aafefb118d

…t01-hf-fp-regression-harness-20260610
Conflicts: modelaudit/cli.py, tests/test_cli.py

…t01-hf-fp-regression-harness-20260610

…t01-hf-fp-regression-harness-20260610
Conflicts: modelaudit/cli.py, modelaudit/utils/sources/huggingface.py, tests/test_cli.py

…t01-hf-fp-regression-harness-20260610

…t01-hf-fp-regression-harness-20260610
Conflicts: modelaudit/utils/sources/_huggingface_download_worker.py

fix: preview Hugging Face dry-run scans

05fb6a4

Add an opt-in Hugging Face false-positive regression harness with pinned live cases and synthetic malicious controls.
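The harness described in this commit combines pinned live cases with synthetic malicious controls behind an opt-in switch. A minimal Python sketch of that shape, assuming each manifest entry pins a repository to a full 40-character commit SHA and that the opt-in is the `MODELAUDIT_RUN_HF_FP_LIVE=1` variable used in the PR's Live QA command. `FpCase`, `live_cases_enabled`, and `check_outcome` are hypothetical names, not the harness's actual API or manifest schema.

```python
from __future__ import annotations

import os
import re
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical manifest entry; field names are illustrative only.
_PINNED_SHA = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class FpCase:
    repo_id: str
    revision: str          # full commit SHA, so the live case cannot drift
    expect_findings: bool  # True only for synthetic malicious controls

    def __post_init__(self) -> None:
        # Reject branch names or short SHAs; they would make the case unpinned.
        if not _PINNED_SHA.fullmatch(self.revision):
            raise ValueError(f"{self.repo_id}: revision must be a pinned 40-char commit SHA")


def live_cases_enabled(env: Mapping[str, str] | None = None) -> bool:
    """Assumed opt-in rule: live network cases run only when the variable is "1"."""
    env = os.environ if env is None else env
    return env.get("MODELAUDIT_RUN_HF_FP_LIVE") == "1"


def check_outcome(case: FpCase, finding_count: int) -> None:
    """Benign pinned repos must scan clean; synthetic controls must still be flagged."""
    if case.expect_findings and finding_count == 0:
        raise AssertionError(f"{case.repo_id}: malicious control was not detected")
    if not case.expect_findings and finding_count > 0:
        raise AssertionError(f"{case.repo_id}: false positive ({finding_count} findings)")
```

Pinning to a commit SHA rather than a branch keeps a live case from changing underneath the test, and the synthetic controls catch a "fix" that removes false positives by making the scanner blind.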

chore: format Hugging Face FP manifest

5028931

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/cli.py Outdated

fix: address Hugging Face dry-run review

086e576

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/utils/sources/huggingface.py Outdated

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/cli.py

fix: harden dry-run previews

982f74d

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/utils/sources/huggingface.py Outdated

Comment thread modelaudit/cli.py Outdated

fix: enforce dry-run preview limits

f178df9

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/cli.py Outdated

fix: fail closed on unknown HF dry-run sizes

b6b113e

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/cli.py Outdated

fix: align HF dry-run preview budgets

e2fdd6e

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/cli.py Outdated

fix: mirror HF dry-run routing limits

f78836c

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/cli.py Outdated

fix: stabilize HF dry-run preview paths

0c3024f

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/cli.py Outdated

test: isolate HF dry-run metadata mocks

631278b

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/cli.py Outdated

mldangelo-oai added 3 commits June 11, 2026 17:33

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

45edb1c

…t01-hf-fp-regression-harness-20260610

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

8d358bd

…t01-hf-fp-regression-harness-20260610

fix: align HF dry-run preview routing

e1baa07

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/cli.py Outdated

mldangelo-oai added 5 commits June 11, 2026 19:08

fix: prove HF config dry-run previews by filename

7d215d7

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

1489b0c

…t01-hf-fp-regression-harness-20260610
Conflicts: modelaudit/cli.py, modelaudit/utils/sources/huggingface.py

fix: budget HF stream dry-run content routes

02851db

fix: reject negative HF dry-run size caps

4fc88b3

fix: avoid unverified HF OpenVINO companions

c5ae147

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/cli.py Outdated

mldangelo-oai added 6 commits June 11, 2026 22:18

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

e32c5b5

…t01-hf-fp-regression-harness-20260610
Conflicts: modelaudit/cli.py, modelaudit/utils/sources/huggingface.py, tests/utils/sources/test_huggingface.py

fix: budget streaming dry-run include-all files

ed483b1

fix: fail closed for stream dry-run content routes

3af19e1

fix: bound hf stream dry-run content candidates

4fcb850

fix: mirror HF dry-run probe byte budget

9c37c70

fix: align OpenVINO dry-run companion suppression

aafefb1

chatgpt-codex-connector Bot reviewed Jun 12, 2026

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/cli.py Outdated

Comment thread modelaudit/cli.py Outdated

mldangelo-oai added 12 commits June 12, 2026 01:15

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

1606588

…t01-hf-fp-regression-harness-20260610
Conflicts: modelaudit/cli.py, tests/test_cli.py

fix: show budgeted HF stream dry-run candidates

318cf92

fix: skip unrelated safetensors shards in HF dry-run budget

41cca57

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

ffb5444

…t01-hf-fp-regression-harness-20260610

fix: align HF dry-run probe and gated budgets

8dea819

fix: report HF dry-run preview failures

a53cf2f

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

f099a92

…t01-hf-fp-regression-harness-20260610

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

cb59d25

…t01-hf-fp-regression-harness-20260610
Conflicts: modelaudit/cli.py, modelaudit/utils/sources/huggingface.py, tests/test_cli.py

fix: preserve HF dry-run previews on preview errors

783bb35

fix: show HF dry-run previews in text errors

8a56aa3

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

d37cda5

…t01-hf-fp-regression-harness-20260610

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

ff5f749

…t01-hf-fp-regression-harness-20260610
Conflicts: modelaudit/utils/sources/_huggingface_download_worker.py

mldangelo-oai commented Jun 11, 2026

Summary

Review Notes

Validation

Live QA

github-actions Bot commented Jun 11, 2026 (edited)

Performance Benchmarks

chatgpt-codex-connector Bot left a comment

💡 Codex Review

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

chatgpt-codex-connector Bot left a comment

💡 Codex Review

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

mldangelo-oai commented Jun 11, 2026

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot commented Jun 11, 2026

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

mldangelo-oai commented Jun 11, 2026

chatgpt-codex-connector Bot left a comment

💡 Codex Review

github-actions Bot commented Jun 11, 2026 (edited)

1. High — `--dry-run` still downloads remote model bytes during Hugging Face file selection

2. High — The exact head still fails required Windows CI because another test replaces `modelaudit.cli`, bypassing the new tests' mocks

3. Medium — `--timeout` is ignored by all Hugging Face dry-run metadata and content-selection work
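Findings 1 and 3 both concern work the preview performs before it prints anything: file selection should rely on listing metadata alone, and that work should still respect `--timeout`. A minimal Python sketch of a fail-closed, metadata-only planner follows, under stated assumptions. `RemoteFile`, `plan_dry_run`, and `DryRunError` are hypothetical names, not modelaudit's API. The policy of raising on an unknown size or an over-budget file is one possible choice, loosely modeled on the "fail closed on unknown HF dry-run sizes" and "reject negative HF dry-run size caps" commits.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RemoteFile:
    path: str
    size: Optional[int]  # None when the repository listing reported no size


@dataclass
class DryRunPlan:
    selected: list[str]
    total_bytes: int


class DryRunError(Exception):
    """Raised instead of guessing, so the preview fails closed."""


def plan_dry_run(
    files: list[RemoteFile],
    max_total_bytes: int,
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> DryRunPlan:
    if max_total_bytes < 0:
        raise DryRunError("size cap must be non-negative")
    deadline = clock() + timeout_s
    selected: list[str] = []
    total = 0
    for f in files:
        # Check the deadline on every step so the timeout also covers metadata work.
        if clock() > deadline:
            raise DryRunError("dry-run preview exceeded timeout")
        if f.size is None:
            # Unknown size: refuse rather than download bytes to find out.
            raise DryRunError(f"unknown size for {f.path}")
        if total + f.size > max_total_bytes:
            raise DryRunError(f"{f.path} would exceed the {max_total_bytes}-byte budget")
        selected.append(f.path)
        total += f.size
    return DryRunPlan(selected, total)
```

Injecting `clock` keeps the deadline logic testable without sleeping. Nothing in the loop touches the network, so a preview built this way cannot fetch model bytes while deciding what it would scan.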