fix: preview Hugging Face dry-run scans #1645

Open
mldangelo-oai wants to merge 44 commits into main from mdangelo/codex/hf-fp-t01-hf-fp-regression-harness-20260610

Conversation

@mldangelo-oai

Contributor

Summary

  • add Hugging Face direct-file and repository --dry-run previews that skip downloads and scans
  • add an opt-in pinned Hugging Face false-positive regression harness with synthetic malicious controls
  • document the user-visible dry-run behavior in the root [Unreleased] changelog

Review Notes

  • Independent security/regression review found no security regression in the diff.
  • Fixed one broad-suite regression found during review: modelaudit.cli.get_model_info now delegates through the Hugging Face source module so existing source-module patch points still affect metadata preview tests.
  • Subagent CSV files were inspected as orchestration artifacts and intentionally left untracked.

Validation

  • uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_huggingface_fp_regression_harness.py tests/test_cli.py::test_scan_huggingface_metadata_preview_escapes_model_id tests/test_cli.py::test_scan_huggingface_metadata_preflight_verbose_log_is_sanitized -q
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_cli.py -k "huggingface and (streaming or dry_run or scanner)" --maxfail=1 -q
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/utils/sources/test_huggingface.py --maxfail=1 -q
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 (17,373 passed, 1,292 skipped)
  • git diff --check

Live QA

  • PROMPTFOO_DISABLE_TELEMETRY=1 MODELAUDIT_RUN_HF_FP_LIVE=1 uv run pytest tests/test_huggingface_fp_regression_harness.py::test_live_pinned_hf_manifest_case_end_to_end -q --maxfail=1 (rank-2 live case passed, 6 full-matrix cases skipped)
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan --dry-run --format json --scanners safetensors hf://sentence-transformers/all-MiniLM-L6-v2 (exit 0; 0 files scanned; download/scan skipped)

Add an opt-in Hugging Face false-positive regression harness with pinned live cases and synthetic malicious controls.
@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.462s -> 1.457s (-0.4%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| warm-cache-rescan | tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan | release-candidate | 547.3 KiB | 32 | 112.73ms | 116.22ms | +3.1% | stable |
| padded-multi-stream-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload | multi_stream_padded | 4.1 KiB | 1 | 568.1us | 575.9us | +1.4% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] | nested_hex | 130 B | 1 | 543.6us | 550.5us | +1.3% | stable |
| mixed-model-repository | tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository | release-candidate | 547.3 KiB | 32 | 490.57ms | 484.84ms | -1.2% | stable |
| direct-malicious-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload | malicious_reduce | 52 B | 1 | 452.3us | 456.2us | +0.9% | stable |
| chunked-upload-stream | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream | chunked_stream | 278.2 KiB | 1 | 115.57ms | 114.62ms | -0.8% | stable |
| clean-training-checkpoint | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint | safe_large | 278.2 KiB | 1 | 112.74ms | 111.85ms | -0.8% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] | nested_raw | 78 B | 1 | 513.0us | 509.6us | -0.7% | stable |
| single-checkpoint-preflight | tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load | single_checkpoint.pkl | 183.0 KiB | 1 | 74.09ms | 74.47ms | +0.5% | stable |
| duplicate-heavy-registry | tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot | registry-snapshot | 915.2 KiB | 13 | 410.53ms | 409.12ms | -0.3% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] | nested_base64 | 98 B | 1 | 512.1us | 511.1us | -0.2% | stable |
| suspicious-pickle-intake | tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake | suspicious-intake | 183.8 KiB | 4 | 143.14ms | 143.09ms | -0.0% | stable |
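
For readers checking the Change and Status columns, here is a minimal sketch of how a percentage change and a 15% threshold could produce those labels. The function names are hypothetical, and treating a drop beyond the threshold as "improved" is an assumption; the benchmark workflow's actual rules are not shown in this thread.

```python
def percent_change(baseline: float, current: float) -> float:
    """Relative change of current against baseline, in percent."""
    return (current - baseline) / baseline * 100.0


def classify(baseline: float, current: float, threshold_pct: float = 15.0) -> str:
    """Label one benchmark. The symmetric 'improved' rule is an assumption."""
    change = percent_change(baseline, current)
    if change > threshold_pct:
        return "regression"
    if change < -threshold_pct:
        return "improved"
    return "stable"


# warm-cache-rescan row: 112.73ms -> 116.22ms
warm_cache_change = percent_change(112.73, 116.22)
warm_cache_status = classify(112.73, 116.22)
print(f"{warm_cache_change:+.1f}% {warm_cache_status}")
```

Under these assumptions the warm-cache-rescan row stays well inside the threshold, which matches its reported status.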

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05fb6a454c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/cli.py Outdated
@mldangelo-oai

Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 086e576836

Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/utils/sources/huggingface.py Outdated
Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/cli.py
@mldangelo-oai

Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 982f74dfa3

Comment thread modelaudit/utils/sources/huggingface.py Outdated
Comment thread modelaudit/cli.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f178df9913

Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/cli.py Outdated
@mldangelo-oai

Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b6b113e1a2

Comment thread modelaudit/cli.py Outdated
@mldangelo-oai

Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2fdd6e7be

Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/cli.py Outdated
@mldangelo-oai

Contributor Author

CI diagnosis for exact head e2fdd6e7: Windows-only failure in tests/test_huggingface_fp_regression_harness.py::test_hf_repo_dry_run_preview_does_not_download_or_scan; expected exit 0, got 2. Linux/Python 3.10/3.12/3.13 and all other checks pass. Please reproduce the Windows path handling, fix without weakening dry-run/no-download guarantees, add a focused cross-platform regression, push additively, and request a fresh exact-head review.

@mldangelo-oai

Contributor Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Hooray!

Contributor Author

Current head f78836c9 has one remaining CI failure on Windows:

tests/test_huggingface_fp_regression_harness.py::test_hf_repo_dry_run_preview_does_not_download_or_scan expects exit 0 but gets SystemExit(2).

Run/job: https://github.com/promptfoo/modelaudit/actions/runs/27322160248/job/80715503773

All other required jobs passed. @codex reproduce under Windows path/URL semantics, fix without weakening dry-run's no-download/no-scan invariant, add/adjust the cross-platform regression, push, and request exact-head review.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f78836c99a

Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/cli.py Outdated
@mldangelo-oai

Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0c3024f18f

Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/cli.py Outdated
@mldangelo-oai

Contributor Author

Independent adversarial review: promptfoo/modelaudit PR #1645

Findings

1. High — --dry-run still downloads remote model bytes during Hugging Face file selection

Files: modelaudit/cli.py:238-253, modelaudit/cli.py:268-294, modelaudit/utils/sources/huggingface.py:33-35, modelaudit/utils/sources/huggingface.py:160-218, modelaudit/utils/sources/huggingface.py:755-784, modelaudit/utils/sources/huggingface.py:1050-1067, CHANGELOG.md:12

The new preview calls the same content-routing selectors as real acquisition. Those selectors call _read_huggingface_prefix(), which issues authentication-capable requests.get(...) calls with a Range header and consumes the response bodies. The aggregate probe budget is 64 MiB across up to 256 candidates. This is not metadata-only behavior, and it directly contradicts both the preview's Download: skipped (--dry-run) output and the changelog claim that dry-run previews run "without downloading or scanning."

Exact-head adversarial probe: a repository manifest containing only renamed.payload caused two _read_huggingface_prefix calls before dry-run exited. The normal unit named test_hf_repo_dry_run_preview_does_not_download_or_scan does not catch this: its metadata omits revision, so _huggingface_preview_download_files() takes the suffix-only fallback, and it only mocks download_model/the local scanner, not the remote prefix reader. The separate content-routed test mocks the detector itself, also hiding the network read.

Impact: a command explicitly selected to avoid acquisition can read gated or sensitive files, incur up to tens of MiB of transfer, and fail on model-content endpoints. Keep preview metadata-only. When faithful routing would require content bytes, fail closed with an explicit incomplete-preview result rather than silently probing.

Validation: VALIDATED, confidence 99/100, introduced by this PR's dry-run call path.
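
To make the "fail closed" recommendation concrete, here is a minimal sketch of a name-only classifier that records files it cannot route instead of probing them. Everything in it is hypothetical: the suffix set, the PreviewResult type, and preview_from_metadata() are illustrations, not the PR's code or the project's scanner registry.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Hypothetical suffix set; the real scanner registry is not reproduced here.
KNOWN_MODEL_SUFFIXES = {".safetensors", ".bin", ".pt", ".pkl"}


@dataclass
class PreviewResult:
    selected: list[str] = field(default_factory=list)
    needs_content: list[str] = field(default_factory=list)

    @property
    def complete(self) -> bool:
        # The preview is only trustworthy when no file required byte inspection.
        return not self.needs_content


def preview_from_metadata(paths: list[str]) -> PreviewResult:
    """Classify Hub paths by name only; never issue a content request."""
    result = PreviewResult()
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        if suffix in KNOWN_MODEL_SUFFIXES:
            result.selected.append(path)
        else:
            # Routing would need file bytes, so record the file instead of probing it.
            result.needs_content.append(path)
    return result


result = preview_from_metadata(["model.safetensors", "renamed.payload"])
print(result.selected, result.needs_content, result.complete)
```

A caller could map an incomplete result to an explicit "preview incomplete" message and a non-zero exit, which keeps the no-download claim true for files such as renamed.payload.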

2. High — The exact head still fails required Windows CI because another test replaces modelaudit.cli, bypassing the new tests' mocks

Files: modelaudit/cli.py:297-323, tests/test_cli_logging_handlers.py:6-20, tests/test_huggingface_fp_regression_harness.py:224-343

Heads e2fdd6e7 and f78836c9 failed Windows at test_hf_repo_dry_run_preview_does_not_download_or_scan with exit 2. Commit 0c3024f1 changes the basename expression from host-native Path to PurePosixPath, adds a separate backslash-path unit test before the failing test, and adds assertion diagnostics.

That production change cannot change the failed fixture's selection: its model.safetensors satisfies the first name_lower.endswith('.safetensors') operand, so Python short-circuits before evaluating either basename expression. A direct comparison returned the same selection under old Windows semantics and new POSIX semantics: ['model.safetensors'].

The completed exact-head run confirms the failure remains. Windows now fails test_hf_repo_dry_run_preview_rejects_unknown_selected_size_with_max_size: instead of using its mocked metadata, the test performs a real Hub request and receives 401. The job reports 1 failed, 14,578 passed, and 601 skipped; aggregate CI Success fails.

The root test-isolation mechanism is concrete. test_import_cli_does_not_configure_logging() removes modelaudit.cli from sys.modules and imports a replacement module without restoring the original. Tests collected earlier retain the old Click cli object and its old module globals, while later patch("modelaudit.cli.get_model_info") calls patch the replacement module. A local identity probe reproduced exactly this split: the module was replaced, the patch hit the new module, and it did not hit the old CLI globals. Xdist worker scheduling determines which new test receives the stale command object, explaining why successive Windows runs fail different tests.

The superseding POSIX-path commit neither fixes this isolation bug nor changes the original failed fixture's selection. Fix the logging-import test in a subprocess or restore the original module/package attribute, then run the focused harness and full Windows lane without live network access.

Validation: VALIDATED as a required-CI/test-isolation blocker, confidence 98/100. Current exact-head Windows status is recorded below.
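
The module-replacement split can be reproduced in isolation. The sketch below uses a throwaway module named demo_cli (a hypothetical stand-in, not modelaudit.cli) built with types.ModuleType, so it runs without the project installed. It shows a string-based patch landing on the replacement module while a previously captured function keeps resolving names through the old module's globals.

```python
import sys
import types
from unittest.mock import patch

SOURCE = """
def get_model_info(path):
    return "real"

def cli(path):
    # Looks up get_model_info in the globals of the module that defined cli.
    return get_model_info(path)
"""


def load(name: str) -> types.ModuleType:
    """Create a fresh module object from SOURCE and register it in sys.modules."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    sys.modules[name] = module
    return module


first = load("demo_cli")
stale_cli = first.cli  # an earlier-collected test keeps this object

del sys.modules["demo_cli"]
second = load("demo_cli")  # replacement module, original never restored

with patch("demo_cli.get_model_info", return_value="mocked"):
    stale_result = stale_cli("hf://test/model")   # still calls the unpatched function
    fresh_result = second.cli("hf://test/model")  # sees the mock

print(stale_result, fresh_result)
```

Running the import check in a subprocess, or saving and restoring the original sys.modules entry in a fixture, would avoid the split because later patches would again target the module that the collected command object uses.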

3. Medium — --timeout is ignored by all Hugging Face dry-run metadata and content-selection work

Files: modelaudit/cli.py:238-253, modelaudit/cli.py:268-294, modelaudit/cli.py:343-359, modelaudit/utils/sources/huggingface.py:81-88, modelaudit/utils/sources/huggingface.py:1458-1516

_preview_huggingface_model_source() calls get_model_info(path) without runtime.timeout, then calls content selectors without a deadline. The underlying content probe defaults each request to 30 seconds and can inspect up to 256 files. --timeout 1 therefore does not bound the preview and can still take many minutes or longer across metadata, recursive tree listing, and content probes.

Exact-head probe: modelaudit scan --dry-run --timeout 1 --scanners safetensors hf://test/model called get_model_info('hf://test/model') with no timeout argument and exited 0.

Validation: VALIDATED, confidence 99/100. This confirms unresolved thread r3393330909.
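
One way to bound the preview is a single deadline shared by every metadata and listing call. This is a sketch under assumptions: the Deadline class and its injectable clock are hypothetical, and only the 30-second per-request default comes from the finding above.

```python
import time
from typing import Callable

DEFAULT_REQUEST_TIMEOUT = 30.0  # per-request default cited in the finding


class Deadline:
    """Track one overall time budget shared by every preview request."""

    def __init__(self, total_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + total_seconds

    def remaining(self) -> float:
        return max(0.0, self._expires_at - self._clock())

    def request_timeout(self) -> float:
        """Return the timeout for the next request, or raise if the budget is spent."""
        left = self.remaining()
        if left <= 0.0:
            raise TimeoutError("preview deadline exceeded")
        return min(DEFAULT_REQUEST_TIMEOUT, left)


# A fake clock keeps the demonstration deterministic.
fake_now = [0.0]
deadline = Deadline(1.0, clock=lambda: fake_now[0])
first_timeout = deadline.request_timeout()  # capped by the 1-second budget
```

Each network call would pass request_timeout() as its timeout, so --timeout 1 limits the total preview time rather than leaving every request at its own default.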

4. Medium — README bytes are excluded from dry-run max-size budgeting but included by full download

Files: modelaudit/utils/sources/huggingface.py:1484-1504, modelaudit/cli.py:355-373, modelaudit/utils/model_extensions.py:19-39, modelaudit/scanner_registry_metadata.py:255-256

get_model_info() removes README.md from both the recursive listing and fallback sibling list. The real downloader's model-extension set includes .md, so _select_huggingface_model_files() selects README files. Dry-run can consequently report an under-budget success while the same non-dry command selects the README and exceeds --max-size.

Exact-head probe: metadata showing only a 20-byte model.safetensors passed --dry-run --max-size 50B; the real full-download selector for the actual repository list returned ['model.safetensors', 'README.md'].

Validation: VALIDATED, confidence 99/100. This confirms unresolved thread r3393330913.
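
The budget mismatch is simple arithmetic. In the sketch below, the 20-byte model.safetensors and the 50 B limit come from the probe; the README size and the helper name are assumptions chosen for illustration.

```python
def total_selected_bytes(sizes: dict[str, int], include_markdown: bool) -> int:
    """Sum the sizes of files a selector would pick, optionally counting .md files."""
    return sum(
        size
        for name, size in sizes.items()
        if include_markdown or not name.lower().endswith(".md")
    )


# model.safetensors is 20 bytes as in the probe; the README size is an assumed value.
repo_sizes = {"model.safetensors": 20, "README.md": 100}
max_size = 50

preview_total = total_selected_bytes(repo_sizes, include_markdown=False)
download_total = total_selected_bytes(repo_sizes, include_markdown=True)

preview_within_budget = preview_total <= max_size    # dry-run view of the repository
download_within_budget = download_total <= max_size  # full-download view
```

With these numbers the preview reports success while the full selection exceeds the limit, which is the divergence the finding describes.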

5. Medium — Scanner-selected dry-runs reject content-routed files that full download selects

Files: modelaudit/cli.py:343-362, modelaudit/utils/sources/huggingface.py:755-789

For a non-streaming dry-run with --scanners, selected_files is suffix/basename-only and the function raises at line 354 before deriving the full downloader's content-routed selection. A renamed PyTorch ZIP can therefore produce preview exit 2 even though full download identifies and scans it.

Exact-head probe with renamed.payload and a PyTorch ZIP content route: scanner-selected dry-run exited 2, while _select_huggingface_model_files() returned ['renamed.payload'] for the non-dry path.

This cannot be corrected by unconditionally calling the existing content selector because that would preserve finding 1. The preview needs an explicit metadata-only/incomplete decision when content routing is required.

Validation: VALIDATED, confidence 98/100. This confirms unresolved thread r3393330907 and shows that resolved threads r3392977222/r3392860440 are only partially addressed.

6. Medium — Direct-file dry-run can succeed for a file the corresponding real scan rejects as unscannable

Files: modelaudit/cli.py:326-340, modelaudit/cli.py:2011-2028, modelaudit/cli.py:2894-2908

Direct-file preview validates existence and size but never applies the local explicit-file prefilter or active scanner routing. The original review example cover.png is not sufficient because the current real path counts unknown files and exits 0. The defect is nonetheless reproducible for explicit prefilter extensions: a valid direct script.py URL exits 0 in dry-run but, after download, the normal path skips it and exits 2 with no files scanned.

Exact-head comparison with identical mocked Hub metadata/download result:

  • script.py: normal exit 2, dry-run exit 0
  • cover.png: normal exit 0, dry-run exit 0
  • notes.txt: normal exit 0, dry-run exit 0

Validation: VALIDATED with a corrected reproducer, confidence 96/100. Unresolved thread r3393239824 is directionally valid, but its cover.png example and blanket scanner-selection rationale are not.

7. Medium — The pinned "end-to-end" matrix declares four modes but runs only one local-file path

Files: tests/assets/huggingface_false_positive_manifest.json:3-18, tests/test_huggingface_fp_regression_harness.py:99-133, tests/test_huggingface_fp_regression_harness.py:136-195, tests/test_huggingface_fp_regression_harness.py:593-636

Every case declares local-directory, streaming, dry-run, and scanner-selective; the manifest test merely asserts those strings exist. The opt-in live test downloads one trigger with hf_hub_download() and calls scan_model_directory_or_file() on that single local file. It never invokes the CLI dry-run path changed by this PR, never invokes scan_model_streaming(), never validates remote scanner-selection behavior, and never generates or scans the case's malicious control. Only rank 2's synthetic local control and one generic streaming fixture are exercised outside the live matrix.

The stated wall-time and download budgets are also not end-to-end: the timeout is applied only after the download, and max_download_bytes is asserted after the entire file has already been transferred.

Impact: the PR and manifest materially overstate coverage. A passing rank-2 live case provides no evidence for the new remote dry-run guarantees, streaming/full-download parity, or per-case malicious controls.

Validation: VALIDATED, confidence 99/100.

8. Medium — The global preview-success override silently changes existing cloud dry-run failures to success

Files: modelaudit/cli.py:2574-2600, modelaudit/cli.py:3261-3306, modelaudit/cli.py:4358-4363

This Hugging Face PR adds mark_dry_run_preview() to the pre-existing cloud preview path and lets any all-preview invocation bypass the normal no-files exit. No cloud validation checks that a directory contains a file, that selective filtering finds a scannable file, or that the target obeys the effective size limit before marking success.

Exact-head probe: an analyzed cloud directory with file_count=0 and no files exits 0. On the base code, the same path was not marked as a successful preview and fell through the documented no-files exit 2 contract. There is no changelog entry or regression test for this cross-provider behavior change.

Validation: VALIDATED, confidence 95/100, introduced by the changed cloud bookkeeping/final override.

9. Medium — Dry-run file names/sizes are not bound to the immutable revision used for content selection

Files: modelaudit/utils/sources/huggingface.py:1481-1513, modelaudit/cli.py:238-253, tests/utils/sources/test_huggingface.py:3102-3123

get_model_info() fetches model_info.sha, then calls list_repo_tree(repo_id, recursive=True) without passing that SHA. It returns the first call's SHA as metadata['revision'] while returning names and sizes from a second request against the mutable default revision. The preview then content-probes those names at the first SHA.

If the repository moves between calls, dry-run can probe a file that does not exist at the returned revision, omit a newly selected file, or enforce --max-size against a mixed manifest. The new unit test explicitly expects the unpinned list_repo_tree call. The real downloader instead obtains names and immutable SHA from one repository-info response.

Validation: VALIDATED, confidence 92/100, introduced as a dry-run correctness issue by combining the pre-existing metadata helper with revision-bound selectors.
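
A minimal sketch of the single-manifest approach: resolve the commit SHA once and pass it to the tree listing, so names, sizes, and revision describe the same snapshot. It assumes the Hub client's list_repo_tree accepts a revision argument, as huggingface_hub's HfApi does; pinned_manifest() and FakeApi are hypothetical and exist only so the sketch runs without network access.

```python
from types import SimpleNamespace


def pinned_manifest(api, repo_id: str) -> dict:
    """Resolve the commit SHA once, then list the tree at that same SHA."""
    sha = api.model_info(repo_id).sha
    entries = api.list_repo_tree(repo_id, recursive=True, revision=sha)
    files = {
        entry.path: entry.size
        for entry in entries
        if getattr(entry, "size", None) is not None  # folder entries carry no size
    }
    return {"revision": sha, "files": files}


class FakeApi:
    """Stand-in for a Hub client so the sketch runs offline."""

    def __init__(self):
        self.tree_revision = None

    def model_info(self, repo_id):
        return SimpleNamespace(sha="abc123")

    def list_repo_tree(self, repo_id, recursive=False, revision=None):
        self.tree_revision = revision  # record which revision the listing used
        return [
            SimpleNamespace(path="model.safetensors", size=20),
            SimpleNamespace(path="subdir"),  # folder entry without a size
        ]


api = FakeApi()
manifest = pinned_manifest(api, "test/model")
```

A regression test in this style would assert that the listing received the same SHA that the preview reports, instead of expecting an unpinned call.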

Exact target and live state

  • PR: promptfoo/modelaudit#1645, fix: preview Hugging Face dry-run scans
  • State: open, non-draft
  • Head SHA reviewed: 0c3024f18f10286d4a8649432837aa2cf2e68fac
  • Base SHA at initial live fetch: 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69
  • Prior failing heads: e2fdd6e7bee56b63af89fcccf462629fb15edae0, then f78836c99aab6390c6476fcc07759827c896cc2b
  • GitHub mergeability at initial fetch: MERGEABLE, but mergeStateStatus=BLOCKED, reviewDecision=REVIEW_REQUIRED
  • Exact-head Python CI run: 27324248770
  • Final live head recheck: unchanged at 0c3024f18f10286d4a8649432837aa2cf2e68fac; base unchanged at 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69.
  • Final exact-head CI: Linux/Python 3.10, 3.12, and 3.13, lint, type check, build/package, dependency audit, Docker, docs, benchmarks, and CodeQL passed. Windows Python 3.11 failed, so aggregate CI Success failed. Windows job: 80721846244 in run 27324248770.

Current review-thread reconciliation

Live GraphQL returned 19 threads at exact head: 13 resolved and 6 unresolved. Five unresolved threads still anchor to current code; the Windows path thread is outdated after 0c3024f1.

| Thread | Live state | Independent disposition |
| --- | --- | --- |
| r3392638132 preserve issue exit | resolved/outdated | Fixed by the preview-only exit guard; malicious local dry-run test passes. |
| r3392638135 recursive metadata | resolved/outdated | Fixed with recursive tree listing, subject to finding 9. |
| r3392687672 preserve local no-files exit | resolved/current | Fixed; empty local dry-run exits 2. |
| r3392687673 skip recursive folders | resolved/current | Fixed for current Hub RepoFolder objects, which lack size. |
| r3392687676 structured stdout | resolved/outdated | Fixed; preview text is routed to stderr for JSON/SARIF stdout. |
| r3392687679 validate direct-file existence | resolved/current | Fixed through repository/path metadata calls. |
| r3392764356 reject direct folders | resolved/current | Fixed because folder metadata has no valid size. |
| r3392764357 honor repo max-size | resolved/outdated | Partially fixed; finding 4 remains. |
| r3392801228 use real file set for max-size | resolved/outdated | Partially fixed; findings 1, 4, and 9 remain. |
| r3392801231 no scannable files | resolved/outdated | Fixed for unselected non-streaming previews. |
| r3392860440 mirror full-download selection | resolved/outdated | Partially fixed; findings 1, 4, and 5 remain. |
| r3392977222 content-routed selection | resolved/current | Partially fixed; finding 5 remains with scanner selection. |
| r3392977223 streaming unfiltered limit | resolved/current | Fixed; the 129-file regression exits 2. |
| r3393239818 avoid content probing | unresolved/current | Validated; finding 1. |
| r3393239822 POSIX Hub paths | unresolved/outdated | Fixed by PurePosixPath and the new backslash-path regression, but unrelated to the prior failing test; see finding 2. |
| r3393239824 unsupported direct files | unresolved/current | Partially validated with script.py; cover.png is not a valid reproducer; see finding 6. |
| r3393330907 scanner-selected content routes | unresolved/current | Validated; finding 5. |
| r3393330909 timeout propagation | unresolved/current | Validated; finding 3. |
| r3393330913 README size budget | unresolved/current | Validated; finding 4. |

Validation performed

Review inputs:

  • Fetched live PR metadata first and checked out the exact PR ref in an isolated detached worktree; the primary checkout was dirty and was not modified.
  • Read exact-head root AGENTS.md, the pr-review skill, the full changed production/test surface, surrounding source selectors/downloaders, all live comments/reviews/threads, commit history, and prior/current Windows job evidence.
  • git diff --check 8d6c4864...0c3024f1: passed.
  • No scoped AGENTS.md beyond root applies to the changed paths; no unambiguous policy violation was found.

Focused exact-head tests:

  • tests/test_huggingface_fp_regression_harness.py, two local dry-run exit tests, TestGetModelInfo, and TestGetHuggingFaceFileInfo: 23 passed, 7 skipped (the seven live Hub cases are opt-in).
  • tests/test_cli.py -k "huggingface and (streaming or dry_run or scanner)": 13 passed, 246 deselected.
  • tests/utils/sources/test_huggingface.py: 271 passed.
  • Focused xdist harness/exit subset: 18 passed, 7 skipped.

Adversarial exact-head probes:

  • Dry-run content route: two remote-prefix-read attempts observed.
  • Direct script.py: normal exit 2 vs dry-run exit 0.
  • README budget: dry-run exit 0 under 50 B while full selector includes README.md.
  • Renamed PyTorch route: scanner-selected dry-run exit 2 while full selector includes the file.
  • Timeout: --timeout 1 was not passed to metadata lookup.
  • Empty cloud directory preview: exit 0.
  • Old Windows basename semantics and new POSIX semantics selected the same failed-test fixture.
  • Replacing modelaudit.cli as done by test_cli_logging_handlers.py made string patches target the new module while the previously imported Click command retained old globals, reproducing the exact Windows mock bypass.

Broader validation:

  • A local broad fast-suite attempt using the primary checkout's existing virtualenv was stopped after native-package/cache failures unrelated to this diff (modelaudit-picklescan was not resolvable from the isolated exact-head worktree, causing scanner cascades). It reached 3,658 passes before interruption. These results are not used as PR findings.
  • The exact-head GitHub matrix is the broad-suite authority. It completed with only Windows and aggregate CI Success failing; the Windows failure is finding 2.
  • The pinned live Hub matrix was not independently run because huggingface.co is outside this environment's network allowlist. The PR body reports one rank-2 live case passed and six cases skipped, but that self-reported run exercises only the limited path described in finding 7.

Side-effect and malformed-input assessment

  • No new model-content disk write or cache/temp-directory creation was found before the Hugging Face dry-run branches return. Explicit --output/--sbom writes remain user-requested behavior.
  • The network non-execution guarantee is not met because content prefix bodies are fetched; metadata API calls and normal telemetry are separate network activity.
  • Missing direct files, direct folders, credential-bearing errors, recursive folders, unknown selected sizes under --max-size, local findings, and empty local paths have focused passing regressions.
  • Unsupported direct-file prefiltering, mutable-revision metadata, and malformed/empty cloud preview semantics remain incorrect as described above.

Merge disposition

Request changes / do not merge 0c3024f18f10286d4a8649432837aa2cf2e68fac.

Minimum merge gates:

  1. Make Hugging Face dry-run metadata-only: no model-content GETs, downloads, scans, or hidden cache/temp writes. Fail closed when exact content routing cannot be known from metadata.
  2. Propagate the end-to-end timeout and use one immutable repository manifest for names, sizes, and revision.
  3. Match full-download/streaming size and selection semantics, including README files, scanner-selected renamed artifacts, and direct files rejected by the normal prefilter.
  4. Replace declarative mode strings with actual pinned CLI dry-run, streaming, scanner-selective, local-directory, and malicious-control executions; bound acquisition before transfer.
  5. Stop test_cli_logging_handlers.py from replacing the process-global CLI module (prefer a subprocess), make the new harness independent of worker ordering/live network, and require green exact-head Windows evidence.
  6. Restore the existing cloud dry-run no-files/error contract or explicitly scope, document, and test the intended cross-provider change.
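
The subprocess preference in gate 5 can be illustrated with a small sketch. It assumes nothing about the project's test layout: `run_isolated` and `CHILD_CODE` are hypothetical names, and the standard-library `json` module stands in for the CLI module that the logging test replaces.

```python
import json
import subprocess
import sys
import textwrap

# Code executed in a child interpreter. It replaces a module in the child's
# sys.modules, which is the kind of process-global mutation to be isolated.
CHILD_CODE = textwrap.dedent(
    """
    import sys
    import types

    sys.modules["json"] = types.ModuleType("json")
    replaced = not hasattr(sys.modules["json"], "dumps")
    print("replaced" if replaced else "intact")
    """
)


def run_isolated(code: str) -> str:
    """Run code in a fresh interpreter and return its stripped stdout."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
        timeout=60,
    )
    return completed.stdout.strip()


child_report = run_isolated(CHILD_CODE)

# The parent's module table is untouched by what the child did.
parent_json_intact = sys.modules["json"] is json and hasattr(json, "dumps")

print(child_report, parent_json_intact)
```

Because the mutation happens in a separate interpreter, later tests in the same worker keep the module objects they imported, whatever order the tests run in.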

@mldangelo-oai

Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d0ac58c6e4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/cli.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e1baa07330

Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/cli.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4fc88b3363

Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/cli.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aafefb118d

Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/cli.py Outdated
Comment thread modelaudit/cli.py Outdated