fix(pickle): require source proof for framework metadata by mldangelo-oai · Pull Request #1644 · promptfoo/modelaudit

mldangelo-oai · 2026-06-11T01:14:51Z

Summary

Fixes false-positive task 10 for installed, source-trusted framework metadata only. Inert Hugging Face / Accelerate / Torch training metadata can be suppressed when the referenced implementation resolves from trusted installed source or an expected extension owner and the executable surface has been inspected. Default and standalone installs without optional frameworks intentionally retain NON_ALLOWLISTED_GLOBAL warnings; unresolved-name trust is not restored.

Security contract

Do not import missing optional frameworks during scan.
Do not trust unresolved framework names by module/name alone.
Loaded site-package references are trusted only when the live executable identity is tied to trusted source or a known extension export owner. If a trusted framework module was imported and rebound before scanner initialization, scans remain suspicious.
Shadowed, rebound, source-unavailable, nested, concatenated, memo-aliased, NEWOBJ_EX, slot-state BUILD, and EXT1/EXT2/EXT4 controls remain detected or suspicious.
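
The first two rules can be illustrated with a minimal sketch. This is not the scanner's implementation: `top_level_origin` and `name_only_verdict` are hypothetical helpers, and the only real API used is `importlib.util.find_spec`, which locates a top-level module without executing it. Dotted names are reduced to their top-level package on purpose, because `find_spec` imports parent packages when it is given a submodule name.

```python
import importlib.util
from typing import Optional


def top_level_origin(module_name: str) -> Optional[str]:
    """Return where the top-level package would load from, without importing it.

    Hypothetical helper for illustration only. None means "could not be
    located", which the contract above treats as unresolved, never as trusted.
    """
    top_level = module_name.split(".", 1)[0]
    try:
        spec = importlib.util.find_spec(top_level)
    except (ImportError, ValueError):
        # Broken finders, or an entry in sys.modules whose __spec__ is None.
        return None
    if spec is None:
        return None
    # A file path for source modules, "built-in" or "frozen" for interpreter
    # modules, and None for namespace packages (reported here as unresolved).
    return spec.origin


def name_only_verdict(module_name: str) -> str:
    """Fail closed: locating a name is a precondition for review, not trust."""
    if top_level_origin(module_name) is None:
        return "unresolved"
    return "needs-origin-review"
```

Even a located module only reaches `needs-origin-review` in this sketch; tying the live executable identity to trusted source is a separate step that the sketch does not attempt.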

Real-model QA

Pinned nvidia/LocateAnything-3B@272068e81a31e88a48ea03c20a09decba2b62ed6/training_args.bin remains the target benign source-trusted metadata case. The default/missing-framework profile is now explicitly out of scope for clean suppression and should warn rather than fail open.

Validation

Python 3.10 focused standalone/root import-order and unresolved-framework controls passed.
Current Python focused standalone/root framework metadata, scan/load divergence, safe NumPy, memo/concat/extension controls passed.
uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ --cache-dir /tmp/modelaudit-mypy-cache-pr1644-rebind passed.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests -q passed: 2382 passed, 9 skipped.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py -q passed: 852 passed, 4 skipped.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 passed: 18313 passed, 837 skipped.

Allow exact Hugging Face and Accelerate training metadata references observed in training_args.bin while keeping executable REDUCE, extension, shadow-constructor, and near-match controls detected.

mldangelo-oai · 2026-06-11T01:16:18Z

/codex review

github-actions · 2026-06-11T01:17:00Z

Workflow run and artifacts

Performance Benchmarks

Compared 13 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 13 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 4.108s -> 4.093s (-0.4%).

Workload	Benchmark	Target	Size	Files	Baseline	Current	Change	Status
`clean-training-checkpoint`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint`	`safe_large`	278.2 KiB	1	108.67ms	110.13ms	+1.3%	stable
`warm-cache-rescan`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan`	`release-candidate`	547.3 KiB	32	144.47ms	146.13ms	+1.1%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]`	`nested_hex`	130 B	1	613.9us	620.8us	+1.1%	stable
`rejected-basic-auth-candidates`	`tests/benchmarks/test_scan_benchmarks.py::test_rejected_basic_auth_candidates_scan_linearly`	`-`	371.1 KiB	1	2.451s	2.425s	-1.1%	stable
`chunked-upload-stream`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream`	`chunked_stream`	278.2 KiB	1	111.88ms	113.07ms	+1.1%	stable
`mixed-model-repository`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository`	`release-candidate`	547.3 KiB	32	555.24ms	560.63ms	+1.0%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]`	`nested_raw`	78 B	1	583.6us	578.8us	-0.8%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]`	`nested_base64`	98 B	1	591.6us	595.4us	+0.6%	stable
`suspicious-pickle-intake`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake`	`suspicious-intake`	183.8 KiB	4	161.92ms	162.71ms	+0.5%	stable
`direct-malicious-upload`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload`	`malicious_reduce`	52 B	1	524.8us	523.9us	-0.2%	stable
`single-checkpoint-preflight`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load`	`single_checkpoint.pkl`	183.0 KiB	1	87.00ms	87.10ms	+0.1%	stable
`duplicate-heavy-registry`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot`	`registry-snapshot`	915.2 KiB	13	485.31ms	485.80ms	+0.1%	stable
`padded-multi-stream-upload`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload`	`multi_stream_padded`	4.1 KiB	1	646.2us	646.7us	+0.1%	stable

mldangelo-oai · 2026-06-11T01:19:36Z

@codex review

chatgpt-codex-connector · 2026-06-11T01:23:24Z

Codex Review: Didn't find any major issues. Can't wait for the next one!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T01:32:05Z

Independent review of current head eadf7c7ddd5aac08a5499820fbd94b24b8daf2f3 found two actionable issues:

P1: unresolved framework names can produce a false-clean executable reconstruction result. call_graph.py treats new metadata references as trusted even when the framework is unavailable in the scanner environment, and api.py then suppresses findings for exact NEWOBJ/NEWOBJ_EX/BUILD and selected REDUCE invocations. The regression test currently forces resolution to "unresolved" and requires COMPLETE/CLEAN, which locks in the unsafe behavior. A same-name transformers.training_args.TrainingArguments class supplied only in the later unpickle environment can execute attacker-controlled __setstate__ while the earlier scan remains clean. Do not suppress invoked unresolved references to clean. Require source-backed trusted origin plus safe call-graph analysis, or retain a suspicious verdict when the implementation cannot be inspected.

Required regressions: scan while the framework is unresolved, then unpickle in an isolated child containing the exact shadow class and require a non-clean scan; cover standalone scan_file() and root PyTorchZipScanner; protocols 4/5 STACK_GLOBAL, memo aliases, NEWOBJ_EX, slot-state BUILD, nested and concatenated streams; and EXT2/EXT4 controls in addition to EXT1.
P3: missing root changelog. The PR changes root PyTorchZipScanner behavior for training_args.bin, but only updates packages/modelaudit-picklescan/CHANGELOG.md. Add the corresponding user-visible entry to root CHANGELOG.md.

Please address both findings, push a new head, resolve this feedback with concrete test evidence, and request a new exact-head Codex review.

mldangelo-oai · 2026-06-11T02:20:45Z

Addressed the independent review findings at exact head 0f9fa3e82d93193176db8a9d519e925bace168df.

P1: unresolved framework metadata now fails closed instead of suppressing executable reconstruction findings. call_graph.py no longer treats unresolved framework reconstruction/metadata references as trusted import-only references, and api.py now requires invoked trusted references to have safe source-backed analysis before suppressing NON_ALLOWLISTED_GLOBAL findings. Benign compatibility is retained only for inspected/validated paths, including the existing joblib raw ndarray handling after bounded parser validation.

Regressions added for scan/load environment divergence through standalone scan_file() and root PyTorchZipScanner, covering protocol 4 and 5 STACK_GLOBAL, memo aliases, NEWOBJ_EX, slot-state BUILD, nested streams, concatenated streams, and EXT1/EXT2/EXT4 controls. Each divergence case scans with the framework unresolved, asserts a non-clean suspicious result, then unpickles in an isolated child with a same-name shadow class and verifies the shadow __setstate__ execution marker.

P3: added the root CHANGELOG.md entry for the user-visible root PyTorch ZIP scanner behavior change, alongside the picklescan package changelog update.
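
As a rough sketch of the P1 gate described above (the names `Origin`, `ReferenceEvidence`, and `may_suppress` are hypothetical and do not come from `api.py` or `call_graph.py`), the suppression decision reduces to a small fail-closed predicate. Treating a trusted, non-invoked reference as suppressible is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    TRUSTED_SOURCE = "trusted_source"   # resolves to inspected, trusted source
    UNRESOLVED = "unresolved"           # framework missing in scanner environment
    SHADOWED = "shadowed"               # name resolves to something unexpected


@dataclass(frozen=True)
class ReferenceEvidence:
    origin: Origin
    invoked: bool               # reached by REDUCE / NEWOBJ / NEWOBJ_EX / BUILD
    source_analysis_safe: bool  # call-graph analysis of the source found it safe


def may_suppress(evidence: ReferenceEvidence) -> bool:
    """Return True only when a NON_ALLOWLISTED_GLOBAL finding may be dropped."""
    if evidence.origin is not Origin.TRUSTED_SOURCE:
        # Unresolved or shadowed names are never trusted by module/name alone.
        return False
    if evidence.invoked:
        # Invoked references additionally need safe source-backed analysis.
        return evidence.source_analysis_safe
    # Assumption: an inert, import-only reference to trusted source is suppressible.
    return True
```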

Local validation:

uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests/test_api.py -k "unresolved_framework or framework_metadata or scan_load_divergence or selected_hf_framework_metadata_reduce" --maxfail=1 passed: 34 passed.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py -k "training_args_unresolved or scan_load_divergence" --maxfail=1 passed: 11 passed.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests/test_api.py --maxfail=1 passed: 667 passed.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py --maxfail=1 passed: 845 passed, 1 skipped.

I also attempted the full non-slow/non-integration xdist lane. It stopped at tests/cache/test_cache_correctness.py::test_cached_scan_does_not_serialize_known_uncacheable_scan_result with release_calls == 2; that same node passed both isolated and isolated-with-xdist reruns, so I am treating it as an unrelated order-dependent local failure while waiting on remote CI.

mldangelo-oai · 2026-06-11T02:20:51Z

@codex review

Please review exact head 0f9fa3e82d93193176db8a9d519e925bace168df.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0f9fa3e82d

mldangelo-oai · 2026-06-11T02:38:41Z

Update for exact head ea25b1424e1d99e8df1d4bd053db62a832499936:

I fixed the CI failure from the first push as well. The stricter suppression initially made reviewed interpreter-owned stdlib reconstruction references such as _frozen_importlib.ModuleSpec, string.Formatter, and weakref.ref/proxy suspicious when their source is not analyzable. The follow-up commit keeps the no-analysis shortcut only for those reviewed non-framework references. Framework reconstruction/metadata and framework REDUCE references still require source-backed analysis; unresolved framework origins remain suspicious.

Additional validation after the fixup:

uv run --python 3.13 --with pytest --with pytest-xdist pytest tests/test_call_graph_import_statements.py::test_unresolved_framework_reconstruction_reference_requires_origin_review tests/test_call_graph_import_statements.py::test_scan_bytes_keeps_frozen_stdlib_globals_clean_without_custom_meta_path_finders tests/test_call_graph_import_statements.py::test_scan_bytes_blocks_formatter_vformat_defaultdict_factory_rce tests/test_call_graph_import_statements.py::test_scan_bytes_keeps_formatter_vformat_defaultdict_without_live_fields_clean tests/test_call_graph_import_statements.py::test_scan_bytes_blocks_formatter_private_vformat_defaultdict_factory_rce tests/test_call_graph_import_statements.py::test_scan_bytes_keeps_formatter_private_vformat_defaultdict_without_live_fields_clean tests/test_call_graph_import_statements.py::test_scan_bytes_keeps_weakref_without_callback_clean --tb=short -vv passed: 8 passed.
uv run --python 3.13 --with pytest --with pytest-xdist pytest -n auto tests --tb=short from packages/modelaudit-picklescan passed: 2298 passed, 83 skipped.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests/test_api.py -k "unresolved_framework or framework_metadata or scan_load_divergence or selected_hf_framework_metadata_reduce" --maxfail=1 passed: 34 passed.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py -k "training_args_unresolved or scan_load_divergence" --maxfail=1 passed: 11 passed.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py --maxfail=1 passed: 845 passed, 1 skipped.
uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.

The earlier full root non-slow/non-integration local xdist attempt still stopped on tests/cache/test_cache_correctness.py::test_cached_scan_does_not_serialize_known_uncacheable_scan_result with release_calls == 2; that node passed isolated and isolated-with-xdist reruns. Remote CI is now rerunning on the new head.

mldangelo-oai · 2026-06-11T02:38:49Z

@codex review

Please review exact head ea25b1424e1d99e8df1d4bd053db62a832499936.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ea25b1424e

mldangelo-oai · 2026-06-11T03:05:38Z

Additional exact pinned QA: google-bert/bert-large-uncased@6da4b6a26a1877e173fca3225479512db81a5e5b contains canonical-looking rust_model.ot TorchScript structure: __torch__.Module, GLOBAL/NEWOBJ/BUILD, torch.jit._pickle.build_intlist, and REDUCE, with valid ZIP paths/ratios and no malicious import identified. Include this as a source-backed benign/suspicious reconstruction control. Parent/member hash attribution is being fixed separately in task 39.

mldangelo-oai · 2026-06-11T03:32:07Z

Update for exact head 6bfce7abf1ba7d653fe3afc90b4d0e022e895a99:

Addressed the additional feedback posted after the previous push:

Added a pinned google-bert/bert-large-uncased@6da4b6a26a1877e173fca3225479512db81a5e5b synthetic rust_model.ot TorchScript QA control. It includes valid TorchScript ZIP layout, generated __torch__.py plus debug pair, and archive/data.pkl with __torch__.Module GLOBAL/NEWOBJ/BUILD plus torch.jit._pickle.build_intlist REDUCE. The test asserts warning/suspicious reconstruction findings and no critical/malicious finding for the source-backed generated Python path.
Addressed Codex P2 by keeping joblib.numpy_pickle.NumpyArrayWrapper origin-review findings unless import_only_reference_is_proven_trusted("joblib.numpy_pickle", "NumpyArrayWrapper") succeeds. Added an untrusted-origin regression for a valid raw ndarray span.
Applied the same trusted-origin gate in the root pickle legitimate-serialization helper.
Removed the unused constants reported by code-quality.

Additional local validation for this head:

uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_joblib_scanner_codecs.py tests/scanners/test_pickle_scanner.py::test_legitimate_serialization_file_uses_rust_scan tests/scanners/test_pickle_scanner.py::test_legitimate_serialization_file_keeps_untrusted_wrapper_origin_review tests/scanners/test_pickle_scanner.py::test_legitimate_serialization_file_skips_call_graph_enrichment tests/scanners/test_pickle_scanner.py::test_legitimate_serialization_file_keeps_bounded_file_reads --maxfail=1 passed: 70 passed.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests/test_api.py -k "unresolved_framework or framework_metadata or scan_load_divergence or selected_hf_framework_metadata_reduce" --maxfail=1 passed: 34 passed.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py --maxfail=1 passed: 846 passed, 1 skipped.
uv run --python 3.13 --with pytest --with pytest-xdist pytest -n auto tests --tb=short from packages/modelaudit-picklescan passed: 2298 passed, 83 skipped.

mldangelo-oai · 2026-06-11T03:32:14Z

@codex review

Please review exact head 6bfce7abf1ba7d653fe3afc90b4d0e022e895a99.

chatgpt-codex-connector · 2026-06-11T03:35:32Z

Codex Review: Didn't find any major issues. You're on a roll.

mldangelo-oai · 2026-06-11T03:37:44Z

Pinned TorchScript QA input: sentence-transformers/all-MiniLM-L12-v2@a50ef00143b4d5391434df20ae11632588ac25be, file rust_model.ot. Current main flags standard-looking GLOBAL/NEWOBJ/BUILD plus __torch__.Module even though nested execution evidence is false and the archive contains generated __torch__.py. Please classify every contained pickle member independently, preserve the worst substream outcome in archive aggregation, and prove genuine external/import-time execution remains critical.
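
A minimal sketch of the requested aggregation, assuming a toy three-level verdict and a toy policy (`Verdict`, `CRITICAL_GLOBALS`, `classify_member`, and `classify_archive` are illustrative names, not ModelAudit's): each `.pkl` member is classified on its own, and the archive verdict is the maximum over members. It inspects only the `GLOBAL` opcode and never unpickles anything.

```python
import io
import pickle
import pickletools
import zipfile
from enum import IntEnum


class Verdict(IntEnum):
    CLEAN = 0
    SUSPICIOUS = 1
    CRITICAL = 2


# Toy policy for illustration; a real scanner uses far richer rules.
CRITICAL_GLOBALS = {"os system", "posix system", "nt system",
                    "builtins eval", "builtins exec"}


def classify_member(data: bytes) -> Verdict:
    """Classify one pickle stream from its opcodes alone."""
    verdict = Verdict.CLEAN
    try:
        for opcode, arg, _pos in pickletools.genops(data):
            if opcode.name == "GLOBAL":
                if arg in CRITICAL_GLOBALS:
                    return Verdict.CRITICAL
                verdict = max(verdict, Verdict.SUSPICIOUS)
    except Exception:
        # A stream that cannot be parsed is not proven benign.
        return Verdict.SUSPICIOUS
    return verdict


def classify_archive(blob: bytes) -> tuple[Verdict, dict[str, Verdict]]:
    """Classify every .pkl member independently and keep the worst outcome."""
    per_member: dict[str, Verdict] = {}
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        for info in archive.infolist():
            if info.filename.endswith(".pkl"):
                per_member[info.filename] = classify_member(archive.read(info))
    worst = max(per_member.values(), default=Verdict.CLEAN)
    return worst, per_member
```

Returning the per-member map alongside the aggregate keeps a benign `archive/data.pkl` attributable as clean even when another member drives the archive to critical.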

mldangelo-oai · 2026-06-11T03:57:18Z

Update for exact head 24a4f334ef42cdf2db7e1f879ee2dab51a16c7d7:

Fixed the root Python 3.13 CI failure in test_scan_keeps_untrusted_numpy_wrapper_origin_review_after_valid_raw_array. The joblib test helper now patches both root and standalone picklescan origin-review functions so trusted/untrusted joblib.numpy_pickle.NumpyArrayWrapper behavior is deterministic even when CI has joblib installed. The scanner also clears the wrapper-specific call_graph_source_unavailable info notice only when the wrapper origin is trusted and the raw ndarray span was statically validated; untrusted wrapper origin-review findings remain intact.

Validation after this fixup:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_joblib_scanner_codecs.py --maxfail=1 passed: 66 passed.
uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.

CI is rerunning on this new head.

mldangelo-oai · 2026-06-11T03:57:26Z

@codex review

Please review exact head 24a4f334ef42cdf2db7e1f879ee2dab51a16c7d7.

chatgpt-codex-connector · 2026-06-11T04:01:49Z

Codex Review: Didn't find any major issues. Another round soon, please!

mldangelo-oai · 2026-06-11T04:12:13Z

Pinned real-model framework-metadata QA: SupraLabs/Supra-50M-Reasoning (rank 208; exact revision is in the coordinator scan metadata) emits S201 plus nine warning NON_ALLOWLISTED_GLOBAL findings from standard training_args.bin globals: TrainingArguments, SchedulerType, OptimizerNames, IntervalStrategy, SaveStrategy, HubStrategy, AcceleratorConfig, PartialState, and DistributedType. @codex test the exact training_args.bin on the live head and suppress only inert import-only framework metadata; preserve findings when those globals participate in REDUCE/NEWOBJ/BUILD execution, carry attacker arguments/state, or lead to non-framework callables.
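
For anyone reproducing this locally, a rough way to list the globals a pickle references without loading it is sketched below. It uses only `pickletools.genops` from the standard library; `referenced_globals` is a hypothetical helper, and its `STACK_GLOBAL` handling is a heuristic that takes the two most recent string pushes. It does not simulate the stack or the memo, so memo-aliased names (one of the controls in this PR) would need a real stack model.

```python
import collections
import pickle
import pickletools

STRING_OPCODES = {
    "SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE",
    "STRING", "BINSTRING", "SHORT_BINSTRING",
}


def referenced_globals(data: bytes) -> list[tuple[str, str]]:
    """Return (module, name) pairs referenced by GLOBAL / STACK_GLOBAL opcodes."""
    refs: list[tuple[str, str]] = []
    recent_strings: list[str] = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
        elif opcode.name == "GLOBAL":
            # genops reports the argument as "module name" with one space.
            module, _, name = arg.partition(" ")
            refs.append((module, name))
        elif opcode.name == "STACK_GLOBAL":
            if len(recent_strings) >= 2:
                refs.append((recent_strings[-2], recent_strings[-1]))
            else:
                # Names came from the memo or elsewhere: report as unknown
                # rather than guessing.
                refs.append(("<unknown>", "<unknown>"))
    return refs
```

Listing names is only the first step; whether a listed global is inert depends on whether REDUCE, NEWOBJ, or BUILD later consumes it, which this helper does not track.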

…framework-metadata-20260610' into campaign/pr-1644

mldangelo-oai · 2026-06-12T20:21:45Z

@codex review Please review exact head 8523f02.

chatgpt-codex-connector · 2026-06-12T20:27:04Z

Codex Review: Didn't find any major issues. Keep it up!

Reviewed commit: 8523f02192

…framework-metadata-20260610' into mdangelo/codex/hf-fp-t10-pickle-framework-metadata-20260610 # Conflicts: # tests/scanners/test_pickle_scanner.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b748f27177

…framework-metadata-20260610' into campaign/pr-1644

mldangelo-oai · 2026-06-12T21:54:04Z

@codex review Please review exact head f72ac2d.

chatgpt-codex-connector · 2026-06-12T21:58:14Z

Codex Review: Didn't find any major issues. Can't wait for the next one!

Reviewed commit: f72ac2d353

…framework-metadata-20260610' into mdangelo/codex/hf-fp-t10-pickle-framework-metadata-20260610 # Conflicts: # modelaudit/scanners/joblib_scanner.py # packages/modelaudit-picklescan/src/modelaudit_picklescan/api.py

mldangelo-oai · 2026-06-12T23:22:46Z

Addressed the latest live review-thread items at exact head 8a1a322cad2b9d708ce647cc5b88d6813d8d30b0.

What changed in this round:

PyTorch storage PID trust is bound to the actual matched GLOBAL / STACK_GLOBAL position used inside the validated BINPERSID tuple, including nested/member position_offset; unmatched same-name storage globals have stale pytorch_storage_persistent_id metadata cleared.
Joblib validated raw-array cleanup removes matching ACTIONABLE_FAILED_CHECKS private evidence along with public checks/issues, and validated no-position source-unavailable notices no longer keep cleaned benign results out of cache.
NumPy object-dtype reconstruction cleanup now removes matching private failed-check evidence when pruning validated reconstruction findings, so cleaned benign .npy scans are cacheable.
Merged the published PR branch additively and confirmed current origin/main was already included before final validation.

Thread handling:

Resolved the fixed NumPy private-evidence thread PRRT_kwDOOEde4c6JQN8A.
The storage PID and Joblib private-evidence threads were already resolved by the published branch update and remain fixed at this head.
Left PRRT_kwDOOEde4c6IsS7X unresolved intentionally because this PR keeps default/standalone missing-framework installs suspicious rather than restoring unresolved-name trust.

Final local validation on 8a1a322cad2b9d708ce647cc5b88d6813d8d30b0:

uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> passed.
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> 422 files already formatted.
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ --cache-dir /tmp/modelaudit-mypy-cache-pr1644-numpy-private -> success, 477 source files.
Current Python focused standalone/root storage/rebinding/memo/extension/nested/Joblib controls -> 121 passed, 1889 deselected.
Python 3.10 focused standalone/root storage/rebinding/memo/extension/nested/Joblib controls -> 121 passed, 1889 deselected.
Current Python targeted NumPy private-evidence controls -> 2 passed; Python 3.10 same -> 2 passed.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_numpy_scanner.py -q -> 63 passed.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests -q -> 2510 passed, 3 skipped.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py --maxfail=1 -q -> 975 passed, 8 skipped.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 20780 passed, 785 skipped, 40 warnings.
git diff --check -> passed.

@codex review exact head 8a1a322cad2b9d708ce647cc5b88d6813d8d30b0

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a1a322cad

…framework-metadata-20260610' into campaign/pr-1644 # Conflicts: # modelaudit/scanners/numpy_scanner.py

…t10-pickle-framework-metadata-20260610

mldangelo-oai · 2026-06-13T01:00:01Z

Addressed the exact-head Joblib no-position cleanup blocker at ce1ad41ab2e6a2b28a293166d44ee0747b29646c.

What changed:

NON_ALLOWLISTED_GLOBAL cleanup now treats a no-position issue/check pair with the same signature as one occurrence, but counts distinct no-position signatures as separate occurrences. If only occurrence 1 is validated, occurrence 2 remains suspicious and private actionable evidence remains.
call_graph_source_unavailable cleanup is separated from actionable occurrence matching and is pruned only from origin-review proof; ambiguous duplicate no-position source-unavailable notices stay fail-closed.
Merged current origin/main additively through 8ba33406 after the fix.
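
The occurrence-counting rule can be sketched as follows. The data model is an assumption made for illustration (`Finding`, `occurrence_key`, and `unvalidated_occurrences` are hypothetical names, not the scanner's types): an issue/check pair without a position collapses into one occurrence when the signatures match, while distinct signatures stay separate.

```python
from dataclasses import dataclass
from typing import Hashable, Iterable, Optional


@dataclass(frozen=True)
class Finding:
    kind: str                 # "issue" or "check"
    signature: str            # e.g. the referenced module and name
    position: Optional[int]   # byte offset in the stream, if known


def occurrence_key(finding: Finding) -> Hashable:
    """Positioned findings are keyed by offset; others by signature."""
    if finding.position is not None:
        return ("pos", finding.position)
    return ("no-pos", finding.signature)


def unvalidated_occurrences(findings: Iterable[Finding], validated: int) -> int:
    """Number of distinct occurrences left after `validated` were proven benign."""
    distinct = {occurrence_key(f) for f in findings}
    return max(len(distinct) - validated, 0)
```

With one validated raw array, an issue/check pair sharing a signature leaves nothing behind, whereas two different no-position signatures leave one occurrence suspicious.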

Exact-head validation on ce1ad41ab2e6a2b28a293166d44ee0747b29646c:

uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> 422 files already formatted.
uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> passed.
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ --cache-dir /tmp/modelaudit-mypy-cache-pr1644-merge-joblib-no-position -> success, 477 source files.
Current Python focused standalone/root pickle/PyTorch/Joblib selection -> 295 passed, 1787 deselected.
Python 3.10 same focused selection -> 295 passed, 1787 deselected.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_joblib_scanner_codecs.py -q -> 73 passed on current Python; 73 passed on Python 3.10.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests -q -> 2510 passed, 3 skipped.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py --maxfail=1 -q -> 975 passed, 8 skipped.
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 20781 passed, 785 skipped, 40 warnings.
git diff --check -> passed.

Live thread state after push: the Joblib no-position review item is no longer a live unresolved thread; only PRRT_kwDOOEde4c6IsS7X remains unresolved intentionally because this PR keeps default/standalone missing-framework installs suspicious instead of restoring unresolved-name trust.

@codex review exact head ce1ad41ab2e6a2b28a293166d44ee0747b29646c

…framework-metadata-20260610' into campaign/pr-1644

chatgpt-codex-connector · 2026-06-13T01:04:51Z

Codex Review: Didn't find any major issues. Hooray!

Reviewed commit: ce1ad41ab2

mldangelo-oai · 2026-06-13T01:41:11Z

@codex review Please review exact head d80e0fb. Local validation passed: focused private-evidence/storage regressions, full ruff format check, full ruff check, full mypy, and PROMPTFOO_DISABLE_TELEMETRY=1 pytest -n auto -m "not slow and not integration" --maxfail=1 with 20770 passed / 799 skipped. Independent reviewer also found no blockers on this exact head.

chatgpt-codex-connector · 2026-06-13T01:46:17Z

Codex Review: Didn't find any major issues. Another round soon, please!

Reviewed commit: d80e0fb6f0

fix(pickle): allow inert framework metadata globals

560d35d

Allow exact Hugging Face and Accelerate training metadata references observed in training_args.bin while keeping executable REDUCE, extension, shadow-constructor, and near-match controls detected.

github-code-quality Bot found potential problems Jun 11, 2026

Comment thread packages/modelaudit-picklescan/tests/test_api.py Fixed

test(pickle): remove unused metadata fixture set

eadf7c7

fix: fail closed on unresolved pickle metadata reconstruction

0f9fa3e

github-code-quality Bot found potential problems Jun 11, 2026

Comment thread packages/modelaudit-picklescan/tests/test_api.py Fixed

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/scanners/joblib_scanner.py Outdated

fix: preserve stdlib pickle reconstruction trust

ea25b14

github-code-quality Bot found potential problems Jun 11, 2026

Comment thread packages/modelaudit-picklescan/src/modelaudit_picklescan/api.py Fixed

chatgpt-codex-connector Bot reviewed Jun 11, 2026

Comment thread modelaudit/scanners/joblib_scanner.py Outdated

fix: keep joblib origin review findings

6bfce7a

github-code-quality Bot found potential problems Jun 11, 2026

Comment thread packages/modelaudit-picklescan/src/modelaudit_picklescan/api.py Fixed

test: make joblib origin review deterministic

24a4f33

Merge remote-tracking branch 'origin/mdangelo/codex/hf-fp-t10-pickle-framework-metadata-20260610' into campaign/pr-1644

8523f02

mldangelo-oai added 3 commits June 12, 2026 20:40

fix: trust legacy pytorch storage origin findings by offset

abf3249

Merge remote-tracking branch 'origin/mdangelo/codex/hf-fp-t10-pickle-framework-metadata-20260610' into mdangelo/codex/hf-fp-t10-pickle-framework-metadata-20260610

b748f27

Conflicts: tests/scanners/test_pickle_scanner.py

fix(pickle): bind storage metadata cleanup

0e8f910

chatgpt-codex-connector Bot reviewed Jun 12, 2026

Comment thread modelaudit/scanners/numpy_scanner.py

Merge remote-tracking branch 'origin/mdangelo/codex/hf-fp-t10-pickle-framework-metadata-20260610' into campaign/pr-1644

f72ac2d

mldangelo-oai added 5 commits June 12, 2026 22:13

fix: bind storage trust and joblib cleanup evidence

a7e333e

fix(numpy): clear pruned private findings

2b7fa9a

fix(scanners): align private finding pruning

a10272a

fix: clear numpy cleanup private evidence

8a1a322

chatgpt-codex-connector Bot reviewed Jun 12, 2026

Comment thread modelaudit/scanners/joblib_scanner.py Outdated

mldangelo-oai added 6 commits June 12, 2026 23:56

fix(scanners): preserve pruned private evidence

ff40774

Merge remote-tracking branch 'origin/main' into campaign/pr-1644

974f0cd

Merge remote-tracking branch 'origin/mdangelo/codex/hf-fp-t10-pickle-framework-metadata-20260610' into campaign/pr-1644

0fcb663

Conflicts: modelaudit/scanners/numpy_scanner.py

fix: preserve ambiguous joblib no-position findings

6e9ed75

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-t10-pickle-framework-metadata-20260610

ce1ad41

fix(joblib): clear dtype codec private findings

c6531d7

Merge remote-tracking branch 'origin/mdangelo/codex/hf-fp-t10-pickle-framework-metadata-20260610' into campaign/pr-1644

d80e0fb

mldangelo-oai merged commit bb38b74 into main Jun 13, 2026
32 checks passed

mldangelo-oai deleted the mdangelo/codex/hf-fp-t10-pickle-framework-metadata-20260610 branch June 13, 2026 02:25
