fix(pickle): require source proof for framework metadata #1644

Merged
mldangelo-oai merged 124 commits into main from mdangelo/codex/hf-fp-t10-pickle-framework-metadata-20260610
Jun 13, 2026

Conversation

@mldangelo-oai

@mldangelo-oai mldangelo-oai commented Jun 11, 2026

Contributor

Summary

Fixes false-positive task 10 for installed, source-trusted framework metadata only. Inert Hugging Face / Accelerate / Torch training metadata can be suppressed when the referenced implementation resolves from trusted installed source or an expected extension owner and the executable surface has been inspected. Default and standalone installs without optional frameworks intentionally retain NON_ALLOWLISTED_GLOBAL warnings; unresolved-name trust is not restored.

Security contract

  • Do not import missing optional frameworks during scan.
  • Do not trust unresolved framework names by module/name alone.
  • Loaded site-package references are trusted only when the live executable identity is tied to trusted source or a known extension export owner. If a trusted framework module was imported and rebound before scanner initialization, scans remain suspicious.
  • Shadowed, rebound, source-unavailable, nested, concatenated, memo-aliased, NEWOBJ_EX, slot-state BUILD, and EXT1/EXT2/EXT4 controls remain detected or suspicious.
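The opcode families named in the contract can be located without unpickling anything. A minimal sketch using only the standard-library `pickletools` module follows; the opcode grouping and the `executable_opcodes` helper are illustrative assumptions, not the scanner's implementation:

```python
import pickle
import pickletools

# Opcodes that invoke callables, construct objects, apply state, or resolve
# extension-registry codes. This grouping is an assumption for illustration.
EXECUTABLE_OPCODES = frozenset(
    {"REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX", "BUILD", "EXT1", "EXT2", "EXT4"}
)


def executable_opcodes(data: bytes) -> list[tuple[int, str]]:
    """Return (byte offset, opcode name) pairs without unpickling anything."""
    return [
        (pos, opcode.name)
        for opcode, _arg, pos in pickletools.genops(data)
        if opcode.name in EXECUTABLE_OPCODES
    ]


# A plain dict needs no executable opcodes; a complex number is rebuilt via REDUCE.
plain = pickle.dumps({"lr": 5e-5}, protocol=4)
reduced = pickle.dumps(complex(1, 2), protocol=4)
plain_hits = executable_opcodes(plain)
reduced_names = [name for _pos, name in executable_opcodes(reduced)]
```

Finding the opcode is only the first step: whether the referenced callable is trustworthy is a separate question, which is what the contract above addresses.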

Real-model QA

Pinned nvidia/LocateAnything-3B@272068e81a31e88a48ea03c20a09decba2b62ed6/training_args.bin remains the target benign source-trusted metadata case. The default/missing-framework profile is now explicitly out of scope for clean suppression and should warn rather than fail open.

Validation

  • Python 3.10 focused standalone/root import-order and unresolved-framework controls passed.
  • Current Python focused standalone/root framework metadata, scan/load divergence, safe NumPy, memo/concat/extension controls passed.
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ --cache-dir /tmp/modelaudit-mypy-cache-pr1644-rebind passed.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests -q passed: 2382 passed, 9 skipped.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py -q passed: 852 passed, 4 skipped.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 passed: 18313 passed, 837 skipped.

Allow exact Hugging Face and Accelerate training metadata references observed in training_args.bin while keeping executable REDUCE, extension, shadow-constructor, and near-match controls detected.
@mldangelo-oai

Contributor Author

/codex review

@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

Workflow run and artifacts

Performance Benchmarks

Compared 13 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 13 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 4.108s -> 4.093s (-0.4%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| clean-training-checkpoint | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint | safe_large | 278.2 KiB | 1 | 108.67ms | 110.13ms | +1.3% | stable |
| warm-cache-rescan | tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan | release-candidate | 547.3 KiB | 32 | 144.47ms | 146.13ms | +1.1% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] | nested_hex | 130 B | 1 | 613.9us | 620.8us | +1.1% | stable |
| rejected-basic-auth-candidates | tests/benchmarks/test_scan_benchmarks.py::test_rejected_basic_auth_candidates_scan_linearly | - | 371.1 KiB | 1 | 2.451s | 2.425s | -1.1% | stable |
| chunked-upload-stream | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream | chunked_stream | 278.2 KiB | 1 | 111.88ms | 113.07ms | +1.1% | stable |
| mixed-model-repository | tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository | release-candidate | 547.3 KiB | 32 | 555.24ms | 560.63ms | +1.0% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] | nested_raw | 78 B | 1 | 583.6us | 578.8us | -0.8% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] | nested_base64 | 98 B | 1 | 591.6us | 595.4us | +0.6% | stable |
| suspicious-pickle-intake | tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake | suspicious-intake | 183.8 KiB | 4 | 161.92ms | 162.71ms | +0.5% | stable |
| direct-malicious-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload | malicious_reduce | 52 B | 1 | 524.8us | 523.9us | -0.2% | stable |
| single-checkpoint-preflight | tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load | single_checkpoint.pkl | 183.0 KiB | 1 | 87.00ms | 87.10ms | +0.1% | stable |
| duplicate-heavy-registry | tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot | registry-snapshot | 915.2 KiB | 13 | 485.31ms | 485.80ms | +0.1% | stable |
| padded-multi-stream-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload | multi_stream_padded | 4.1 KiB | 1 | 646.2us | 646.7us | +0.1% | stable |
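
The Change and Status columns can be reproduced with a short calculation. This is a sketch only: the helper names are hypothetical, and treating a drop of more than 15% as "improved" is an assumption, since the report states only the regression threshold.

```python
def percent_change(baseline: float, current: float) -> float:
    """Relative change of current versus baseline, in percent."""
    return (current - baseline) / baseline * 100.0


def classify(baseline: float, current: float, threshold_pct: float = 15.0) -> str:
    """Label a benchmark pair; the symmetric 'improved' bound is an assumption."""
    change = percent_change(baseline, current)
    if change > threshold_pct:
        return "regression"
    if change < -threshold_pct:
        return "improved"
    return "stable"


# clean-training-checkpoint row (milliseconds) and the aggregate median (seconds).
checkpoint_change = round(percent_change(108.67, 110.13), 1)
aggregate_change = round(percent_change(4.108, 4.093), 1)
checkpoint_status = classify(108.67, 110.13)
```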

@mldangelo-oai

Contributor Author

@codex review

Comment thread packages/modelaudit-picklescan/tests/test_api.py Fixed
@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Can't wait for the next one!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mldangelo-oai

Contributor Author

Independent review of current head eadf7c7ddd5aac08a5499820fbd94b24b8daf2f3 found two actionable issues:

  1. P1: unresolved framework names can produce a false-clean executable reconstruction result. call_graph.py treats new metadata references as trusted even when the framework is unavailable in the scanner environment, and api.py then suppresses findings for exact NEWOBJ/NEWOBJ_EX/BUILD and selected REDUCE invocations. The regression test currently forces resolution to "unresolved" and requires COMPLETE/CLEAN, which locks in the unsafe behavior. A same-name transformers.training_args.TrainingArguments class supplied only in the later unpickle environment can execute attacker-controlled __setstate__ while the earlier scan remains clean. Do not suppress invoked unresolved references to clean. Require source-backed trusted origin plus safe call-graph analysis, or retain a suspicious verdict when the implementation cannot be inspected.

    Required regressions: scan while the framework is unresolved, then unpickle in an isolated child containing the exact shadow class and require a non-clean scan; cover standalone scan_file() and root PyTorchZipScanner; protocols 4/5 STACK_GLOBAL, memo aliases, NEWOBJ_EX, slot-state BUILD, nested and concatenated streams; and EXT2/EXT4 controls in addition to EXT1.

  2. P3: missing root changelog. The PR changes root PyTorchZipScanner behavior for training_args.bin, but only updates packages/modelaudit-picklescan/CHANGELOG.md. Add the corresponding user-visible entry to root CHANGELOG.md.

Please address both findings, push a new head, resolve this feedback with concrete test evidence, and request a new exact-head Codex review.
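
To make the P1 concern concrete, here is a minimal in-process sketch of why a dotted name alone proves nothing about what runs at load time. It uses a hypothetical stand-in module (`demo_framework_args`, registered in `sys.modules` by hand) rather than the real `transformers` package, and the `install` helper is invented for the demonstration:

```python
import pickle
import sys
import types

MODULE_NAME = "demo_framework_args"  # hypothetical stand-in, not a real package


def install(cls_body: dict) -> type:
    """Register a fresh module exposing a TrainingArguments class."""
    module = types.ModuleType(MODULE_NAME)
    cls = type("TrainingArguments", (), {"__module__": MODULE_NAME, **cls_body})
    module.TrainingArguments = cls
    sys.modules[MODULE_NAME] = module
    return cls


# "Scan-time" environment: a benign class with plain attribute state.
benign = install({})
obj = benign()
obj.learning_rate = 5e-5
payload = pickle.dumps(obj, protocol=4)

# "Load-time" environment: same dotted name, different implementation.
executed = []


def shadow_setstate(self, state):
    executed.append(state)  # stands in for attacker-controlled behaviour


install({"__setstate__": shadow_setstate})
loaded = pickle.loads(payload)
```

The payload bytes never change between the two steps; only the class bound to the name does. After loading, `executed` holds the pickled state, showing that the shadow `__setstate__` ran during BUILD.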

@mldangelo-oai

Contributor Author

Addressed the independent review findings at exact head 0f9fa3e82d93193176db8a9d519e925bace168df.

P1: unresolved framework metadata now fails closed instead of suppressing executable reconstruction findings. call_graph.py no longer treats unresolved framework reconstruction/metadata references as trusted import-only references, and api.py now requires invoked trusted references to have safe source-backed analysis before suppressing NON_ALLOWLISTED_GLOBAL findings. Benign compatibility is retained only for inspected/validated paths, including the existing joblib raw ndarray handling after bounded parser validation.

Regressions added for scan/load environment divergence through standalone scan_file() and root PyTorchZipScanner, covering protocol 4 and 5 STACK_GLOBAL, memo aliases, NEWOBJ_EX, slot-state BUILD, nested streams, concatenated streams, and EXT1/EXT2/EXT4 controls. Each divergence case scans with the framework unresolved, asserts a non-clean suspicious result, then unpickles in an isolated child with a same-name shadow class and verifies the shadow __setstate__ execution marker.
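
The load half of such a divergence case might be shaped roughly as follows. This is a sketch under stated assumptions, not the test code in this PR: the module name, marker string, and hand-built payload are hypothetical, and the scan-side assertion is omitted because it depends on the scanner API.

```python
import subprocess
import sys

MODULE = "demo_framework_args"  # hypothetical name; not importable in the parent
MARKER = "shadow __setstate__ ran"


def short_str(text: str) -> bytes:
    """Encode a SHORT_BINUNICODE push (strings under 256 bytes)."""
    raw = text.encode("utf-8")
    return b"\x8c" + bytes([len(raw)]) + raw


# Protocol 4 stream: STACK_GLOBAL, NEWOBJ with no args, BUILD with empty state.
payload = (
    b"\x80\x04"
    + short_str(MODULE)
    + short_str("TrainingArguments")
    + b"\x93"  # STACK_GLOBAL
    + b")\x81"  # EMPTY_TUPLE, NEWOBJ
    + b"}b."  # EMPTY_DICT, BUILD, STOP
)

# The child defines the shadow class, then unpickles the bytes from stdin.
CHILD = f"""
import pickle, sys, types

class TrainingArguments:
    def __setstate__(self, state):
        print({MARKER!r})

TrainingArguments.__module__ = {MODULE!r}
module = types.ModuleType({MODULE!r})
module.TrainingArguments = TrainingArguments
sys.modules[{MODULE!r}] = module
pickle.loads(sys.stdin.buffer.read())
"""

result = subprocess.run(
    [sys.executable, "-c", CHILD], input=payload, capture_output=True, check=True
)
marker_seen = result.stdout.decode().strip() == MARKER
```

Running the load in a child process keeps the shadow class out of the parent's `sys.modules`, so the parent can still represent the unresolved-framework scan environment.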

P3: added the root CHANGELOG.md entry for the user-visible root PyTorch ZIP scanner behavior change, alongside the picklescan package changelog update.

Local validation:

  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests/test_api.py -k "unresolved_framework or framework_metadata or scan_load_divergence or selected_hf_framework_metadata_reduce" --maxfail=1 passed: 34 passed.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py -k "training_args_unresolved or scan_load_divergence" --maxfail=1 passed: 11 passed.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests/test_api.py --maxfail=1 passed: 667 passed.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py --maxfail=1 passed: 845 passed, 1 skipped.

I also attempted the full non-slow/non-integration xdist lane. It stopped at tests/cache/test_cache_correctness.py::test_cached_scan_does_not_serialize_known_uncacheable_scan_result with release_calls == 2; that same node passed both isolated and isolated-with-xdist reruns, so I am treating it as an unrelated order-dependent local failure while waiting on remote CI.

@mldangelo-oai

Contributor Author

@codex review

Please review exact head 0f9fa3e82d93193176db8a9d519e925bace168df.

Comment thread packages/modelaudit-picklescan/tests/test_api.py Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0f9fa3e82d

Comment thread modelaudit/scanners/joblib_scanner.py Outdated
@mldangelo-oai

Contributor Author

Update for exact head ea25b1424e1d99e8df1d4bd053db62a832499936:

I fixed the CI failure from the first push as well. The stricter suppression initially made reviewed interpreter-owned stdlib reconstruction references such as _frozen_importlib.ModuleSpec, string.Formatter, and weakref.ref/proxy suspicious when their source is not analyzable. The follow-up commit keeps the no-analysis shortcut only for those reviewed non-framework references. Framework reconstruction/metadata and framework REDUCE references still require source-backed analysis; unresolved framework origins remain suspicious.

Additional validation after the fixup:

  • uv run --python 3.13 --with pytest --with pytest-xdist pytest tests/test_call_graph_import_statements.py::test_unresolved_framework_reconstruction_reference_requires_origin_review tests/test_call_graph_import_statements.py::test_scan_bytes_keeps_frozen_stdlib_globals_clean_without_custom_meta_path_finders tests/test_call_graph_import_statements.py::test_scan_bytes_blocks_formatter_vformat_defaultdict_factory_rce tests/test_call_graph_import_statements.py::test_scan_bytes_keeps_formatter_vformat_defaultdict_without_live_fields_clean tests/test_call_graph_import_statements.py::test_scan_bytes_blocks_formatter_private_vformat_defaultdict_factory_rce tests/test_call_graph_import_statements.py::test_scan_bytes_keeps_formatter_private_vformat_defaultdict_without_live_fields_clean tests/test_call_graph_import_statements.py::test_scan_bytes_keeps_weakref_without_callback_clean --tb=short -vv passed: 8 passed.
  • uv run --python 3.13 --with pytest --with pytest-xdist pytest -n auto tests --tb=short from packages/modelaudit-picklescan passed: 2298 passed, 83 skipped.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests/test_api.py -k "unresolved_framework or framework_metadata or scan_load_divergence or selected_hf_framework_metadata_reduce" --maxfail=1 passed: 34 passed.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py -k "training_args_unresolved or scan_load_divergence" --maxfail=1 passed: 11 passed.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py --maxfail=1 passed: 845 passed, 1 skipped.
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.

The earlier full root non-slow/non-integration local xdist attempt still stopped on tests/cache/test_cache_correctness.py::test_cached_scan_does_not_serialize_known_uncacheable_scan_result with release_calls == 2; that node passed isolated and isolated-with-xdist reruns. Remote CI is now rerunning on the new head.

@mldangelo-oai

Contributor Author

@codex review

Please review exact head ea25b1424e1d99e8df1d4bd053db62a832499936.

Comment thread packages/modelaudit-picklescan/src/modelaudit_picklescan/api.py Fixed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ea25b1424e

Comment thread modelaudit/scanners/joblib_scanner.py Outdated
@mldangelo-oai

Contributor Author

Additional exact pinned QA: google-bert/bert-large-uncased@6da4b6a26a1877e173fca3225479512db81a5e5b contains a canonical-looking rust_model.ot TorchScript structure: __torch__.Module, GLOBAL/NEWOBJ/BUILD, torch.jit._pickle.build_intlist, and REDUCE, with valid ZIP paths/ratios and no malicious import identified. Include this as a source-backed benign/suspicious reconstruction control. Parent/member hash attribution is being fixed separately in task 39.

@mldangelo-oai

Contributor Author

Update for exact head 6bfce7abf1ba7d653fe3afc90b4d0e022e895a99:

Addressed the additional feedback posted after the previous push:

  • Added a pinned google-bert/bert-large-uncased@6da4b6a26a1877e173fca3225479512db81a5e5b synthetic rust_model.ot TorchScript QA control. It includes valid TorchScript ZIP layout, generated __torch__.py plus debug pair, and archive/data.pkl with __torch__.Module GLOBAL/NEWOBJ/BUILD plus torch.jit._pickle.build_intlist REDUCE. The test asserts warning/suspicious reconstruction findings and no critical/malicious finding for the source-backed generated Python path.
  • Addressed Codex P2 by keeping joblib.numpy_pickle.NumpyArrayWrapper origin-review findings unless import_only_reference_is_proven_trusted("joblib.numpy_pickle", "NumpyArrayWrapper") succeeds. Added an untrusted-origin regression for a valid raw ndarray span.
  • Applied the same trusted-origin gate in the root pickle legitimate-serialization helper.
  • Removed the unused constants reported by code-quality.
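
The gate described in the bullets above amounts to requiring two independent proofs before a finding is dropped. A minimal sketch follows, assuming a hypothetical `Finding` record and an injected `origin_is_trusted` predicate standing in for `import_only_reference_is_proven_trusted`; the real helper's behaviour is not reproduced here:

```python
from typing import Callable, NamedTuple


class Finding(NamedTuple):  # hypothetical record shape
    check: str
    module: str
    name: str


def prune_validated(
    findings: list[Finding],
    span_validated: bool,
    origin_is_trusted: Callable[[str, str], bool],
) -> list[Finding]:
    """Drop a finding only when both the payload span and its origin are proven."""
    return [
        f
        for f in findings
        if not (span_validated and origin_is_trusted(f.module, f.name))
    ]


findings = [Finding("NON_ALLOWLISTED_GLOBAL", "joblib.numpy_pickle", "NumpyArrayWrapper")]
kept_untrusted = prune_validated(findings, True, lambda module, name: False)
kept_trusted = prune_validated(findings, True, lambda module, name: True)
kept_unvalidated = prune_validated(findings, False, lambda module, name: True)
```

A valid raw ndarray span with an untrusted wrapper origin keeps its finding, which is the case the new regression covers.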

Additional local validation for this head:

  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_joblib_scanner_codecs.py tests/scanners/test_pickle_scanner.py::test_legitimate_serialization_file_uses_rust_scan tests/scanners/test_pickle_scanner.py::test_legitimate_serialization_file_keeps_untrusted_wrapper_origin_review tests/scanners/test_pickle_scanner.py::test_legitimate_serialization_file_skips_call_graph_enrichment tests/scanners/test_pickle_scanner.py::test_legitimate_serialization_file_keeps_bounded_file_reads --maxfail=1 passed: 70 passed.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests/test_api.py -k "unresolved_framework or framework_metadata or scan_load_divergence or selected_hf_framework_metadata_reduce" --maxfail=1 passed: 34 passed.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py --maxfail=1 passed: 846 passed, 1 skipped.
  • uv run --python 3.13 --with pytest --with pytest-xdist pytest -n auto tests --tb=short from packages/modelaudit-picklescan passed: 2298 passed, 83 skipped.

@mldangelo-oai

Contributor Author

@codex review

Please review exact head 6bfce7abf1ba7d653fe3afc90b4d0e022e895a99.

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. You're on a roll.

Comment thread packages/modelaudit-picklescan/src/modelaudit_picklescan/api.py Fixed
@mldangelo-oai

Contributor Author

Pinned TorchScript QA input: sentence-transformers/all-MiniLM-L12-v2@a50ef00143b4d5391434df20ae11632588ac25be, file rust_model.ot. Current main flags standard-looking GLOBAL/NEWOBJ/BUILD plus __torch__.Module even though nested execution evidence is false and the archive contains generated __torch__.py. Please classify every contained pickle member independently, preserve the worst substream outcome in archive aggregation, and prove genuine external/import-time execution remains critical.
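
The "preserve the worst substream outcome" requirement can be sketched as a maximum over ordered severities. The `Severity` levels and the `aggregate` helper below are illustrative assumptions, not the project's types:

```python
from enum import IntEnum


class Severity(IntEnum):  # hypothetical ordering, lowest to highest
    CLEAN = 0
    INFO = 1
    WARNING = 2
    CRITICAL = 3


def aggregate(member_results: dict[str, Severity]) -> Severity:
    """Archive verdict is the worst verdict of any independently scanned member."""
    # Returning CLEAN for an archive with no pickle members is a design choice
    # made for this sketch; a stricter policy could treat that case differently.
    return max(member_results.values(), default=Severity.CLEAN)


verdict = aggregate(
    {
        "archive/data.pkl": Severity.WARNING,
        "archive/constants.pkl": Severity.CLEAN,
    }
)
```

Because each member is classified on its own before aggregation, a clean member cannot mask a suspicious sibling.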

@mldangelo-oai

Contributor Author

Update for exact head 24a4f334ef42cdf2db7e1f879ee2dab51a16c7d7:

Fixed the root Python 3.13 CI failure in test_scan_keeps_untrusted_numpy_wrapper_origin_review_after_valid_raw_array. The joblib test helper now patches both root and standalone picklescan origin-review functions so trusted/untrusted joblib.numpy_pickle.NumpyArrayWrapper behavior is deterministic even when CI has joblib installed. The scanner also clears the wrapper-specific call_graph_source_unavailable info notice only when the wrapper origin is trusted and the raw ndarray span was statically validated; untrusted wrapper origin-review findings remain intact.

Validation after this fixup:

  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_joblib_scanner_codecs.py --maxfail=1 passed: 66 passed.
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ passed.

CI is rerunning on this new head.

@mldangelo-oai

Contributor Author

@codex review

Please review exact head 24a4f334ef42cdf2db7e1f879ee2dab51a16c7d7.

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Another round soon, please!

Contributor Author

Pinned real-model framework-metadata QA: SupraLabs/Supra-50M-Reasoning (rank 208; exact revision is in the coordinator scan metadata) emits S201 plus nine warning NON_ALLOWLISTED_GLOBAL findings from standard training_args.bin globals: TrainingArguments, SchedulerType, OptimizerNames, IntervalStrategy, SaveStrategy, HubStrategy, AcceleratorConfig, PartialState, and DistributedType. @codex test the exact training_args.bin on the live head and suppress only inert import-only framework metadata; preserve findings when those globals participate in REDUCE/NEWOBJ/BUILD execution, carry attacker arguments/state, or lead to non-framework callables.

…framework-metadata-20260610' into campaign/pr-1644
@mldangelo-oai

Contributor Author

@codex review Please review exact head 8523f02.

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Keep it up!

Reviewed commit: 8523f02192

…framework-metadata-20260610' into mdangelo/codex/hf-fp-t10-pickle-framework-metadata-20260610

# Conflicts:
#	tests/scanners/test_pickle_scanner.py

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b748f27177

Comment thread modelaudit/scanners/numpy_scanner.py
…framework-metadata-20260610' into campaign/pr-1644
@mldangelo-oai

Contributor Author

@codex review Please review exact head f72ac2d.

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Can't wait for the next one!

Reviewed commit: f72ac2d353

…framework-metadata-20260610' into mdangelo/codex/hf-fp-t10-pickle-framework-metadata-20260610

# Conflicts:
#	modelaudit/scanners/joblib_scanner.py
#	packages/modelaudit-picklescan/src/modelaudit_picklescan/api.py
@mldangelo-oai

Contributor Author

Addressed the latest live review-thread items at exact head 8a1a322cad2b9d708ce647cc5b88d6813d8d30b0.

What changed in this round:

  • PyTorch storage PID trust is bound to the actual matched GLOBAL / STACK_GLOBAL position used inside the validated BINPERSID tuple, including nested/member position_offset; unmatched same-name storage globals have stale pytorch_storage_persistent_id metadata cleared.
  • Joblib validated raw-array cleanup removes matching ACTIONABLE_FAILED_CHECKS private evidence along with public checks/issues, and validated no-position source-unavailable notices no longer keep cleaned benign results out of cache.
  • NumPy object-dtype reconstruction cleanup now removes matching private failed-check evidence when pruning validated reconstruction findings, so cleaned benign .npy scans are cacheable.
  • Merged the published PR branch additively and confirmed current origin/main was already included before final validation.

Thread handling:

  • Resolved the fixed NumPy private-evidence thread PRRT_kwDOOEde4c6JQN8A.
  • The storage PID and Joblib private-evidence threads were already resolved by the published branch update and remain fixed at this head.
  • Left PRRT_kwDOOEde4c6IsS7X unresolved intentionally because this PR keeps default/standalone missing-framework installs suspicious rather than restoring unresolved-name trust.

Final local validation on 8a1a322cad2b9d708ce647cc5b88d6813d8d30b0:

  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> passed.
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> 422 files already formatted.
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ --cache-dir /tmp/modelaudit-mypy-cache-pr1644-numpy-private -> success, 477 source files.
  • Current Python focused standalone/root storage/rebinding/memo/extension/nested/Joblib controls -> 121 passed, 1889 deselected.
  • Python 3.10 focused standalone/root storage/rebinding/memo/extension/nested/Joblib controls -> 121 passed, 1889 deselected.
  • Current Python targeted NumPy private-evidence controls -> 2 passed; Python 3.10 same -> 2 passed.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_numpy_scanner.py -q -> 63 passed.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests -q -> 2510 passed, 3 skipped.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py --maxfail=1 -q -> 975 passed, 8 skipped.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 20780 passed, 785 skipped, 40 warnings.
  • git diff --check -> passed.

@codex review exact head 8a1a322cad2b9d708ce647cc5b88d6813d8d30b0

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a1a322cad

Comment thread modelaudit/scanners/joblib_scanner.py Outdated
@mldangelo-oai

Contributor Author

Addressed the exact-head Joblib no-position cleanup blocker at ce1ad41ab2e6a2b28a293166d44ee0747b29646c.

What changed:

  • NON_ALLOWLISTED_GLOBAL cleanup now treats a no-position issue/check pair with the same signature as one occurrence, but counts distinct no-position signatures as separate occurrences. If only occurrence 1 is validated, occurrence 2 remains suspicious and private actionable evidence remains.
  • call_graph_source_unavailable cleanup is separated from actionable occurrence matching and is pruned only from origin-review proof; ambiguous duplicate no-position source-unavailable notices stay fail-closed.
  • Merged current origin/main additively through 8ba33406 after the fix.
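
The no-position occurrence rule in the first bullet can be illustrated as follows. The `Record` shape and both helpers are hypothetical; only the counting rule itself comes from the description above:

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):  # hypothetical shape for an issue or check entry
    kind: str  # "issue" or "check"
    signature: str
    position: Optional[int]


def no_position_occurrences(records: list[Record]) -> set[str]:
    """An issue/check pair sharing a signature collapses to one occurrence."""
    return {r.signature for r in records if r.position is None}


def still_suspicious(records: list[Record], validated: set[str]) -> set[str]:
    """Occurrences that were not individually validated stay actionable."""
    return no_position_occurrences(records) - validated


records = [
    Record("issue", "sig-a", None),
    Record("check", "sig-a", None),  # same signature: same occurrence
    Record("issue", "sig-b", None),  # distinct signature: separate occurrence
]
occurrences = no_position_occurrences(records)
remaining = still_suspicious(records, validated={"sig-a"})
```

Validating only the first occurrence leaves the second one in place, matching the fail-closed behaviour described above.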

Exact-head validation on ce1ad41ab2e6a2b28a293166d44ee0747b29646c:

  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> 422 files already formatted.
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> passed.
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ --cache-dir /tmp/modelaudit-mypy-cache-pr1644-merge-joblib-no-position -> success, 477 source files.
  • Current Python focused standalone/root pickle/PyTorch/Joblib selection -> 295 passed, 1787 deselected.
  • Python 3.10 same focused selection -> 295 passed, 1787 deselected.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_joblib_scanner_codecs.py -q -> 73 passed on current Python; 73 passed on Python 3.10.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest packages/modelaudit-picklescan/tests -q -> 2510 passed, 3 skipped.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pytorch_zip_scanner.py --maxfail=1 -q -> 975 passed, 8 skipped.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 20781 passed, 785 skipped, 40 warnings.
  • git diff --check -> passed.

Live thread state after push: the Joblib no-position review item is no longer a live unresolved thread; only PRRT_kwDOOEde4c6IsS7X remains unresolved intentionally because this PR keeps default/standalone missing-framework installs suspicious instead of restoring unresolved-name trust.

@codex review exact head ce1ad41ab2e6a2b28a293166d44ee0747b29646c

…framework-metadata-20260610' into campaign/pr-1644
@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Hooray!

Reviewed commit: ce1ad41ab2

@mldangelo-oai

Contributor Author

@codex review Please review exact head d80e0fb. Local validation passed: focused private-evidence/storage regressions, full ruff format check, full ruff check, full mypy, and PROMPTFOO_DISABLE_TELEMETRY=1 pytest -n auto -m "not slow and not integration" --maxfail=1 with 20770 passed / 799 skipped. Independent reviewer also found no blockers on this exact head.

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Another round soon, please!

Reviewed commit: d80e0fb6f0

@mldangelo-oai mldangelo-oai merged commit bb38b74 into main Jun 13, 2026
32 checks passed
@mldangelo-oai mldangelo-oai deleted the mdangelo/codex/hf-fp-t10-pickle-framework-metadata-20260610 branch June 13, 2026 02:25