
fix(picklescan): harden encoded byte probes #1604

Merged
mldangelo-oai merged 1 commit into main from mdangelo/codex/harden-encoded-byte-probes on Jun 9, 2026
mldangelo-oai (Contributor) commented Jun 9, 2026

Summary

Follow-up to #1602 that hardens encoded byte-literal scanning without weakening persistent-ID detection.

  • preserve later encoded pickle candidates after an unterminated leading Base64 scalar
  • distinguish confirmed encoded spans from raw persistent-ID polyglots across arbitrary wrapper and filler lengths
  • fail closed for offset truncated executable pickles while keeping README-like near-matches clean
  • scan adjacent Base64 tokens after terminal padding
  • bound strict Base64 raw-probe and terminal-padding scans without imposing bypassable distance caps
  • retain current main behavior from #1603 (fix: harden pickle CVE stream probing)
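
The first bullet can be illustrated with a minimal Python sketch. This is not the scanner's actual implementation (that lives in the Rust crate); the names `find_encoded_pickle_candidates`, `opcode_names`, and `_RISKY_OPS`, the token regex, and the choice of "risky" opcodes are all assumptions made for illustration. The point it shows is that a malformed leading Base64 token is skipped rather than ending the scan, so a later encoded pickle is still inspected. The decoded bytes are only disassembled with `pickletools.genops`, never unpickled.

```python
import base64
import binascii
import pickletools
import re

# Hypothetical token pattern: a run of Base64 characters plus optional padding.
_B64_TOKEN = re.compile(rb"[A-Za-z0-9+/]{4,}={0,2}")

# Illustrative choice of opcodes treated as "executable"; not the real rule set.
_RISKY_OPS = {"GLOBAL", "STACK_GLOBAL", "REDUCE"}


def opcode_names(data: bytes):
    """Return the opcode names of a complete pickle stream, or None if it does not parse."""
    names = set()
    try:
        # genops only disassembles; it never executes the pickle.
        for opcode, _arg, _pos in pickletools.genops(data):
            names.add(opcode.name)
    except Exception:
        # Unknown opcode, short read, or no STOP opcode: not a complete pickle.
        return None
    return names


def find_encoded_pickle_candidates(blob: bytes):
    """Return (offset, opcode names) for each Base64 token that decodes to a risky pickle."""
    hits = []
    for match in _B64_TOKEN.finditer(blob):
        try:
            decoded = base64.b64decode(match.group(0), validate=True)
        except binascii.Error:
            # Malformed or unterminated token: skip it and keep scanning later tokens.
            continue
        ops = opcode_names(decoded)
        if ops is not None and ops & _RISKY_OPS:
            hits.append((match.start(), ops))
    return hits


# Hand-written protocol 0 pickle bytes (GLOBAL, MARK, STRING, TUPLE, REDUCE, STOP).
raw_pickle = b"cos\nsystem\n(S'echo hi'\ntR."
blob = b'x = "QUJDREVGRw" ; y = "' + base64.b64encode(raw_pickle) + b'"'
print(find_encoded_pickle_candidates(blob))
```

The leading token `QUJDREVGRw` has a length that is not a multiple of four, so strict decoding rejects it; the sketch moves on and still reports the second token.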

Validation

  • 218 passed in the changed Python test module
  • 161 passed Rust tests
  • cargo clippy --all-targets --all-features -- -D warnings
  • cargo fmt --all --check
  • Ruff and mypy on the changed Python test
  • 100 KB strict-Base64 junk ending in Q completes cleanly in 0.037 seconds
  • adversarial sweep: 8 malicious filler-gap variants detected; >64-opcode and adjacent-padded-token bypasses detected; 300 benign encoded variants stayed non-malicious
  • no full local suite, per review policy
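
As a rough illustration of the 100 KB timing check, the sketch below builds an input drawn only from the Base64 alphabet with a final `Q` and times a scan over it. Everything here is an assumption for illustration: `make_strict_base64_junk`, `time_scan`, and `stand_in_scan` are hypothetical names, "100 KB" is taken to mean 100 * 1024 bytes, and the stand-in scanner is a single regex pass rather than the picklescan scanner, so its timing says nothing about the real figure reported above.

```python
import random
import re
import string
import time
from typing import Callable

B64_ALPHABET = (string.ascii_letters + string.digits + "+/").encode("ascii")


def make_strict_base64_junk(size: int, seed: int = 0) -> bytes:
    """Return `size` bytes drawn from the Base64 alphabet, with a final b'Q'."""
    rng = random.Random(seed)  # seeded so the input is reproducible
    body = bytes(rng.choice(B64_ALPHABET) for _ in range(size - 1))
    return body + b"Q"


def time_scan(scan: Callable[[bytes], object], data: bytes) -> float:
    """Run `scan` once over `data` and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    scan(data)
    return time.perf_counter() - start


def stand_in_scan(data: bytes) -> int:
    """Placeholder scanner: counts Base64-looking runs in one linear pass."""
    return sum(1 for _ in re.finditer(rb"[A-Za-z0-9+/]+={0,2}", data))


junk = make_strict_base64_junk(100 * 1024)  # assumes "100 KB" means 100 * 1024 bytes
elapsed = time_scan(stand_in_scan, junk)
print(f"{len(junk)} bytes scanned in {elapsed:.3f}s")
```

To exercise a real scanner, pass its entry point as `scan` in place of `stand_in_scan`.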

github-actions (Bot, Contributor) commented Jun 9, 2026

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.430s -> 1.447s (+1.2%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| nested-payload-review | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]` | nested_base64 | 98 B | 1 | 469.5us | 491.4us | +4.7% | stable |
| chunked-upload-stream | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream` | chunked_stream | 278.2 KiB | 1 | 112.58ms | 115.85ms | +2.9% | stable |
| clean-training-checkpoint | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint` | safe_large | 278.2 KiB | 1 | 110.40ms | 113.35ms | +2.7% | stable |
| warm-cache-rescan | `tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan` | release-candidate | 547.3 KiB | 32 | 99.52ms | 102.06ms | +2.6% | stable |
| padded-multi-stream-upload | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload` | multi_stream_padded | 4.1 KiB | 1 | 527.5us | 540.3us | +2.4% | stable |
| direct-malicious-upload | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload` | malicious_reduce | 52 B | 1 | 438.8us | 446.2us | +1.7% | stable |
| single-checkpoint-preflight | `tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load` | single_checkpoint.pkl | 183.0 KiB | 1 | 72.70ms | 73.81ms | +1.5% | stable |
| mixed-model-repository | `tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository` | release-candidate | 547.3 KiB | 32 | 481.35ms | 488.16ms | +1.4% | stable |
| suspicious-pickle-intake | `tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake` | suspicious-intake | 183.8 KiB | 4 | 144.04ms | 144.62ms | +0.4% | stable |
| nested-payload-review | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]` | nested_hex | 130 B | 1 | 498.4us | 496.9us | -0.3% | stable |
| nested-payload-review | `tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]` | nested_raw | 78 B | 1 | 474.3us | 473.2us | -0.2% | stable |
| duplicate-heavy-registry | `tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot` | registry-snapshot | 915.2 KiB | 13 | 406.98ms | 407.12ms | +0.0% | stable |
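
For readers checking the numbers, the Change and Status columns can be reproduced with a short sketch. The percentage formula matches the reported figures; the symmetric treatment of "improved" (a drop of more than the threshold) is an assumption, since the report only states a 15% regression threshold, and `percent_change` and `classify` are hypothetical helper names rather than the benchmark workflow's code.

```python
def percent_change(baseline: float, current: float) -> float:
    """Relative change of `current` against `baseline`, in percent."""
    return (current - baseline) / baseline * 100.0


def classify(baseline: float, current: float, threshold_pct: float = 15.0) -> str:
    """Label a benchmark by comparing its change to the regression threshold.

    Assumption: "improved" mirrors "regression" with the same threshold.
    """
    change = percent_change(baseline, current)
    if change > threshold_pct:
        return "regression"
    if change < -threshold_pct:
        return "improved"
    return "stable"


# Aggregate median from the report: 1.430s -> 1.447s.
print(f"{percent_change(1.430, 1.447):+.1f}%", classify(1.430, 1.447))
# nested_base64 row: 469.5us -> 491.4us.
print(f"{percent_change(469.5, 491.4):+.1f}%", classify(469.5, 491.4))
```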

chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6caee5cbf0

Comment thread packages/modelaudit-picklescan/rust/src/nested.rs
Comment thread packages/modelaudit-picklescan/rust/src/nested.rs
Comment thread packages/modelaudit-picklescan/rust/src/nested.rs
mldangelo-oai force-pushed the mdangelo/codex/harden-encoded-byte-probes branch 3 times, most recently from 81ba5e4 to 8ad140f (June 9, 2026 17:00)
mldangelo-oai force-pushed the mdangelo/codex/harden-encoded-byte-probes branch from 8ad140f to 5fc9b1b (June 9, 2026 17:01)
mldangelo-oai merged commit f8192be into main (Jun 9, 2026)
22 of 30 checks passed
mldangelo-oai deleted the mdangelo/codex/harden-encoded-byte-probes branch (June 9, 2026 17:05)