
fix(text): contextualize tokenizer vocabulary indicators #1653

Open

mldangelo-oai wants to merge 3 commits into main from mdangelo/codex/hf-fp-t22-tokenizer-vocabulary-indicators-20260610

@mldangelo-oai

Contributor

Summary

Fixes tokenizer-vocabulary false positives in which isolated lexical entries such as trojan and zombie in line-oriented vocab.txt files produced command-and-control (C&C) findings.

Root Cause

The network communication detector reports the first matching cc_pattern occurrence in text, so in a line-oriented vocab.txt the reported match can be an inert token entry. TextScanner previously downgraded bare vocabulary-like tokens to informational severity, but it still emitted S310 findings for benign tokenizer vocabulary entries in Hugging Face vocab.txt files.
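
A minimal sketch of why first-match reporting lands on vocabulary entries. `CC_PATTERN` and `first_cc_match` are hypothetical stand-ins, not ModelAudit's actual cc_pattern definitions: `re.search` stops at the earliest hit, and in a one-token-per-line file that hit is often an inert entry, so a later suspicious line is never reported.

```python
from __future__ import annotations

import re

# Hypothetical stand-in for the detector's cc_pattern terms; not ModelAudit's real pattern list.
CC_PATTERN = re.compile(r"\b(trojan|zombie|botnet)\b", re.IGNORECASE)


def first_cc_match(text: str) -> tuple[str, str] | None:
    """Return (matched term, enclosing line) for the first match only, like first-match reporting."""
    match = CC_PATTERN.search(text)  # re.search stops at the earliest hit
    if match is None:
        return None
    # Widen the match to its enclosing line so the reported snippet is visible.
    line_start = text.rfind("\n", 0, match.start()) + 1
    line_end = text.find("\n", match.end())
    if line_end == -1:
        line_end = len(text)
    return match.group(1).lower(), text[line_start:line_end]


if __name__ == "__main__":
    vocab = "[PAD]\n[UNK]\nthe\ntrojan\nzombie\nwater\nconnect to the botnet\n"
    # The later, genuinely suspicious line is never reached.
    print(first_cc_match(vocab))  # ('trojan', 'trojan')
```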

Security Tradeoff

This keeps the policy narrow:

  • Requires strong filename evidence plus line-oriented tokenizer vocabulary structure before omitting anything.
  • Omits only isolated tokenizer token entries for trojan and zombie, including common tokenizer prefixes/plurals.
  • Retargets to a later active occurrence of the same pattern instead of dropping the finding.
  • Fails closed, reporting inconclusive text-content coverage, once the bounded retarget work limit is exhausted.
  • Leaves URLs, shell/network commands, API calls, multi-token instructions, templates, serialized payloads, non-vocabulary prose, and other C&C terms such as botnet detected.
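
The policy above could look roughly like the following sketch. Everything in it is hypothetical and simplified: the term list, the prefix set (`##`, `Ġ`, `▁`), the 100-line and 99% single-token thresholds in `looks_line_oriented`, and the `max_retargets` budget are assumptions for illustration, not the actual TextScanner implementation. It shows the shape of the tradeoff: omission requires both filename and structure evidence, only isolated trojan/zombie entries are skipped, skipping retargets to the next occurrence of the same term, and an exhausted budget yields "inconclusive" rather than a clean result.

```python
from __future__ import annotations

import re
from pathlib import PurePosixPath

# Hypothetical term list; only "trojan" and "zombie" may be omitted, as the PR describes.
CC_TERMS = ("trojan", "zombie", "botnet")
OMITTABLE_TERMS = frozenset({"trojan", "zombie"})
# Assumed tokenizer prefixes (WordPiece "##", byte-level BPE "Ġ", SentencePiece "▁") plus an optional plural "s".
ISOLATED_ENTRY = re.compile(r"(?:##|Ġ|▁)?(?:trojan|zombie)s?", re.IGNORECASE)


def looks_line_oriented(text: str, min_lines: int = 100, min_ratio: float = 0.99) -> bool:
    """Assumed structural check: enough lines, nearly all a single whitespace-free token."""
    lines = text.splitlines()
    if len(lines) < min_lines:
        return False  # short files are ambiguous, so nothing is omitted
    single_token = sum(1 for line in lines if line and not any(ch.isspace() for ch in line))
    return single_token / len(lines) >= min_ratio


def enclosing_line(text: str, start: int, end: int) -> str:
    line_start = text.rfind("\n", 0, start) + 1
    line_end = text.find("\n", end)
    return text[line_start : len(text) if line_end == -1 else line_end]


def first_active_occurrence(term: str, text: str, vocab_context: bool, max_retargets: int) -> str | None:
    """Return "finding", "inconclusive", or None (no occurrence) for one pattern."""
    skipped = 0
    for match in re.finditer(re.escape(term), text, re.IGNORECASE):
        line = enclosing_line(text, match.start(), match.end()).strip()
        if vocab_context and term in OMITTABLE_TERMS and ISOLATED_ENTRY.fullmatch(line):
            skipped += 1
            if skipped > max_retargets:
                return "inconclusive"  # fail closed once the work budget is spent
            continue  # inert vocabulary entry: retarget to the next occurrence of the same term
        return "finding"
    return None


def scan_text(filename: str, text: str, max_retargets: int = 64) -> dict[str, str]:
    # Filename evidence AND line-oriented structure are both required before anything is omitted.
    vocab_context = PurePosixPath(filename).name == "vocab.txt" and looks_line_oriented(text)
    results: dict[str, str] = {}
    for term in CC_TERMS:
        status = first_active_occurrence(term, text, vocab_context, max_retargets)
        if status is not None:
            results[term] = status
    return results


if __name__ == "__main__":
    filler = "\n".join(f"tok{i}" for i in range(200))
    vocab = filler + "\ntrojan\n##zombie\nzombies\n"
    print(scan_text("vocab.txt", vocab))  # {}
    print(scan_text("vocab.txt", vocab + "run zombie --connect c2.example\n"))  # {'zombie': 'finding'}
    print(scan_text("notes.txt", vocab))  # {'trojan': 'finding', 'zombie': 'finding'}
```

Keying the budget and the retarget loop per term mirrors "a later active occurrence of the same pattern"; a single combined regex would instead let an unrelated term stand in for the skipped one.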

Validation

uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run pytest tests/scanners/test_text_scanner.py --maxfail=1
PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run pytest tests/detectors/test_network_comm_detector.py tests/test_core.py -k "network_comm_detector or vocab or tokenizer_config or text" --maxfail=1
PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1
git diff --check

Outcomes:

  • tests/scanners/test_text_scanner.py: 449 passed
  • targeted detector/core lane: 547 passed, 627 deselected
  • broad non-slow/non-integration lane: 17386 passed, 1292 skipped, 39 warnings
  • ruff format, ruff check --fix, mypy, and git diff --check: clean

Pinned Real-Model Streaming QA

Command run from the repository with telemetry disabled and a 2 MiB bound on each streamed vocab.txt:

env PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 HF_HUB_DISABLE_TELEMETRY=1 uv run python - <<'PY'
from __future__ import annotations
import json
import shutil
import tempfile
from collections.abc import Iterator
from pathlib import Path
from huggingface_hub import HfApi, hf_hub_download
from modelaudit.core import scan_model_streaming
# Pinned (repo_id, revision) pairs; only each repo's vocab.txt is scanned.
MODELS = [
    ("BAAI/bge-base-en-v1.5", "a5beb1e3e68b9ab74eb54cfd186867f64f240e1a"),
    ("ProsusAI/finbert", "4556d13015211d73dccd3fdd39d39232506f3e43"),
]
MAX_BYTES = 2 * 1024 * 1024  # 2 MiB bound per streamed vocab.txt
api = HfApi()
root = Path(tempfile.mkdtemp(prefix="modelaudit-task22-stream-qa."))
def one_file(path: Path) -> Iterator[tuple[Path, bool]]:
    yield path, True  # stream exactly one staged file
summaries = []
for repo_id, revision in MODELS:
    # Check the advertised size before downloading anything.
    info = api.model_info(repo_id, revision=revision, files_metadata=True)
    vocab = next(s for s in info.siblings if s.rfilename == "vocab.txt")
    size = int(vocab.size or 0)
    if size <= 0 or size > MAX_BYTES:
        raise SystemExit(f"Refusing {repo_id}@{revision} vocab.txt size={size}")
    model_dir = root / repo_id.replace("/", "__") / revision
    model_dir.mkdir(parents=True)
    cached = Path(hf_hub_download(repo_id=repo_id, revision=revision, filename="vocab.txt", cache_dir=root / "hf-cache", token=False))
    staged = model_dir / "vocab.txt"
    shutil.copy2(cached, staged)
    result = scan_model_streaming(
        one_file(staged), timeout=120, delete_after_scan=False, scan_root=model_dir,
        check_secrets=False, max_file_size=MAX_BYTES, max_total_size=MAX_BYTES,
    )
    network = [c for c in result.checks or [] if c.name == "Network Communication Detection"]
    cc = [c for c in network if (c.details or {}).get("type") == "cc_pattern"]
    with staged.open("rb") as handle:  # close the handle after counting lines
        vocab_lines = sum(1 for _ in handle)
    summaries.append({
        "repo_id": repo_id, "revision": revision, "vocab_bytes": size, "vocab_lines": vocab_lines,
        "success": result.success, "files_scanned": result.files_scanned,
        "cc_pattern_checks": len(cc), "network_statuses": [c.status.value for c in network],
    })
print(json.dumps({"summaries": summaries}, indent=2, sort_keys=True))
PY

Outcomes:

  • BAAI/bge-base-en-v1.5 @ a5beb1e3e68b9ab74eb54cfd186867f64f240e1a: vocab.txt 231508 bytes / 30522 lines, 1 streamed file scanned, success=true, network check passed, cc_pattern_checks=0
  • ProsusAI/finbert @ 4556d13015211d73dccd3fdd39d39232506f3e43: vocab.txt 231508 bytes / 30522 lines, 1 streamed file scanned, success=true, network check passed, cc_pattern_checks=0

Malicious Controls

Added regression coverage that keeps these actionable:

  • Later active same-pattern instructions after isolated trojan / zombie vocabulary entries.
  • URL assignments, curl, nc, requests.get, and multi-token C&C instructions inside vocab.txt.
  • Non-vocabulary prose mentioning a trojan payload.
  • Ambiguous short vocab.txt files.
  • Over-budget isolated entries, which fail closed with inconclusive text-content coverage.

@mldangelo-oai

Contributor Author

@codex review

@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.466s -> 1.475s (+0.6%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
|---|---|---|---|---|---|---|---|---|
| chunked-upload-stream | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream | chunked_stream | 278.2 KiB | 1 | 116.02ms | 123.34ms | +6.3% | stable |
| clean-training-checkpoint | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint | safe_large | 278.2 KiB | 1 | 113.49ms | 120.16ms | +5.9% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] | nested_raw | 78 B | 1 | 520.3us | 502.2us | -3.5% | stable |
| padded-multi-stream-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload | multi_stream_padded | 4.1 KiB | 1 | 582.7us | 563.3us | -3.3% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] | nested_hex | 130 B | 1 | 563.1us | 548.3us | -2.6% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] | nested_base64 | 98 B | 1 | 518.4us | 507.2us | -2.1% | stable |
| duplicate-heavy-registry | tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot | registry-snapshot | 915.2 KiB | 13 | 418.18ms | 414.66ms | -0.8% | stable |
| direct-malicious-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload | malicious_reduce | 52 B | 1 | 452.3us | 449.0us | -0.7% | stable |
| suspicious-pickle-intake | tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake | suspicious-intake | 183.8 KiB | 4 | 147.85ms | 147.22ms | -0.4% | stable |
| mixed-model-repository | tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository | release-candidate | 547.3 KiB | 32 | 486.88ms | 485.43ms | -0.3% | stable |
| single-checkpoint-preflight | tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load | single_checkpoint.pkl | 183.0 KiB | 1 | 75.88ms | 76.07ms | +0.2% | stable |
| warm-cache-rescan | tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan | release-candidate | 547.3 KiB | 32 | 105.30ms | 105.43ms | +0.1% | stable |
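
For reading the table, here is a sketch of how a per-benchmark status could be derived from the 15% threshold. The `classify` rule is an assumption: the report does not say how "improved" is decided, so a symmetric threshold is used here. Recomputing from the displayed, rounded timings can also differ from the reported change in the last digit (the nested_base64 row recomputes to -2.2% rather than -2.1%), presumably because the report works from unrounded values.

```python
from __future__ import annotations

# Threshold taken from the report ("regression threshold of 15%").
THRESHOLD_PCT = 15.0


def percent_change(baseline: float, current: float) -> float:
    """Relative change of `current` versus `baseline`, in percent (positive means slower)."""
    return (current - baseline) / baseline * 100.0


def classify(baseline: float, current: float, threshold_pct: float = THRESHOLD_PCT) -> str:
    """Hypothetical rule: beyond +threshold is a regression, beyond -threshold an improvement."""
    change = percent_change(baseline, current)
    if change > threshold_pct:
        return "regression"
    if change < -threshold_pct:
        return "improved"
    return "stable"


if __name__ == "__main__":
    # A few rows from the table, as (workload, baseline_ms, current_ms).
    rows = [
        ("chunked-upload-stream", 116.02, 123.34),
        ("clean-training-checkpoint", 113.49, 120.16),
        ("duplicate-heavy-registry", 418.18, 414.66),
    ]
    for name, baseline, current in rows:
        print(f"{name}: {percent_change(baseline, current):+.1f}% {classify(baseline, current)}")
    # Aggregate shared-benchmark median from the report: 1.466s -> 1.475s
    print(f"aggregate: {percent_change(1.466, 1.475):+.1f}%")  # aggregate: +0.6%
```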

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Breezy!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mldangelo-oai mldangelo-oai requested a review from mldangelo June 11, 2026 02:00
@mldangelo-oai mldangelo-oai enabled auto-merge (squash) June 11, 2026 02:01
@mldangelo-oai

Contributor Author

New exact pinned QA before merge: google-bert/bert-large-uncased@6da4b6a26a1877e173fca3225479512db81a5e5b reports isolated trojan and zombie vocabulary entries as informational S310 indicators on origin/main. Please validate that the current head removes these vocabulary-only findings while preserving executable/contextual command-and-control controls, then post the exact outcome.

@mldangelo-oai mldangelo-oai disabled auto-merge June 11, 2026 03:05

Contributor Author

@codex review

Exact-head QA update for 2b2e5b9d9b2cd12c398d48f0802de912f86f4ee2:

  • Pinned vocab source: google-bert/bert-large-uncased@6da4b6a26a1877e173fca3225479512db81a5e5b / vocab.txt; downloaded file has 30522 lines, SHA256 07eced375cec144d27c900241f3e339478dec958f92fddbc551f295c992038a3, with isolated entries zombie at line 11799 and trojan at line 23446.
  • origin/main (8d6c4864fe2ea833ceaef1b9803d225afb1e8d69) CLI scan of the exact pinned HF vocab.txt URL reproduced 2 informational S310 issues: patterns trojan, zombie.
  • PR head 2b2e5b9d9b2cd12c398d48f0802de912f86f4ee2 CLI scan of the same exact pinned HF vocab.txt URL is clean: issue_count=0, s310_count=0, and Network Communication Detection passed.
  • Added focused serialized-context regression: tests/scanners/test_pickle_scanner.py::test_expensive_raw_prefilters_preserve_serialized_cc_terms, covering trojan and zombie in pickled security-relevant C&C payloads.
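
To reproduce artifact evidence like the SHA256, line count, and entry positions quoted above, a sketch along these lines should work. `fingerprint_lines` and `fingerprint_vocab` are hypothetical helpers, not part of ModelAudit: matching is exact and case-sensitive on the whole line, and the offset recorded is the byte offset of the start of the line, which is an assumption about how positions like "byte 394768" are counted.

```python
from __future__ import annotations

import hashlib
from collections.abc import Iterable
from pathlib import Path


def fingerprint_lines(lines: Iterable[bytes], terms: set[str]) -> dict[str, object]:
    """Hash raw lines; record (term, 1-based line number, byte offset of line start) for exact matches."""
    digest = hashlib.sha256()
    hits: list[tuple[str, int, int]] = []
    line_count = 0
    offset = 0
    for raw in lines:
        digest.update(raw)
        line_count += 1
        # Compare the line without its newline; anything else on the line means it is not isolated.
        token = raw.rstrip(b"\r\n").decode("utf-8", errors="replace")
        if token in terms:
            hits.append((token, line_count, offset))
        offset += len(raw)
    return {"sha256": digest.hexdigest(), "lines": line_count, "bytes": offset, "hits": hits}


def fingerprint_vocab(path: Path, terms: set[str]) -> dict[str, object]:
    # Iterating a binary file yields lines with their newline bytes, so offsets stay exact.
    with path.open("rb") as handle:
        return fingerprint_lines(handle, terms)
```

For example, `fingerprint_vocab(Path("vocab.txt"), {"trojan", "zombie"})` on the pinned bert-large-uncased file gives a hash and hit list that can be compared with the values reported above.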

Focused/local validation:

  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_text_scanner.py -k "tokenizer_vocab or active_vocabulary_context or bare_active_vocabulary_tokens" -q -> 23 passed, 426 deselected
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pickle_scanner.py::test_expensive_raw_prefilters_preserve_serialized_cc_terms -q -> 2 passed
  • uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> clean
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> 419 files already formatted
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> Success: no issues found in 474 source files
  • git diff --check -> clean

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. 🚀


@mldangelo-oai

Contributor Author

Pinned vocabulary QA input: sentence-transformers/all-MiniLM-L12-v2@a50ef00143b4d5391434df20ae11632588ac25be emits S310 for the standalone vocabulary tokens trojan and zombie. These are inert entries in vocab.txt; include this exact model in the current-head QA while preserving executable/network-context detections.

@mldangelo-oai mldangelo-oai enabled auto-merge (squash) June 11, 2026 03:59

Contributor Author

Second pinned vocabulary QA: mixedbread-ai/mxbai-embed-large-v1@b33106f585b9ce46904ad7443a3b52b7a63e231c emits informational S310 findings for trojan and zombie from its ordinary vocab.txt on main 8d6c4864. The existing real-model regression should generalize, but @codex please confirm that this exact vocab is clean on the current head while serialized/executable-context controls still produce critical findings.

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Nice work!


@mldangelo-oai mldangelo-oai disabled auto-merge June 11, 2026 05:05
@mldangelo-oai

Contributor Author

Additional pinned baseline reproduction for this PR's exact false-positive family:

  • Model: google-bert/bert-base-multilingual-uncased
  • HF revision: 7cbf9a625e29989f6b9c6c2fa68234c304f7e38f
  • Selection rank: 242
  • Baseline ModelAudit: 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69
  • Artifact: vocab.txt
  • Finding: informational S310 C&C pattern detected: zombie
  • Evidence: the matched snippet is exactly the isolated vocabulary token zombie at byte position 394768.

This is directly in scope. Auto-merge is paused until the exact pinned artifact is rerun at the current PR head and shows zero C2 findings while the existing active-instruction controls remain actionable.

@mldangelo-oai

Contributor Author

Additional exact-main vocabulary QA from Hugging Face rank 254:

  • Model: google-bert/bert-base-cased@cd5ef92a9fb2f889e972770a36d4ed042daf221e
  • File: root vocab.txt (213,450 bytes)
  • Main 8d6c4864 emits S310 for both ordinary Trojan and zombie vocabulary entries.

This independently reproduces the rank-242 multilingual vocabulary family. Please include both exact models in current-head QA while preserving detection of these terms in executable, URL, configuration, prose-instruction, and serialized contexts. Full audit: /Users/mdangelo/modelaudit-hf-scan-swarm/outputs/hf-audit-r253-r254-20260611.md.

@mldangelo-oai

Contributor Author

@codex review

Pinned multilingual vocab follow-up for head 63f0220774c2254bf3ec6fb31e95d3885cdd624a:

  • Source: google-bert/bert-base-multilingual-uncased@7cbf9a625e29989f6b9c6c2fa68234c304f7e38f / vocab.txt
  • Downloaded artifact: 871891 bytes, 105879 lines, SHA256 87b44292b452f6c05afa49b2e488e7eedf79ea4f4c39db6f2f4b37764228ef3f; isolated zombie at byte 394768.
  • Direct selected-file CLI QA: --scanners text --max-size 20MB scanned 1 file / 871891 bytes with issue_count=0, s310_count=0, and Network Communication Detection passed.
  • Bounded streamed HF text QA: scanned 2 text files / 880821 bytes; vocab.txt had vocab_issues=[], c2_count=0, and Network Communication Detection passed. The repo-level streamed command still exits 1, but only because of README URL findings, not vocab.txt.
  • Added regressions for BERT-like multilingual vocabulary content plus mocked Hugging Face streaming; active prose/config/URL/script/metadata-shaped contexts remain actionable.

Validation run:

  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_text_scanner.py -k "multilingual_tokenizer_vocab" -q
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_cli.py -k "streaming_omits_multilingual_vocab_cc_token" -q
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_text_scanner.py -k "tokenizer_vocab or vocabulary or multilingual_tokenizer_vocab or bare_active_vocabulary_tokens or active_vocabulary_context" -q
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_text_scanner.py -q
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_cli.py -k "huggingface_streaming" -q
  • uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 17396 passed, 1292 skipped
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
  • git diff --check

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Bravo.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mldangelo-oai

Contributor Author

Supplemental pinned vocabulary QA for the later rank-254 note on head 63f0220774c2254bf3ec6fb31e95d3885cdd624a:

  • Source: google-bert/bert-base-cased@cd5ef92a9fb2f889e972770a36d4ed042daf221e / root vocab.txt
  • Downloaded artifact: 213450 bytes, 28996 lines, SHA256 eeaa9875b23b04b4c54ef759d03db9d1ba1554838f8fb26c5d96fa551df93d02; vocabulary entries include zombie, zombies, and Trojan.
  • Direct selected-file CLI QA: --scanners text --max-size 20MB scanned 1 file / 213450 bytes with issue_count=0, s310_count=0, and Network Communication Detection passed.

This is consistent with the multilingual rank-242 QA above; vocabulary-only entries are clean, while active prose/config/URL/script/metadata-shaped contexts remain covered by the added regressions.

@mldangelo-oai mldangelo-oai enabled auto-merge (squash) June 11, 2026 06:37

@ianw-oai ianw-oai left a comment

Contributor


Holding my approval for now: the tokenizer-vocabulary suppression is still pattern-specific (trojan/zombie only), which looks too tailored to the pinned QA cases for this simple-approval pass.


Labels

None yet

Projects

None yet


2 participants