fix(text): contextualize tokenizer vocabulary indicators by mldangelo-oai · Pull Request #1653 · promptfoo/modelaudit

mldangelo-oai · 2026-06-11T01:42:57Z

Summary

Fixes tokenizer vocabulary false positives where isolated lexical entries such as trojan and zombie in line-oriented vocab.txt files produced command-and-control findings.

Root Cause

The network communication detector reports the first matching cc_pattern occurrence in text. TextScanner previously downgraded bare vocabulary-like tokens to informational severity, but it still emitted S310 findings for benign tokenizer vocabulary entries in Hugging Face vocab.txt files.

Security Tradeoff

This keeps the policy narrow:

Requires strong filename evidence plus line-oriented tokenizer vocabulary structure before omitting anything.
Omits only isolated tokenizer token entries for trojan and zombie, including common tokenizer prefixes/plurals.
Retargets to a later active occurrence of the same pattern instead of dropping the finding.
Fails closed after the bounded retarget work limit.
Still detects URLs, shell/network commands, API calls, multi-token instructions, templates, serialized payloads, non-vocabulary prose, and other C&C terms such as botnet.
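
For readers who want the policy above in concrete form, here is a minimal sketch. It is not the TextScanner implementation: every name, the 99% single-token ratio, the 100-line minimum, and the retarget limit of 64 are assumptions chosen for illustration only.

```python
from __future__ import annotations

import re
from pathlib import PurePath

# Hypothetical names and thresholds; this is not ModelAudit's actual code.
BENIGN_VOCAB_STEMS = {"trojan", "zombie"}
TOKENIZER_PREFIXES = ("##", "▁", "Ġ")  # WordPiece, SentencePiece, byte-level BPE markers
CC_PATTERN = re.compile(r"trojan|zombie|botnet", re.IGNORECASE)
MAX_RETARGETS = 64  # assumed bound on skipped occurrences


def looks_like_vocab_file(filename: str, text: str, min_lines: int = 100) -> bool:
    """Require filename evidence plus one-token-per-line structure."""
    if PurePath(filename).name != "vocab.txt":
        return False
    lines = text.splitlines()
    if len(lines) < min_lines:
        return False  # ambiguous short files are not treated as vocabularies
    single = sum(1 for ln in lines if ln and not any(ch.isspace() for ch in ln))
    return single / len(lines) >= 0.99


def is_isolated_benign_entry(line: str) -> bool:
    """True only when the whole line is trojan/zombie, allowing a prefix or plural."""
    token = line.strip()
    for prefix in TOKENIZER_PREFIXES:
        if token.startswith(prefix):
            token = token[len(prefix):]
            break
    token = token.lower()
    if token.endswith("s"):
        token = token[:-1]
    return token in BENIGN_VOCAB_STEMS


def first_actionable_match(filename: str, text: str) -> tuple[str, re.Match[str] | None]:
    """Return (status, match); status is 'clean', 'finding', or 'inconclusive'."""
    vocab = looks_like_vocab_file(filename, text)
    skipped = 0
    for match in CC_PATTERN.finditer(text):
        start = text.rfind("\n", 0, match.start()) + 1
        end = text.find("\n", match.end())
        line = text[start:] if end == -1 else text[start:end]
        if vocab and is_isolated_benign_entry(line):
            skipped += 1
            if skipped > MAX_RETARGETS:
                return "inconclusive", None  # fail closed once the budget is spent
            continue  # retarget to the next occurrence instead of dropping the finding
        return "finding", match
    return "clean", None
```

The point of the design is that an omitted entry never ends the search: the scan moves on to the next occurrence, and exhausting the budget produces an inconclusive result rather than a clean one.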

Validation

uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run pytest tests/scanners/test_text_scanner.py --maxfail=1
PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run pytest tests/detectors/test_network_comm_detector.py tests/test_core.py -k "network_comm_detector or vocab or tokenizer_config or text" --maxfail=1
PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1
git diff --check

Outcomes:

tests/scanners/test_text_scanner.py: 449 passed
targeted detector/core lane: 547 passed, 627 deselected
broad non-slow/non-integration lane: 17386 passed, 1292 skipped, 39 warnings
ruff format, ruff check --fix, mypy, and git diff --check: clean

Pinned Real-Model Streaming QA

Command run from the repository with telemetry disabled and a 2 MiB bound on each streamed vocab.txt:

env PROMPTFOO_DISABLE_TELEMETRY=1 NO_ANALYTICS=1 HF_HUB_DISABLE_TELEMETRY=1 uv run python - <<'PY'
from __future__ import annotations
import json, shutil, tempfile
from pathlib import Path
from typing import Iterator
from huggingface_hub import HfApi, hf_hub_download
from modelaudit.core import scan_model_streaming
MODELS = [("BAAI/bge-base-en-v1.5", "a5beb1e3e68b9ab74eb54cfd186867f64f240e1a"), ("ProsusAI/finbert", "4556d13015211d73dccd3fdd39d39232506f3e43")]
MAX_BYTES = 2 * 1024 * 1024
api = HfApi(); root = Path(tempfile.mkdtemp(prefix="modelaudit-task22-stream-qa."))
def one_file(path: Path) -> Iterator[tuple[Path, bool]]:
    yield path, True
summaries = []
for repo_id, revision in MODELS:
    info = api.model_info(repo_id, revision=revision, files_metadata=True)
    vocab = next(s for s in info.siblings if s.rfilename == "vocab.txt")
    size = int(vocab.size or 0)
    if size <= 0 or size > MAX_BYTES:
        raise SystemExit(f"Refusing {repo_id}@{revision} vocab.txt size={size}")
    model_dir = root / repo_id.replace("/", "__") / revision; model_dir.mkdir(parents=True)
    cached = Path(hf_hub_download(repo_id=repo_id, revision=revision, filename="vocab.txt", cache_dir=root / "hf-cache", token=False))
    staged = model_dir / "vocab.txt"; shutil.copy2(cached, staged)
    result = scan_model_streaming(one_file(staged), timeout=120, delete_after_scan=False, scan_root=model_dir, check_secrets=False, max_file_size=MAX_BYTES, max_total_size=MAX_BYTES)
    cc = [c for c in result.checks or [] if c.name == "Network Communication Detection" and (c.details or {}).get("type") == "cc_pattern"]
    network = [c for c in result.checks or [] if c.name == "Network Communication Detection"]
    summaries.append({"repo_id": repo_id, "revision": revision, "vocab_bytes": size, "vocab_lines": sum(1 for _ in staged.open("rb")), "success": result.success, "files_scanned": result.files_scanned, "cc_pattern_checks": len(cc), "network_statuses": [c.status.value for c in network]})
print(json.dumps({"summaries": summaries}, indent=2, sort_keys=True))
PY

Outcomes:

BAAI/bge-base-en-v1.5 @ a5beb1e3e68b9ab74eb54cfd186867f64f240e1a: vocab.txt 231508 bytes / 30522 lines, 1 streamed file scanned, success=true, network check passed, cc_pattern_checks=0
ProsusAI/finbert @ 4556d13015211d73dccd3fdd39d39232506f3e43: vocab.txt 231508 bytes / 30522 lines, 1 streamed file scanned, success=true, network check passed, cc_pattern_checks=0

Malicious Controls

Added regression coverage that keeps these actionable:

Later active same-pattern instructions after isolated trojan / zombie vocabulary entries.
URL assignments, curl, nc, requests.get, and multi-token C&C instructions inside vocab.txt.
Non-vocabulary prose mentioning a trojan payload.
Ambiguous short vocab.txt files.
Over-budget isolated entries, which fail closed with inconclusive text-content coverage.
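
As a rough illustration of the controls listed above, the sketch below separates lines that carry active context from bare vocabulary entries. The regular expression, the helper name stays_actionable, and the sample hosts are hypothetical and far simpler than the real regression tests.

```python
import re

# Hypothetical heuristic: any of these markers means the line is not a bare token.
ACTIVE_CONTEXT = re.compile(
    r"https?://"                    # URL assignments
    r"|\b(?:curl|wget|nc)\b"        # shell/network commands
    r"|\brequests\.(?:get|post)\b", # API calls
    re.IGNORECASE,
)


def stays_actionable(line: str) -> bool:
    """A line keeps its finding if it has active context or more than one token."""
    stripped = line.strip()
    return bool(ACTIVE_CONTEXT.search(stripped)) or len(stripped.split()) > 1


CONTROLS = [
    "c2_url=http://evil.example/trojan",
    "curl http://evil.example/zombie",
    "nc evil.example 4444",
    "requests.get('http://evil.example/trojan')",
    "connect zombie to controller",
]
BARE_ENTRIES = ["trojan", "zombie", "##zombies"]

for line in CONTROLS:
    assert stays_actionable(line), line
for line in BARE_ENTRIES:
    assert not stays_actionable(line), line
```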

mldangelo-oai · 2026-06-11T01:43:34Z

@codex review

github-actions · 2026-06-11T01:45:12Z

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.466s -> 1.475s (+0.6%).

Workload	Benchmark	Target	Size	Files	Baseline	Current	Change	Status
`chunked-upload-stream`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream`	`chunked_stream`	278.2 KiB	1	116.02ms	123.34ms	+6.3%	stable
`clean-training-checkpoint`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint`	`safe_large`	278.2 KiB	1	113.49ms	120.16ms	+5.9%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]`	`nested_raw`	78 B	1	520.3us	502.2us	-3.5%	stable
`padded-multi-stream-upload`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload`	`multi_stream_padded`	4.1 KiB	1	582.7us	563.3us	-3.3%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]`	`nested_hex`	130 B	1	563.1us	548.3us	-2.6%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]`	`nested_base64`	98 B	1	518.4us	507.2us	-2.1%	stable
`duplicate-heavy-registry`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot`	`registry-snapshot`	915.2 KiB	13	418.18ms	414.66ms	-0.8%	stable
`direct-malicious-upload`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload`	`malicious_reduce`	52 B	1	452.3us	449.0us	-0.7%	stable
`suspicious-pickle-intake`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake`	`suspicious-intake`	183.8 KiB	4	147.85ms	147.22ms	-0.4%	stable
`mixed-model-repository`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository`	`release-candidate`	547.3 KiB	32	486.88ms	485.43ms	-0.3%	stable
`single-checkpoint-preflight`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load`	`single_checkpoint.pkl`	183.0 KiB	1	75.88ms	76.07ms	+0.2%	stable
`warm-cache-rescan`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan`	`release-candidate`	547.3 KiB	32	105.30ms	105.43ms	+0.1%	stable
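
The Change and Status columns above can be reproduced with a short calculation. This is a sketch under the assumption that the 15% threshold is applied symmetrically (slower by more than 15% is a regression, faster by more than 15% is improved); the workflow's actual classification rule is not shown in this thread.

```python
def classify(baseline: float, current: float, threshold: float = 0.15) -> tuple[float, str]:
    """Return (relative change, status) for one benchmark; both times share a unit."""
    change = (current - baseline) / baseline
    if change > threshold:
        status = "regression"
    elif change < -threshold:
        status = "improved"  # assumed symmetric rule
    else:
        status = "stable"
    return change, status


# chunked-upload-stream row: 116.02ms -> 123.34ms
change, status = classify(116.02, 123.34)
print(f"{change:+.1%} {status}")
```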

chatgpt-codex-connector · 2026-06-11T01:45:56Z

Codex Review: Didn't find any major issues. Breezy!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T03:05:38Z

New exact pinned QA before merge: google-bert/bert-large-uncased@6da4b6a26a1877e173fca3225479512db81a5e5b reports isolated trojan and zombie vocabulary entries as informational S310 indicators on origin/main. Please validate the current head removes these vocabulary-only findings while preserving executable/contextual command-and-control controls, then post the exact outcome.

mldangelo-oai · 2026-06-11T03:17:08Z

@codex review

Exact-head QA update for 2b2e5b9d9b2cd12c398d48f0802de912f86f4ee2:

Pinned vocab source: google-bert/bert-large-uncased@6da4b6a26a1877e173fca3225479512db81a5e5b / vocab.txt; downloaded file has 30522 lines, SHA256 07eced375cec144d27c900241f3e339478dec958f92fddbc551f295c992038a3, with isolated entries zombie at line 11799 and trojan at line 23446.
origin/main (8d6c4864fe2ea833ceaef1b9803d225afb1e8d69) CLI scan of the exact pinned HF vocab.txt URL reproduced 2 informational S310 issues: patterns trojan, zombie.
PR head 2b2e5b9d9b2cd12c398d48f0802de912f86f4ee2 CLI scan of the same exact pinned HF vocab.txt URL is clean: issue_count=0, s310_count=0, and Network Communication Detection passed.
Added focused serialized-context regression: tests/scanners/test_pickle_scanner.py::test_expensive_raw_prefilters_preserve_serialized_cc_terms, covering trojan and zombie in pickled security-relevant C&C payloads.
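
To re-check the pinned artifact facts quoted above (SHA256, line count, and the line numbers of the isolated entries), something like the following would work on a locally downloaded vocab.txt. The function name and return shape are illustrative and not part of ModelAudit.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def fingerprint_vocab(path: Path, tokens: set[str]) -> dict:
    """Hash the file and record 1-based line numbers where a whole line equals a token."""
    digest = hashlib.sha256()
    line_count = 0
    hits: dict[str, list[int]] = {token: [] for token in tokens}
    with path.open("rb") as handle:
        for lineno, raw in enumerate(handle, start=1):
            digest.update(raw)  # raw lines concatenate back to the exact file bytes
            line_count = lineno
            entry = raw.decode("utf-8", errors="replace").strip()
            if entry in hits:
                hits[entry].append(lineno)
    return {"sha256": digest.hexdigest(), "lines": line_count, "hits": hits}
```

Calling fingerprint_vocab(Path("vocab.txt"), {"trojan", "zombie"}) on the pinned file should then be comparable against the digest and line numbers posted in this comment.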

Focused/local validation:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_text_scanner.py -k "tokenizer_vocab or active_vocabulary_context or bare_active_vocabulary_tokens" -q -> 23 passed, 426 deselected
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_pickle_scanner.py::test_expensive_raw_prefilters_preserve_serialized_cc_terms -q -> 2 passed
uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> clean
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> 419 files already formatted
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> Success: no issues found in 474 source files
git diff --check -> clean

chatgpt-codex-connector · 2026-06-11T03:19:49Z

Codex Review: Didn't find any major issues. 🚀

mldangelo-oai · 2026-06-11T03:37:44Z

Pinned vocabulary QA input: sentence-transformers/all-MiniLM-L12-v2@a50ef00143b4d5391434df20ae11632588ac25be emits S310 for the standalone vocabulary tokens trojan and zombie. These are inert entries in vocab.txt; include this exact model in the current-head QA while preserving executable/network-context detections.

mldangelo-oai · 2026-06-11T04:07:39Z

Second pinned vocabulary QA: mixedbread-ai/mxbai-embed-large-v1@b33106f585b9ce46904ad7443a3b52b7a63e231c emits informational S310 findings for trojan and zombie from a normal vocab.txt on main 8d6c4864. The existing real-model regression should generalize, but @codex confirm this exact vocab is clean on the current head while serialized/executable-context controls still produce critical findings.

chatgpt-codex-connector · 2026-06-11T04:10:39Z

Codex Review: Didn't find any major issues. Nice work!

mldangelo-oai · 2026-06-11T05:05:58Z

Additional pinned baseline reproduction for this PR's exact false-positive family:

Model: google-bert/bert-base-multilingual-uncased
HF revision: 7cbf9a625e29989f6b9c6c2fa68234c304f7e38f
Selection rank: 242
Baseline ModelAudit: 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69
Artifact: vocab.txt
Finding: informational S310 C&C pattern detected: zombie
Evidence: the matched snippet is exactly the isolated vocabulary token zombie at byte position 394768.
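
The evidence claim above (the match at a given byte position is the entire line) can be checked mechanically. A minimal sketch, with a hypothetical helper name, that maps a byte offset back to its line:

```python
def line_at_offset(data: bytes, offset: int) -> tuple[int, bytes]:
    """Return (1-based line number, line bytes without the newline) containing offset."""
    if not 0 <= offset < len(data):
        raise ValueError("offset outside file")
    start = data.rfind(b"\n", 0, offset) + 1  # 0 when no earlier newline exists
    end = data.find(b"\n", offset)
    if end == -1:
        end = len(data)
    lineno = data.count(b"\n", 0, start) + 1
    return lineno, data[start:end].rstrip(b"\r")


sample = b"the\nzombie\n##s\n"
lineno, line = line_at_offset(sample, sample.index(b"zombie"))
assert (lineno, line) == (2, b"zombie")  # the match is the whole line, so it is isolated
```

For the pinned artifact, the same call with the file bytes and offset 394768 would show whether the returned line is exactly zombie.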

This is directly in scope. Auto-merge is paused until the exact pinned artifact is rerun at the current PR head and shows zero C2 findings while the existing active-instruction controls remain actionable.

mldangelo-oai · 2026-06-11T05:31:51Z

Additional exact-main vocabulary QA from Hugging Face rank 254:

Model: google-bert/bert-base-cased@cd5ef92a9fb2f889e972770a36d4ed042daf221e
File: root vocab.txt (213,450 bytes)
Main 8d6c4864 emits S310 for both ordinary Trojan and zombie vocabulary entries.

This independently reproduces the rank-242 multilingual vocabulary family. Please include both exact models in current-head QA while preserving detection of these terms in executable, URL, configuration, prose-instruction, and serialized contexts. Full audit: /Users/mdangelo/modelaudit-hf-scan-swarm/outputs/hf-audit-r253-r254-20260611.md.

mldangelo-oai · 2026-06-11T06:06:35Z

@codex review

Pinned multilingual vocab follow-up for head 63f0220774c2254bf3ec6fb31e95d3885cdd624a:

Source: google-bert/bert-base-multilingual-uncased@7cbf9a625e29989f6b9c6c2fa68234c304f7e38f / vocab.txt
Downloaded artifact: 871891 bytes, 105879 lines, SHA256 87b44292b452f6c05afa49b2e488e7eedf79ea4f4c39db6f2f4b37764228ef3f; isolated zombie at byte 394768.
Direct selected-file CLI QA: --scanners text --max-size 20MB scanned 1 file / 871891 bytes with issue_count=0, s310_count=0, and Network Communication Detection passed.
Bounded streamed HF text QA: scanned 2 text files / 880821 bytes; vocab.txt had vocab_issues=[], c2_count=0, and Network Communication Detection passed. The repo-level streamed command still exits 1 from README URL findings only, not from vocab.txt.
Added regressions for BERT-like multilingual vocabulary content plus mocked Hugging Face streaming; active prose/config/URL/script/metadata-shaped contexts remain actionable.

Validation run:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_text_scanner.py -k "multilingual_tokenizer_vocab" -q
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_cli.py -k "streaming_omits_multilingual_vocab_cc_token" -q
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_text_scanner.py -k "tokenizer_vocab or vocabulary or multilingual_tokenizer_vocab or bare_active_vocabulary_tokens or active_vocabulary_context" -q
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_text_scanner.py -q
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_cli.py -k "huggingface_streaming" -q
uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 17396 passed, 1292 skipped
uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
git diff --check

chatgpt-codex-connector · 2026-06-11T06:09:21Z

Codex Review: Didn't find any major issues. Bravo.

mldangelo-oai · 2026-06-11T06:32:49Z

Supplemental pinned vocabulary QA for the later rank-254 note on head 63f0220774c2254bf3ec6fb31e95d3885cdd624a:

Source: google-bert/bert-base-cased@cd5ef92a9fb2f889e972770a36d4ed042daf221e / root vocab.txt
Downloaded artifact: 213450 bytes, 28996 lines, SHA256 eeaa9875b23b04b4c54ef759d03db9d1ba1554838f8fb26c5d96fa551df93d02; vocabulary entries include zombie, zombies, and Trojan.
Direct selected-file CLI QA: --scanners text --max-size 20MB scanned 1 file / 213450 bytes with issue_count=0, s310_count=0, and Network Communication Detection passed.

This is consistent with the multilingual rank-242 QA above; vocabulary-only entries are clean, while active prose/config/URL/script/metadata-shaped contexts remain covered by the added regressions.

ianw-oai

Holding my approval for now: the tokenizer-vocabulary suppression is still pattern-specific (trojan/zombie only), which looks too tailored to the pinned QA cases for this simple-approval pass.

fix(text): contextualize tokenizer vocabulary indicators

134d0f5

mldangelo-oai requested a review from mldangelo June 11, 2026 02:00

mldangelo-oai enabled auto-merge (squash) June 11, 2026 02:01

mldangelo-oai disabled auto-merge June 11, 2026 03:05

test: cover serialized C&C vocabulary boundary

2b2e5b9

mldangelo-oai enabled auto-merge (squash) June 11, 2026 03:59

mldangelo-oai disabled auto-merge June 11, 2026 05:05

fix(text): recognize multilingual tokenizer vocab context

63f0220

mldangelo-oai enabled auto-merge (squash) June 11, 2026 06:37

ianw-oai reviewed Jun 11, 2026
