fix: calibrate onnx custom domain findings by mldangelo-oai · Pull Request #1639 · promptfoo/modelaudit

mldangelo-oai · 2026-06-11T01:02:03Z

Summary

Adds a deterministic low-noise ONNX Runtime vendor policy for exact com.microsoft optimized-export operators observed in public transformer ONNX exports: FastGelu and SkipLayerNormalization at com.microsoft opset 1 with empty overload.
Keeps the dedicated S1111 custom-domain path for unknown domains, Microsoft lookalikes, unknown com.microsoft ops, missing/unsupported/conflicting opsets, non-empty overloads, and function-body unknowns.
Adds synthetic malicious/malformed controls plus an opt-in real Hugging Face regression over the exact pinned model revisions.

Root Cause

_is_external_custom_operator() treated every non-standard, non-model-local, non-schema-validated domain identically. ONNX Runtime optimized exports use com.microsoft metadata for runtime-provided kernels, but that domain is not an ONNX standard domain and cannot be validated via onnx.defs.has(), so common benign com.microsoft nodes produced S1111 alongside truly unknown domains.
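The pre-fix behavior described above can be illustrated with a minimal sketch. This is not the scanner's actual code: `STANDARD_DOMAINS`, the `Node` dataclass, the `local_functions` set, and the `schema_known` callback (standing in for an `onnx.defs.has()` lookup) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

# Assumption: a simplified set of ONNX standard domains, for illustration only.
STANDARD_DOMAINS = frozenset({"", "ai.onnx", "ai.onnx.ml"})


@dataclass(frozen=True)
class Node:
    op_type: str
    domain: str


def is_external_custom_operator(node, local_functions, schema_known):
    """Hypothetical pre-fix check with a single bucket for all other domains."""
    if node.domain in STANDARD_DOMAINS:
        return False
    if (node.domain, node.op_type) in local_functions:
        return False
    # schema_known stands in for an onnx.defs.has() style registry lookup.
    return not schema_known(node.op_type, node.domain)


def no_schema(op_type, domain):
    # Runtime-provided vendor kernels are assumed absent from the ONNX schema
    # registry, so the lookup fails for them just as it does for unknown domains.
    return False


benign = Node("FastGelu", "com.microsoft")
unknown = Node("Exfiltrate", "evil.example")

# Both nodes take the same path, which is the false-positive source.
print(is_external_custom_operator(benign, set(), no_schema))
print(is_external_custom_operator(unknown, set(), no_schema))
```

Under these assumptions the benign vendor node and the truly unknown node are indistinguishable, so both would be reported.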

Security Tradeoff

This does not add com.microsoft as a trusted standard domain. The low-noise policy is exact tuple based and requires:

exact domain com.microsoft
exact operator name in the allow policy
imported supported opset version 1
empty overload
no conflicting duplicate opset imports

Unknown domains and ambiguous vendor claims still fail closed into S1111. Python-like operators remain critical S902 regardless of domain.
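As a rough sketch of what an exact-tuple policy like the one listed above could look like (an illustration under stated assumptions, not the code in this PR): `LOW_NOISE_POLICY`, `Node`, `resolve_opset`, and `is_low_noise_vendor_op` are hypothetical names, and opset imports are modeled as plain `(domain, version)` pairs. The separate S902 handling for Python-like operators is not modeled here.

```python
from dataclasses import dataclass

VENDOR_DOMAIN = "com.microsoft"

# Assumption: the policy is keyed on (domain, op_type, opset version, overload).
LOW_NOISE_POLICY = frozenset({
    (VENDOR_DOMAIN, "FastGelu", 1, ""),
    (VENDOR_DOMAIN, "SkipLayerNormalization", 1, ""),
})


@dataclass(frozen=True)
class Node:
    op_type: str
    domain: str
    overload: str = ""


def resolve_opset(domain, opset_imports):
    """Return the one imported version for domain, or None if missing or conflicting."""
    versions = {version for imported, version in opset_imports if imported == domain}
    return versions.pop() if len(versions) == 1 else None


def is_low_noise_vendor_op(node, opset_imports):
    """True only for an exact policy match; everything else fails closed."""
    version = resolve_opset(node.domain, opset_imports)
    if version is None:
        return False  # missing or conflicting opset import
    return (node.domain, node.op_type, version, node.overload) in LOW_NOISE_POLICY


imports = [("", 17), ("com.microsoft", 1)]
print(is_low_noise_vendor_op(Node("FastGelu", "com.microsoft"), imports))
print(is_low_noise_vendor_op(Node("FastGelu", "com.micros0ft"), imports))
```

Matching on the full tuple rather than on the domain alone means a lookalike domain, an unlisted operator, a different opset version, or a non-empty overload would each miss the set and stay on the custom-domain finding path.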

Pinned Real-Model QA

Baseline on the starting SHA ab6e58ebe587197b2b46866c69f7d41d75e08a55 reproduced the false-positive path on onnx/model_O4.onnx:

rank 2 sentence-transformers/all-MiniLM-L6-v2@1110a243fdf4706b3f48f1d95db1a4f5529b4d41: com.microsoft nodes SkipLayerNormalization=12, FastGelu=6; S1111 count before fix: 18
rank 5 cross-encoder/ms-marco-MiniLM-L6-v2@c5ee24cb16019beea0893ab7796b1df96625c6b8: com.microsoft nodes SkipLayerNormalization=12, FastGelu=6; S1111 count before fix: 18
rank 20 sentence-transformers/all-mpnet-base-v2@e8c3b32edf5434bc2275fc9bab85f82640a19130: com.microsoft nodes SkipLayerNormalization=25, FastGelu=12; S1111 count before fix: 37

Post-fix direct QA command:

PROMPTFOO_DISABLE_TELEMETRY=1 HF_HOME=/home/dev-user/code/modelaudit-fp-fixes/20260610T215033Z-hf-fp-fixes/task-09-onnx-custom-domain-calibration/.hf-cache uv run python -u - <<'PY'
from collections import Counter
from pathlib import Path
from huggingface_hub import hf_hub_download
import onnx
from modelaudit.scanner_results import ScanResult
from modelaudit.scanners.base import CheckStatus
from modelaudit.scanners.onnx_scanner import OnnxScanner
cases = [
    ("rank2", "sentence-transformers/all-MiniLM-L6-v2", "1110a243fdf4706b3f48f1d95db1a4f5529b4d41", "onnx/model_O4.onnx"),
    ("rank5", "cross-encoder/ms-marco-MiniLM-L6-v2", "c5ee24cb16019beea0893ab7796b1df96625c6b8", "onnx/model_O4.onnx"),
    ("rank20", "sentence-transformers/all-mpnet-base-v2", "e8c3b32edf5434bc2275fc9bab85f82640a19130", "onnx/model_O4.onnx"),
]
scanner = OnnxScanner()
for label, repo, rev, filename in cases:
    path = Path(hf_hub_download(repo_id=repo, revision=rev, filename=filename))
    model = onnx.load(str(path), load_external_data=False)
    microsoft_ops = Counter(node.op_type for node in model.graph.node if node.domain == "com.microsoft")
    result = ScanResult("onnx", scanner=scanner)
    scanner._check_custom_ops(model, str(path), result)
    failed_custom = [c for c in result.checks if c.name == "Custom Operator Domain Check" and c.status == CheckStatus.FAILED]
    microsoft_failed = [c for c in failed_custom if c.details.get("domain") == "com.microsoft"]
    print(label, dict(microsoft_ops), len(failed_custom), len(microsoft_failed), result.metadata.get("custom_domains", []))
PY

Post-fix outcome: all three pinned models still contain the expected com.microsoft operators, and all report total_s1111=0, microsoft_s1111=0, metadata_custom_domains=[].

Validation

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_onnx_scanner.py -k "microsoft or custom_domain or custom_operator" -m "not slow and not integration" --maxfail=1 → 20 passed
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_onnx_scanner.py -m "not slow and not integration" --maxfail=1 → 260 passed, 3 deselected
MODELAUDIT_RUN_HF_REAL_MODEL_TESTS=1 PROMPTFOO_DISABLE_TELEMETRY=1 HF_HUB_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_onnx_scanner.py -k "pinned_hf" -m "slow and integration" -s --maxfail=1 → 3 passed
uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ → 419 files already formatted
uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ → all checks passed
uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ → success, 474 source files
PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 → 18075 passed, 793 skipped
git diff --check → clean

mldangelo-oai · 2026-06-11T01:03:04Z

@codex review

github-actions · 2026-06-11T01:04:19Z

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.444s -> 1.437s (-0.5%).

Workload	Benchmark	Target	Size	Files	Baseline	Current	Change	Status
`warm-cache-rescan`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan`	`release-candidate`	547.3 KiB	32	103.37ms	108.79ms	+5.2%	stable
`mixed-model-repository`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository`	`release-candidate`	547.3 KiB	32	505.21ms	486.72ms	-3.7%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw]`	`nested_raw`	78 B	1	603.5us	583.6us	-3.3%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex]`	`nested_hex`	130 B	1	638.0us	623.6us	-2.3%	stable
`padded-multi-stream-upload`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload`	`multi_stream_padded`	4.1 KiB	1	657.1us	643.8us	-2.0%	stable
`nested-payload-review`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64]`	`nested_base64`	98 B	1	591.5us	579.7us	-2.0%	stable
`duplicate-heavy-registry`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot`	`registry-snapshot`	915.2 KiB	13	392.93ms	399.77ms	+1.7%	stable
`clean-training-checkpoint`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint`	`safe_large`	278.2 KiB	1	110.24ms	108.57ms	-1.5%	stable
`suspicious-pickle-intake`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake`	`suspicious-intake`	183.8 KiB	4	144.28ms	145.93ms	+1.1%	stable
`direct-malicious-upload`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload`	`malicious_reduce`	52 B	1	520.7us	525.5us	+0.9%	stable
`chunked-upload-stream`	`tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream`	`chunked_stream`	278.2 KiB	1	112.89ms	112.00ms	-0.8%	stable
`single-checkpoint-preflight`	`tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load`	`single_checkpoint.pkl`	183.0 KiB	1	71.72ms	71.84ms	+0.2%	stable
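The Change column can be reproduced from the Baseline and Current values. The sketch below shows the arithmetic for one table row and for the aggregate median; the `classify` helper is a guess at how a 15% threshold might map to a status, and the symmetric cutoff for "improved" in particular is an assumption, not something the report states.

```python
def percent_change(baseline, current):
    """Relative change of current versus baseline, in percent."""
    return (current - baseline) / baseline * 100.0


def classify(baseline, current, threshold_pct=15.0):
    """Hypothetical status rule: only changes beyond the threshold are flagged."""
    change = percent_change(baseline, current)
    if change > threshold_pct:
        return "regression"
    if change < -threshold_pct:  # assumption: symmetric cutoff for improvements
        return "improved"
    return "stable"


# warm-cache-rescan row: 103.37ms -> 108.79ms
print(f"{percent_change(103.37, 108.79):+.1f}%", classify(103.37, 108.79))  # +5.2% stable

# aggregate shared-benchmark median: 1.444s -> 1.437s
print(f"{percent_change(1.444, 1.437):+.1f}%", classify(1.444, 1.437))  # -0.5% stable
```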

chatgpt-codex-connector · 2026-06-11T01:05:37Z

Codex Review: Didn't find any major issues. 🚀

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

mldangelo-oai · 2026-06-11T01:36:42Z

@codex review

mldangelo-oai · 2026-06-11T01:54:58Z

@codex review

chatgpt-codex-connector · 2026-06-11T01:57:16Z

Codex Review: Didn't find any major issues. More of your lovely PRs please.

fix: calibrate onnx custom domain findings

ccf5c05

mldangelo-oai requested a review from mldangelo June 11, 2026 01:33

mldangelo-oai enabled auto-merge (squash) June 11, 2026 01:34

Merge remote-tracking branch 'origin/main' into mdangelo/codex/hf-fp-…

4440049

…t09-onnx-custom-domain-calibration-20260610

mldangelo-oai disabled auto-merge June 11, 2026 01:54

mldangelo-oai enabled auto-merge (squash) June 11, 2026 02:23

mldangelo-oai mentioned this pull request Jun 11, 2026

fix: deduplicate onnx custom domain findings #1656

Open

mldangelo-oai wants to merge 2 commits into main from mdangelo/codex/hf-fp-t09-onnx-custom-domain-calibration-20260610
