fix: calibrate onnx custom domain findings #1639

Open: mldangelo-oai wants to merge 2 commits into main from mdangelo/codex/hf-fp-t09-onnx-custom-domain-calibration-20260610

@mldangelo-oai

Contributor

Summary

  • Adds a deterministic low-noise ONNX Runtime vendor policy for exact com.microsoft optimized-export operators observed in public transformer ONNX exports: FastGelu and SkipLayerNormalization at com.microsoft opset 1 with empty overload.
  • Keeps the dedicated S1111 custom-domain path for unknown domains, Microsoft lookalikes, unknown com.microsoft ops, missing/unsupported/conflicting opsets, non-empty overloads, and function-body unknowns.
  • Adds synthetic malicious/malformed controls plus an opt-in real Hugging Face regression over the exact pinned model revisions.

Root Cause

_is_external_custom_operator() treated every non-standard, non-model-local, non-schema-validated domain identically. ONNX Runtime optimized exports use com.microsoft metadata for runtime-provided kernels, but that domain is not an ONNX standard domain and cannot be validated via onnx.defs.has(), so common benign com.microsoft nodes produced S1111 alongside truly unknown domains.
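
The pre-fix behavior can be modeled roughly as follows. This is a sketch of the description above, not the scanner's actual code: the function name, the `STANDARD_DOMAINS` set, and the `schema_known` flag (standing in for an `onnx.defs.has()` lookup) are assumptions for illustration.

```python
# Hypothetical model of the pre-fix predicate; all names are illustrative.
# Assumed subset of standard ONNX domains, not taken from the scanner.
STANDARD_DOMAINS = {"", "ai.onnx", "ai.onnx.ml"}


def is_external_custom_operator_before_fix(
    domain: str, model_local_domains: set[str], schema_known: bool
) -> bool:
    """Return True when a node would be reported as S1111 in this sketch."""
    if domain in STANDARD_DOMAINS:
        return False
    if domain in model_local_domains:
        return False
    # schema_known stands in for an onnx.defs.has() lookup, which cannot
    # validate com.microsoft operators.
    return not schema_known


# A benign vendor domain and an unknown domain take the same path.
for domain in ("com.microsoft", "com.example.unknown"):
    print(domain, is_external_custom_operator_before_fix(domain, set(), False))
```

Both domains come out flagged in this model, which is the false-positive path this PR targets.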

Security Tradeoff

This does not add com.microsoft as a trusted standard domain. The low-noise policy matches exact tuples and requires all of the following:

  • exact domain com.microsoft
  • exact operator name in the allow policy
  • imported supported opset version 1
  • empty overload
  • no conflicting duplicate opset imports

Unknown domains and ambiguous vendor claims still fail closed into S1111. Python-like operators remain critical S902 regardless of domain.
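
A minimal sketch of what an exact-tuple check with these requirements could look like. `ALLOWED_VENDOR_OPS`, `is_low_noise_vendor_op`, and the `(domain, version)` representation of opset imports are hypothetical names chosen for illustration, not the identifiers used in this PR. Treating repeated identical imports as non-conflicting is also an assumption.

```python
from collections import defaultdict

# Hypothetical allow policy keyed on (domain, op_type, opset_version, overload).
ALLOWED_VENDOR_OPS = frozenset({
    ("com.microsoft", "FastGelu", 1, ""),
    ("com.microsoft", "SkipLayerNormalization", 1, ""),
})


def is_low_noise_vendor_op(domain, op_type, overload, opset_imports):
    """Return True only for an exact policy match.

    opset_imports is a list of (domain, version) pairs declared by the model.
    """
    versions = defaultdict(set)
    for imported_domain, version in opset_imports:
        versions[imported_domain].add(version)
    declared = versions.get(domain, set())
    # A missing import or two different versions for the domain fails closed.
    if len(declared) != 1:
        return False
    (version,) = declared
    return (domain, op_type, version, overload) in ALLOWED_VENDOR_OPS


imports = [("", 17), ("com.microsoft", 1)]
print(is_low_noise_vendor_op("com.microsoft", "FastGelu", "", imports))
print(is_low_noise_vendor_op("com.microsoft.evil", "FastGelu", "", imports))
```

Because membership is tested on the full tuple, a lookalike domain, an unlisted operator, a different opset version, or a non-empty overload all fall through to the existing S1111 path without needing a separate rule for each.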

Pinned Real-Model QA

Baseline on the starting SHA ab6e58ebe587197b2b46866c69f7d41d75e08a55 reproduced the false-positive path on onnx/model_O4.onnx:

  • rank 2 sentence-transformers/all-MiniLM-L6-v2@1110a243fdf4706b3f48f1d95db1a4f5529b4d41: com.microsoft nodes SkipLayerNormalization=12, FastGelu=6; S1111 count before fix: 18
  • rank 5 cross-encoder/ms-marco-MiniLM-L6-v2@c5ee24cb16019beea0893ab7796b1df96625c6b8: com.microsoft nodes SkipLayerNormalization=12, FastGelu=6; S1111 count before fix: 18
  • rank 20 sentence-transformers/all-mpnet-base-v2@e8c3b32edf5434bc2275fc9bab85f82640a19130: com.microsoft nodes SkipLayerNormalization=25, FastGelu=12; S1111 count before fix: 37
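
As a quick arithmetic check on the baseline numbers above, each pre-fix S1111 count equals the total number of com.microsoft nodes in that model, which is consistent with one finding per node. The dictionary names below are illustrative; the counts are copied from the list.

```python
from collections import Counter

# com.microsoft node counts per pinned model, copied from the baseline list.
baseline_ops = {
    "rank2": Counter(SkipLayerNormalization=12, FastGelu=6),
    "rank5": Counter(SkipLayerNormalization=12, FastGelu=6),
    "rank20": Counter(SkipLayerNormalization=25, FastGelu=12),
}
# S1111 counts reported before the fix.
reported_s1111 = {"rank2": 18, "rank5": 18, "rank20": 37}

for label, ops in baseline_ops.items():
    total_nodes = sum(ops.values())
    print(label, total_nodes, total_nodes == reported_s1111[label])
```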

Post-fix direct QA command:

PROMPTFOO_DISABLE_TELEMETRY=1 HF_HOME=/home/dev-user/code/modelaudit-fp-fixes/20260610T215033Z-hf-fp-fixes/task-09-onnx-custom-domain-calibration/.hf-cache uv run python -u - <<'PY'
from collections import Counter
from pathlib import Path
from huggingface_hub import hf_hub_download
import onnx
from modelaudit.scanner_results import ScanResult
from modelaudit.scanners.base import CheckStatus
from modelaudit.scanners.onnx_scanner import OnnxScanner
cases = [
    ("rank2", "sentence-transformers/all-MiniLM-L6-v2", "1110a243fdf4706b3f48f1d95db1a4f5529b4d41", "onnx/model_O4.onnx"),
    ("rank5", "cross-encoder/ms-marco-MiniLM-L6-v2", "c5ee24cb16019beea0893ab7796b1df96625c6b8", "onnx/model_O4.onnx"),
    ("rank20", "sentence-transformers/all-mpnet-base-v2", "e8c3b32edf5434bc2275fc9bab85f82640a19130", "onnx/model_O4.onnx"),
]
scanner = OnnxScanner()
for label, repo, rev, filename in cases:
    path = Path(hf_hub_download(repo_id=repo, revision=rev, filename=filename))
    model = onnx.load(str(path), load_external_data=False)
    microsoft_ops = Counter(node.op_type for node in model.graph.node if node.domain == "com.microsoft")
    result = ScanResult("onnx", scanner=scanner)
    scanner._check_custom_ops(model, str(path), result)
    failed_custom = [c for c in result.checks if c.name == "Custom Operator Domain Check" and c.status == CheckStatus.FAILED]
    microsoft_failed = [c for c in failed_custom if c.details.get("domain") == "com.microsoft"]
    print(label, dict(microsoft_ops), len(failed_custom), len(microsoft_failed), result.metadata.get("custom_domains", []))
PY

Post-fix outcome: all three pinned models still contain the expected com.microsoft operators, and all report total_s1111=0, microsoft_s1111=0, metadata_custom_domains=[].

Validation

  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_onnx_scanner.py -k "microsoft or custom_domain or custom_operator" -m "not slow and not integration" --maxfail=1 → 20 passed
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_onnx_scanner.py -m "not slow and not integration" --maxfail=1 → 260 passed, 3 deselected
  • MODELAUDIT_RUN_HF_REAL_MODEL_TESTS=1 PROMPTFOO_DISABLE_TELEMETRY=1 HF_HUB_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_onnx_scanner.py -k "pinned_hf" -m "slow and integration" -s --maxfail=1 → 3 passed
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ → 419 files already formatted
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ → all checks passed
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ → success, 474 source files
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 → 18075 passed, 793 skipped
  • git diff --check → clean

@mldangelo-oai

Contributor Author

@codex review

@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.444s -> 1.437s (-0.5%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| warm-cache-rescan | tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan | release-candidate | 547.3 KiB | 32 | 103.37ms | 108.79ms | +5.2% | stable |
| mixed-model-repository | tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository | release-candidate | 547.3 KiB | 32 | 505.21ms | 486.72ms | -3.7% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] | nested_raw | 78 B | 1 | 603.5us | 583.6us | -3.3% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] | nested_hex | 130 B | 1 | 638.0us | 623.6us | -2.3% | stable |
| padded-multi-stream-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload | multi_stream_padded | 4.1 KiB | 1 | 657.1us | 643.8us | -2.0% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] | nested_base64 | 98 B | 1 | 591.5us | 579.7us | -2.0% | stable |
| duplicate-heavy-registry | tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot | registry-snapshot | 915.2 KiB | 13 | 392.93ms | 399.77ms | +1.7% | stable |
| clean-training-checkpoint | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint | safe_large | 278.2 KiB | 1 | 110.24ms | 108.57ms | -1.5% | stable |
| suspicious-pickle-intake | tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake | suspicious-intake | 183.8 KiB | 4 | 144.28ms | 145.93ms | +1.1% | stable |
| direct-malicious-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload | malicious_reduce | 52 B | 1 | 520.7us | 525.5us | +0.9% | stable |
| chunked-upload-stream | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream | chunked_stream | 278.2 KiB | 1 | 112.89ms | 112.00ms | -0.8% | stable |
| single-checkpoint-preflight | tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load | single_checkpoint.pkl | 183.0 KiB | 1 | 71.72ms | 71.84ms | +0.2% | stable |
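
For readers checking the Change and Status columns by hand, the percentages are consistent with a plain relative change against the baseline. The sketch below assumes a symmetric 15% threshold with inclusive boundaries; the benchmark workflow's actual classification rule is not shown in this report, so `classify` is hypothetical.

```python
def percent_change(baseline: float, current: float) -> float:
    """Relative change of current against baseline, in percent."""
    return (current - baseline) / baseline * 100.0


def classify(baseline: float, current: float, threshold_pct: float = 15.0) -> str:
    """Hypothetical status rule: slower by the threshold is a regression."""
    change = percent_change(baseline, current)
    if change >= threshold_pct:
        return "regression"
    if change <= -threshold_pct:
        return "improved"
    return "stable"


# warm-cache-rescan row: 103.37ms -> 108.79ms
print(round(percent_change(103.37, 108.79), 1), classify(103.37, 108.79))
# aggregate median: 1.444s -> 1.437s
print(round(percent_change(1.444, 1.437), 1), classify(1.444, 1.437))
```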

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🚀

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@mldangelo-oai mldangelo-oai requested a review from mldangelo June 11, 2026 01:33
@mldangelo-oai mldangelo-oai enabled auto-merge (squash) June 11, 2026 01:34
@mldangelo-oai

Contributor Author

@codex review

@mldangelo-oai mldangelo-oai disabled auto-merge June 11, 2026 01:54
@mldangelo-oai

Contributor Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. More of your lovely PRs please.
