
fix: recognize GGUF BF16 tensor type #1632

Merged
mldangelo-oai merged 3 commits into main from mdangelo/codex/hf-fp-t11-gguf-bf16-type30-20260610 on Jun 11, 2026

@mldangelo-oai

Contributor

Summary

Recognize GGUF/GGML tensor type 30 as BF16 by adding it to the scanner's GGML tensor-size table as (block_size=1, type_size=2).
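As a rough illustration of what a `(block_size, type_size)` entry means, a tensor's on-disk size follows from its element count, the block size in elements, and the bytes per block. The sketch below is not the scanner's code: `GGML_TYPE_INFO` and `tensor_nbytes` are hypothetical names, the table is abbreviated to a few GGML types, and the row-length check is a simplification.

```python
from math import prod

# Hypothetical, abbreviated stand-in for the scanner's size table:
# GGML type id -> (block_size in elements, type_size in bytes per block).
GGML_TYPE_INFO = {
    0: (1, 4),    # F32
    1: (1, 2),    # F16
    2: (32, 18),  # Q4_0: 32 elements packed into 18 bytes
    30: (1, 2),   # BF16: the kind of entry this PR adds
}


def tensor_nbytes(tensor_type: int, dims: list[int]) -> int:
    """Return the byte size of a tensor; raise KeyError for an unknown type."""
    block_size, type_size = GGML_TYPE_INFO[tensor_type]
    if not dims or dims[0] % block_size != 0:
        raise ValueError("first dimension must be a non-empty multiple of the block size")
    return prod(dims) // block_size * type_size


# A BF16 tensor with dims [3840, 262144] occupies 2 bytes per element.
print(tensor_nbytes(30, [3840, 262144]))  # 2013265920
```

With an entry like this in place, a type-30 tensor can be sized like any other type instead of being rejected at lookup.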

Root cause

GgufScanner parses tensor metadata correctly, but _GGML_TYPE_INFO did not include GGML type 30. During the second tensor validation pass, _validate_tensor_info() treated every BF16 tensor as an unknown GGML type, marked the scan inconclusive with gguf_structure_validation_failed, and returned before normal tensor size/bounds validation could run.

Security tradeoff

This only recognizes the known BF16 type. Unknown tensor types still fail closed as bounded inconclusive results, malformed tensor metadata still returns an explicit parse-incomplete inconclusive result, and malformed BF16 tensor data still reaches the existing tensor-data bounds checks.
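The tradeoff can be pictured as a three-way classification per tensor. This is a minimal sketch under assumptions, not the behavior of `_validate_tensor_info()` itself: `TensorInfo`, `validate_tensor`, `KNOWN_TYPES`, and the returned labels are all hypothetical.

```python
from dataclasses import dataclass
from math import prod

# Hypothetical, abbreviated type table: type id -> (block_size, type_size).
KNOWN_TYPES = {0: (1, 4), 1: (1, 2), 30: (1, 2)}


@dataclass
class TensorInfo:
    name: str
    tensor_type: int
    dims: list[int]
    offset: int  # byte offset relative to the start of the tensor-data region


def validate_tensor(info: TensorInfo, data_region_size: int) -> str:
    """Classify one tensor as 'ok', 'unknown_type', or 'out_of_bounds'."""
    type_info = KNOWN_TYPES.get(info.tensor_type)
    if type_info is None:
        # Fail closed: without a size entry the tensor cannot be bounds-checked.
        return "unknown_type"
    block_size, type_size = type_info
    nbytes = prod(info.dims) // block_size * type_size
    if info.offset + nbytes > data_region_size:
        return "out_of_bounds"
    return "ok"


bf16 = TensorInfo("w", 30, [4, 4], offset=0)           # 16 elements, 32 bytes
print(validate_tensor(bf16, 32))                       # ok
print(validate_tensor(bf16, 16))                       # out_of_bounds
print(validate_tensor(TensorInfo("x", 999, [4], 0), 1024))  # unknown_type
```

The point of the ordering is that recognizing a type only moves it from the first branch into the bounds check; it does not skip any check.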

Real-model QA

Pinned repo:

uv run python - <<'PY'
from huggingface_hub import HfApi
revision = "3249fa54d5efa384afc552cc6700ad091efd5c39"
info = HfApi().model_info("unsloth/gemma-4-12b-it-GGUF", revision=revision, files_metadata=True)
print(f"resolved_sha={info.sha}")
for sibling in info.siblings:
    name = sibling.rfilename
    if name.endswith(".gguf") and ("BF16" in name or name == "gemma-4-12b-it-BF16.gguf"):
        lfs = getattr(sibling, "lfs", None)
        sha = getattr(lfs, "sha256", None) if lfs is not None else None
        print(f"file={name} size={sibling.size} lfs_sha256={sha}")
PY

Outcome:

resolved_sha=3249fa54d5efa384afc552cc6700ad091efd5c39
file=MTP/gemma-4-12b-it-BF16-MTP.gguf size=861520128 lfs_sha256=20b0e5caf9152e816a56f92c702528bffc7a7c930f20c33cf6616ac216998037
file=gemma-4-12b-it-BF16.gguf size=23832065184 lfs_sha256=5021bdd970b8557f6c4cb927bf676b1fd14554bca3c14961ea7814e4731e9662
file=mmproj-BF16.gguf size=175115840 lfs_sha256=2e269f906eb15169ee9ce880ea649bd6d42d4964c21f8ede10d0d0efc738bcbb

Bounded sparse prefix fetches, with range checks before body reads:

uv run python - <<'PY'
from pathlib import Path
from urllib.request import Request, urlopen

revision = "3249fa54d5efa384afc552cc6700ad091efd5c39"
work = Path("/tmp/modelaudit-gguf-bf16-type30")
work.mkdir(parents=True, exist_ok=True)
files = [
    ("gemma-4-12b-it-BF16.gguf", 15_822_496, 23_832_065_184, work / "gemma-4-12b-it-BF16.gguf"),
    ("mmproj-BF16.gguf", 2_624, 175_115_840, work / "mmproj-BF16.gguf"),
]
for filename, prefix_size, logical_size, output in files:
    url = f"https://huggingface.co/unsloth/gemma-4-12b-it-GGUF/resolve/{revision}/{filename}"
    req = Request(url, headers={"Range": f"bytes=0-{prefix_size - 1}", "User-Agent": "modelaudit-codex-bounded-repro"})
    with urlopen(req, timeout=120) as resp:
        status = getattr(resp, "status", None)
        content_range = resp.headers.get("Content-Range")
        expected_range = f"bytes 0-{prefix_size - 1}/{logical_size}"
        if status != 206 or content_range != expected_range:
            raise RuntimeError(f"unexpected range response for {filename}: status={status} content_range={content_range!r}")
        data = resp.read(prefix_size + 1)
    if len(data) != prefix_size:
        raise RuntimeError(f"short range body for {filename}: got {len(data)} expected {prefix_size}")
    output.write_bytes(data)
    with output.open("r+b") as handle:
        handle.truncate(logical_size)
    print(f"{filename}: fetched={prefix_size} logical_size={output.stat().st_size} content_range={content_range}")
PY

Outcome:

gemma-4-12b-it-BF16.gguf: fetched=15822496 logical_size=23832065184 content_range=bytes 0-15822495/23832065184
mmproj-BF16.gguf: fetched=2624 logical_size=175115840 content_range=bytes 0-2623/175115840

Pre-fix exact-main proof from detached worktree at ab6e58ebe587197b2b46866c69f7d41d75e08a55:

git worktree add --detach /tmp/modelaudit-t11-baseline ab6e58ebe587197b2b46866c69f7d41d75e08a55
cd /tmp/modelaudit-t11-baseline
uv run python - <<'PY'
from modelaudit.scanners.gguf_scanner import GgufScanner
path = "/tmp/modelaudit-gguf-bf16-type30/gemma-4-12b-it-BF16.gguf"
scanner = GgufScanner()
scanner.calculate_file_hashes = lambda _path: {"md5": "skipped", "sha256": "skipped", "sha512": "skipped"}
result = scanner.scan(path)
type_checks = [check for check in result.checks if check.name == "Tensor Type Validation"]
print(f"success={result.success}")
print(f"scan_outcome={result.metadata.get('scan_outcome')}")
print(f"scan_outcome_reasons={result.metadata.get('scan_outcome_reasons')}")
print(f"tensor_type_validation_count={len(type_checks)}")
if type_checks:
    first = type_checks[0]
    print(f"first_message={first.message}")
    print(f"first_details={first.details}")
PY

Outcome:

success=False
scan_outcome=inconclusive
scan_outcome_reasons=['gguf_structure_validation_failed']
tensor_type_validation_count=329
first_message=Tensor token_embd.weight uses unknown GGML type 30
first_details={'tensor_name': 'token_embd.weight', 'tensor_type': 30}

Post-fix pinned main BF16 sparse prefix:

uv run python - <<'PY'
from modelaudit.scanners.gguf_scanner import GgufScanner
path = "/tmp/modelaudit-gguf-bf16-type30/gemma-4-12b-it-BF16.gguf"
scanner = GgufScanner()
scanner.calculate_file_hashes = lambda _path: {"md5": "skipped", "sha256": "skipped", "sha512": "skipped"}
result = scanner.scan(path)
type_checks = [check for check in result.checks if check.name == "Tensor Type Validation"]
unknown_issues = [issue for issue in result.issues if "unknown ggml type" in issue.message.lower()]
print(f"success={result.success}")
print(f"scan_outcome={result.metadata.get('scan_outcome')}")
print(f"scan_outcome_reasons={result.metadata.get('scan_outcome_reasons')}")
print(f"tensor_type_validation_count={len(type_checks)}")
print(f"unknown_ggml_issue_count={len(unknown_issues)}")
print(f"reported_tensors_first_two={result.metadata.get('tensors', [])[:2]}")
PY

Outcome:

success=True
scan_outcome=None
scan_outcome_reasons=None
tensor_type_validation_count=0
unknown_ggml_issue_count=0
reported_tensors_first_two=[{'name': 'rope_freqs.weight', 'type': 0, 'dims': [256]}, {'name': 'token_embd.weight', 'type': 30, 'dims': [3840, 262144]}]

Post-fix aggregate path on pinned mmproj-BF16.gguf sparse prefix:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run python - <<'PY'
from modelaudit.core import determine_exit_code, scan_model_directory_or_file
path = "/tmp/modelaudit-gguf-bf16-type30/mmproj-BF16.gguf"
result = scan_model_directory_or_file(path, cache_enabled=False)
unknown_issues = [issue.message for issue in result.issues if "unknown ggml type" in issue.message.lower()]
metadata = next(iter(result.file_metadata.values()))
print(f"aggregate_success={result.success}")
print(f"exit_code={determine_exit_code(result)}")
print(f"scan_outcome={metadata.get('scan_outcome')}")
print(f"unknown_ggml_issue_count={len(unknown_issues)}")
print(f"format={metadata.get('format')} n_tensors={metadata.get('n_tensors')} first_tensor={metadata.get('tensors', [None])[0]}")
PY

Outcome:

aggregate_success=True
exit_code=0
scan_outcome=None
unknown_ggml_issue_count=0
format=gguf n_tensors=11 first_tensor={'name': 'mm.a.input_projection.weight', 'type': 30, 'dims': [640, 3840]}

Validation

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_gguf_scanner.py -k "bf16 or unknown_tensor_type or truncated_tensor_dimensions or tensor_information_byte_limit" -q
6 passed, 55 deselected

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_gguf_scanner.py tests/test_gguf_sbom_integration.py tests/test_huggingface_extensions.py tests/utils/helpers/test_asset_from_scan_result.py -q
74 passed, 16 skipped

uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
419 files already formatted

uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
All checks passed!

uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/
Success: no issues found in 474 source files

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1
16891 passed, 1292 skipped, 30 warnings

Notes

Large model artifacts were not committed. The real-model files used for QA were sparse temporary files under /tmp/modelaudit-gguf-bf16-type30, created from bounded HTTP range reads and logical truncation only.
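For readers unfamiliar with the sparse-file trick, the following sketch reproduces it locally without any network access. `write_sparse_prefix` is a hypothetical helper, and whether the zero-filled tail is actually stored as a hole depends on the filesystem.

```python
import os
import tempfile
from pathlib import Path


def write_sparse_prefix(path: Path, prefix: bytes, logical_size: int) -> None:
    """Write `prefix`, then extend the file to `logical_size` without writing a body."""
    if logical_size < len(prefix):
        raise ValueError("logical size is smaller than the prefix")
    path.write_bytes(prefix)
    # Extending via truncate leaves a zero-filled tail; many filesystems keep it as a hole.
    os.truncate(path, logical_size)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "prefix.gguf"
    write_sparse_prefix(target, b"GGUF" + b"\x00" * 12, logical_size=1 << 20)
    observed_size = target.stat().st_size
    content = target.read_bytes()
    observed_head = content[:4]
    tail_is_zero = content[16:] == b"\x00" * ((1 << 20) - 16)

print(f"logical_size={observed_size} head={observed_head!r} tail_is_zero={tail_is_zero}")
```

A scanner that trusts the header and tensor table sees the full logical size, which is why a small fetched prefix is enough to exercise metadata and bounds validation.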

Add GGUF tensor type 30 to the known GGML type table and cover BF16, unsupported tensor type, and malformed tensor metadata behavior.

Contributor Author

@codex review

@github-actions

github-actions Bot commented Jun 11, 2026

Contributor

Workflow run and artifacts

Performance Benchmarks

Compared 12 shared benchmarks with a regression threshold of 15%.
Status: 0 regressions, 0 improved, 12 stable, 0 new, 0 missing.
Aggregate shared-benchmark median: 1.445s -> 1.455s (+0.7%).

| Workload | Benchmark | Target | Size | Files | Baseline | Current | Change | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_hex] | nested_hex | 130 B | 1 | 561.8us | 535.6us | -4.7% | stable |
| warm-cache-rescan | tests/benchmarks/test_scan_benchmarks.py::test_scan_warm_cached_repository_rescan | release-candidate | 547.3 KiB | 32 | 98.49ms | 101.65ms | +3.2% | stable |
| padded-multi-stream-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_padded_multi_stream_upload | multi_stream_padded | 4.1 KiB | 1 | 563.7us | 578.5us | +2.6% | stable |
| direct-malicious-upload | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_direct_malicious_upload | malicious_reduce | 52 B | 1 | 457.1us | 447.9us | -2.0% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_base64] | nested_base64 | 98 B | 1 | 508.5us | 518.0us | +1.9% | stable |
| nested-payload-review | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_nested_payload_review[nested_raw] | nested_raw | 78 B | 1 | 516.1us | 507.6us | -1.7% | stable |
| duplicate-heavy-registry | tests/benchmarks/test_scan_benchmarks.py::test_scan_duplicate_registry_snapshot | registry-snapshot | 915.2 KiB | 13 | 411.11ms | 416.21ms | +1.2% | stable |
| suspicious-pickle-intake | tests/benchmarks/test_scan_benchmarks.py::test_scan_suspicious_pickle_intake | suspicious-intake | 183.8 KiB | 4 | 144.50ms | 145.60ms | +0.8% | stable |
| single-checkpoint-preflight | tests/benchmarks/test_scan_benchmarks.py::test_scan_single_checkpoint_before_load | single_checkpoint.pkl | 183.0 KiB | 1 | 75.11ms | 75.42ms | +0.4% | stable |
| mixed-model-repository | tests/benchmarks/test_scan_benchmarks.py::test_scan_release_candidate_repository | release-candidate | 547.3 KiB | 32 | 483.80ms | 484.71ms | +0.2% | stable |
| clean-training-checkpoint | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_clean_training_checkpoint | safe_large | 278.2 KiB | 1 | 112.95ms | 112.83ms | -0.1% | stable |
| chunked-upload-stream | tests/benchmarks/test_picklescan_benchmarks.py::test_picklescan_chunked_upload_stream | chunked_stream | 278.2 KiB | 1 | 116.08ms | 116.19ms | +0.1% | stable |
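The Change and Status columns can be reproduced with simple arithmetic. The sketch below is an assumption about how such a report might classify results, not the benchmark workflow's code: `percent_change` and `classify` are hypothetical, and treating a drop beyond the same 15% threshold as "improved" is a guess.

```python
def percent_change(baseline: float, current: float) -> float:
    """Relative change from baseline to current, in percent."""
    return (current - baseline) / baseline * 100.0


def classify(baseline: float, current: float, threshold_pct: float = 15.0) -> str:
    """Label a benchmark against a symmetric threshold (symmetry is assumed)."""
    change = percent_change(baseline, current)
    if change > threshold_pct:
        return "regression"
    if change < -threshold_pct:
        return "improved"
    return "stable"


# Aggregate median from the report: 1.445s -> 1.455s.
print(f"{percent_change(1.445, 1.455):+.1f}%")  # +0.7%
# First table row: 561.8us -> 535.6us.
print(f"{percent_change(561.8, 535.6):+.1f}%", classify(561.8, 535.6))  # -4.7% stable
```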

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Contributor Author

@codex review

@mldangelo-oai mldangelo-oai requested a review from mldangelo June 11, 2026 01:06
@mldangelo-oai mldangelo-oai enabled auto-merge (squash) June 11, 2026 01:06
@mldangelo-oai mldangelo-oai disabled auto-merge June 11, 2026 01:07
@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Bravo.


@mldangelo-oai mldangelo-oai enabled auto-merge (squash) June 11, 2026 01:39
@mldangelo-oai

Contributor Author

New real-model regression input from origin/main 8d6c4864: google/gemma-4-26B-A4B-it-qat-q4_0-gguf produced 191 S902 notices for valid GGML type 30 tensors, plus only passive README S309s, then returned success=false/exit 2. Please run this exact model against the PR head, confirm the type-30 flood and failure disappear, and retain malformed/unsupported-type controls. @codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. You're on a roll.


@mldangelo-oai mldangelo-oai disabled auto-merge June 11, 2026 02:26

Contributor Author

Real-model QA for the new Gemma GGUF case completed on current PR head.

PR head / fetch state:

  • Branch: mdangelo/codex/hf-fp-t11-gguf-bf16-type30-20260610
  • HEAD: 4f31472e480e129fa23bab5b09d902c713987702
  • origin/main: 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69
  • The live PR branch was fetched first; no new commit was made and the worktree stayed clean.

Pinned model revision:

  • Requested: google/gemma-4-26B-A4B-it-qat-q4_0-gguf@dfc00409adc70be497fee9c90bfe76b3ee130f2e
  • HfApi().repo_info(repo, revision=None).sha and HfApi().repo_info(repo, revision=target).sha both resolved to dfc00409adc70be497fee9c90bfe76b3ee130f2e, so the full hf://... streaming run scanned that exact commit.
  • Files at that revision: .gitattributes, README.md, gemma-4-26B-it-mmproj.gguf, gemma-4-26B_q4_0-it.gguf.

Exact full streaming command:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan --stream --no-cache --format json --output /tmp/pr1632-gemma4-type30-stream.json --timeout 7200 --max-size 20GB hf://google/gemma-4-26B-A4B-it-qat-q4_0-gguf

Terminal outcome:

  • Process exit: 0
  • Terminal ended with: ✅ Streaming scan complete and Results written to /tmp/pr1632-gemma4-type30-stream.json
  • JSON result: success=true, has_errors=false

Scanner IDs / assets:

  • text: 1 asset (README.md)
  • gguf: 2 assets (gemma-4-26B-it-mmproj.gguf, gemma-4-26B_q4_0-it.gguf)
  • jinja2_template: 1 embedded scan from gemma-4-26B_q4_0-it.gguf
  • GGUF tensor counts reported: 356 for gemma-4-26B-it-mmproj.gguf, 658 for gemma-4-26B_q4_0-it.gguf

Issue/check counts:

  • Issues: 18 total, all info; no warnings, no criticals, no errors
  • S309: 17 README URL/domain findings
  • S902: 1 metadata-value info check for tokenizer.chat_template in gemma-4-26B_q4_0-it.gguf
  • Type-30 flood check: 0 Tensor Type Validation checks and 0 matches for unknown GGML type / type 30

Fail-closed controls rechecked:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_gguf_scanner.py -k "unknown_tensor_type or bf16 or truncated_tensor_dimensions or tensor_bounds_detects_uint64_wrap or tensor_information_byte_limit or default_tensor_count_limit_blocks_compact_resource_exhaustion or tensor_metadata_summary_is_capped_without_skipping_validation"

Result: 9 passed, 52 deselected.

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_gguf_scanner.py

Result: 60 passed, 1 skipped (jinja2.sandbox unavailable in this local env).

Conclusion: the origin/main type-30 S902 flood and operational failure are gone on the PR head for the exact Gemma revision. Unsupported tensor types, malformed tensor metadata, impossible offsets, truncation, and tensor/metadata resource controls remain bounded and fail closed in the focused and full GGUF regression suite.

@mldangelo-oai mldangelo-oai enabled auto-merge (squash) June 11, 2026 03:04
@mldangelo-oai

Contributor Author

New high-volume exact-main reproduction: leejet/ideogram-4-GGUF@c93c0ac616d3abc7910c9af0bf117244ce3a11c4 contains ideogram4-Q4_0.gguf and ideogram4_uncond-Q4_0.gguf; ModelAudit emits 508 S902 findings (254 per file) for valid GGML type 30 and marks both files inconclusive. Validate the current PR head removes the entire flood without weakening unknown-type/truncated/layout controls, and post exact counts before merge. example.png noise belongs to #1628 separately.

@mldangelo-oai mldangelo-oai disabled auto-merge June 11, 2026 03:15
@mldangelo-oai

Contributor Author

Exact-head no-change QA for the new Ideogram GGUF type-30 reproduction is complete.

PR head:

  • Branch: mdangelo/codex/hf-fp-t11-gguf-bf16-type30-20260610
  • HEAD: 4f31472e480e129fa23bab5b09d902c713987702
  • Worktree stayed clean; no code changes, no commit, no push.

Pinned reproduction scanned:

  • Repo: leejet/ideogram-4-GGUF
  • Revision: c93c0ac616d3abc7910c9af0bf117244ce3a11c4
  • Files: ideogram4-Q4_0.gguf, ideogram4_uncond-Q4_0.gguf
  • Streaming harness used scan_model_streaming(..., delete_after_scan=True, cache_enabled=False, use_hf_whitelist=False, max_file_size=100GiB, max_total_size=200GiB) with a pinned hf_hub_download() generator for exactly those two files. This was needed because repo shorthand does not preserve @commit in this CLI path.
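Because the `repo@commit` shorthand used throughout this thread is not preserved on that CLI path, a harness has to split it before downloading. A minimal sketch, assuming a full 40-character commit SHA; `split_pinned_spec` and `PinnedRepo` are hypothetical names and not part of ModelAudit or `huggingface_hub`.

```python
import re
from typing import NamedTuple, Optional

_COMMIT_RE = re.compile(r"[0-9a-f]{40}")


class PinnedRepo(NamedTuple):
    repo_id: str
    revision: Optional[str]


def split_pinned_spec(spec: str) -> PinnedRepo:
    """Split 'owner/name@<40-hex commit>' into repo id and revision."""
    repo_id, sep, revision = spec.partition("@")
    if not sep:
        return PinnedRepo(repo_id, None)  # unpinned: caller decides what to do
    if not _COMMIT_RE.fullmatch(revision):
        raise ValueError(f"not a full commit SHA: {revision!r}")
    return PinnedRepo(repo_id, revision)


pinned = split_pinned_spec("leejet/ideogram-4-GGUF@c93c0ac616d3abc7910c9af0bf117244ce3a11c4")
print(pinned.repo_id, pinned.revision)
```

The resulting pair can then be passed as the `repo_id` and `revision` arguments of `hf_hub_download()`, so that each file is fetched from exactly the pinned commit.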

Pinned scan result:

  • Process/result exit: 0
  • success=true, files_scanned=2
  • Total bytes downloaded/scanned: 11,287,641,664 (5,643,820,832 per file)
  • Total issues: 0
  • Total checks: 8; failed checks: 0
  • S902 issues/checks: 0 / 0
  • Tensor Type Validation checks: 0
  • unknown GGML type / tensor-type issue matches: 0
  • Inconclusive files: 0

Per-file tensor evidence:

  • ideogram4-Q4_0.gguf: 458 tensors reported; 204 type 2 Q4_0, 254 type 30 BF16; 0 issues, 0 failed checks, not inconclusive.
  • ideogram4_uncond-Q4_0.gguf: 458 tensors reported; 204 type 2 Q4_0, 254 type 30 BF16; 0 issues, 0 failed checks, not inconclusive.
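Per-type counts like these can be derived from the tensor entries the scanner reports in its metadata, which in the outputs earlier in this PR are dicts with `name`, `type`, and `dims` keys. The helper below is a hypothetical sketch; the name table is abbreviated, and the reported list may be a capped summary, so counts taken from it are only as complete as that list.

```python
from collections import Counter

# Abbreviated, illustrative names for a few GGML type ids.
TYPE_NAMES = {0: "F32", 1: "F16", 2: "Q4_0", 30: "BF16"}


def tensor_type_histogram(tensors: list[dict]) -> Counter:
    """Count reported tensors per GGML type id."""
    return Counter(entry["type"] for entry in tensors)


sample = [
    {"name": "rope_freqs.weight", "type": 0, "dims": [256]},
    {"name": "token_embd.weight", "type": 30, "dims": [3840, 262144]},
    {"name": "blk.0.attn_q.weight", "type": 30, "dims": [3840, 4096]},  # made-up entry
]

for type_id, count in sorted(tensor_type_histogram(sample).items()):
    print(f"type {type_id} ({TYPE_NAMES.get(type_id, 'unknown')}): {count}")
```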

Resource/timing evidence:

  • Wall elapsed: 98.002s
  • Download elapsed: 14.632s for ideogram4-Q4_0.gguf; 14.757s for ideogram4_uncond-Q4_0.gguf
  • Peak RSS: 2,512,112 KB (ru_maxrss from the scan process)
  • CPU: 77.281s user, 55.874s system
  • Page faults: 2 major, 110,826 minor
  • Block I/O counters: 16 inputs, 22,046,952 outputs
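Figures of this kind can be read from the standard-library `resource` module at the end of a scan process. This is a sketch of one way to collect them on Unix, not the harness that produced the numbers above; `peak_rss_kib` and `usage_summary` are hypothetical names.

```python
import resource
import sys


def peak_rss_kib() -> int:
    """Peak resident set size of this process in KiB."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # Linux reports ru_maxrss in kilobytes; macOS reports bytes.
    if sys.platform == "darwin":
        return usage.ru_maxrss // 1024
    return usage.ru_maxrss


def usage_summary() -> dict:
    """Collect the counters quoted in the QA comment."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "peak_rss_kib": peak_rss_kib(),
        "cpu_user_s": usage.ru_utime,
        "cpu_system_s": usage.ru_stime,
        "major_faults": usage.ru_majflt,
        "minor_faults": usage.ru_minflt,
        "block_inputs": usage.ru_inblock,
        "block_outputs": usage.ru_oublock,
    }


print(usage_summary())
```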

Regression/control validation:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -q \
  tests/scanners/test_gguf_scanner.py::test_gguf_bf16_tensor_type_30_is_supported \
  tests/scanners/test_gguf_scanner.py::test_gguf_unknown_tensor_type_is_inconclusive \
  tests/scanners/test_gguf_scanner.py::test_gguf_bf16_truncated_tensor_data_reports_bounds_check \
  tests/scanners/test_gguf_scanner.py::test_gguf_truncated_tensor_dimensions_are_parse_inconclusive \
  tests/scanners/test_gguf_scanner.py::test_gguf_scanner_tensor_bounds_detects_uint64_wrap \
  tests/scanners/test_gguf_scanner.py::test_gguf_scanner_tensor_size_validation \
  tests/scanners/test_gguf_scanner.py::test_gguf_scanner_invalid_tensor_dimensions \
  tests/scanners/test_gguf_scanner.py::test_gguf_scanner_large_tensor_count \
  tests/scanners/test_gguf_scanner.py::test_gguf_tensor_information_byte_limit_is_inconclusive \
  tests/scanners/test_gguf_scanner.py::test_gguf_scanner_metadata_types \
  tests/scanners/test_gguf_scanner.py::test_gguf_truncated_metadata_returns_exit2

Result: 11 passed in 1.38s.

PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -q tests/scanners/test_gguf_scanner.py tests/test_gguf_sbom_integration.py

Result: 60 passed, 5 skipped in 2.38s (jinja2.sandbox unavailable locally; Python lane allowlist skips expected).

Additional checks:

  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> 419 files already formatted
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> All checks passed!
  • uv run mypy modelaudit/scanners/gguf_scanner.py tests/scanners/test_gguf_scanner.py tests/test_gguf_sbom_integration.py -> Success: no issues found in 3 source files
  • git diff --check -> clean

Conclusion: the exact 508 type-30 tensors (254 per file) are present in the pinned Ideogram files and no longer produce any S902/type-validation findings or inconclusive outcomes on the PR head. Unknown/reserved tensor type, truncated payload/tensor metadata, size/shape/layout, uint64 overflow, resource-limit, malformed metadata, and Q4_0 quantized-type controls remain covered by the focused/full GGUF tests. CI is still green; PR remains blocked only on review. Not merged.

@mldangelo-oai mldangelo-oai enabled auto-merge (squash) June 11, 2026 03:34
@mldangelo-oai

Contributor Author

New exact-main GGUF QA inputs: unsloth/gemma-4-E4B-it-qat-GGUF@bbcd9d849c2541ecc2af7ef64b3c3c2c7aa14e96 and ggml-org/gemma-4-12B-it-GGUF@44ee90c4b61e888ac5b318a54ec7a94df61e9cd7. Main emits hundreds of S902 records and marks the BF16 GGUFs inconclusive; rank 175 includes many valid type-30 tensors and rank 178 marks gemma-4-12B-it-bf16.gguf plus mmproj-gemma-4-12B-it-bf16.gguf as gguf_structure_validation_failed. Auto-merge is paused. Validate the PR head on the exact affected files, separate type-30 support from any real structural defect, and retain malformed/unknown-type controls.

Contributor Author

Additional pinned GGUF QA: HauhauCS/Qwen3.5-9B-Uncensored-HauhauCS-Aggressive@0a41c68809d375475f954be12ba7c40efa56c2a9 emitted roughly 363 informational S902 records for valid BF16/type-30 tensors, plus chat-template metadata noise, on main 8d6c4864. @codex scan the exact BF16 GGUF on this head, verify type-30 records disappear without suppressing truly unknown tensor types or malformed table bounds, and post counts before/after.

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Chef's kiss.


@mldangelo-oai

Contributor Author

Exact-head QA follow-up for 4f31472e480e129fa23bab5b09d902c713987702 against origin/main 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69: no code changes, no push.

Pinned GGUF evidence was downloaded under /tmp/modelaudit-pr1632-t11-qa-20260611T035154Z and SHA-256 matched HF LFS OIDs:

| Repo revision | File | Size (bytes) | SHA-256 | Download/hash time | Peak RSS |
| --- | --- | --- | --- | --- | --- |
| unsloth/gemma-4-E4B-it-qat-GGUF@bbcd9d849c2541ecc2af7ef64b3c3c2c7aa14e96 | MTP/gemma-4-E4B-it-BF16-MTP.gguf | 171,784,064 | e08c441a102c748eb4c516f5fe710908b0091ac604b1022ac5da2a7911a4073b | 3.448s | 203,960 KB |
| ggml-org/gemma-4-12B-it-GGUF@44ee90c4b61e888ac5b318a54ec7a94df61e9cd7 | mmproj-gemma-4-12B-it-bf16.gguf | 175,115,584 | 675ad6e68101ca9413ec806855c452362f0213f2dfc5800996b086fdb8119842 | 2.201s | 209,344 KB |
| ggml-org/gemma-4-12B-it-GGUF@44ee90c4b61e888ac5b318a54ec7a94df61e9cd7 | gemma-4-12B-it-bf16.gguf | 23,832,064,928 | 9338475993d2b4b86395720d49eb2a244b25eac0fa1f95f2caef381ea9503301 | 44.440s download / 36.444s hash | 2,654,468 KB |
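Matching a downloaded file against its published SHA-256 needs only a streamed hash, which keeps memory flat even for the 23.8 GB file. A minimal sketch with hypothetical helper names (`sha256_file`, `matches_expected`); the demo hashes a small temporary file rather than a real model.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files are never fully loaded."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with an expected lowercase or uppercase hex string."""
    return sha256_file(path) == expected_hex.strip().lower()


with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.bin"
    payload = b"GGUF" + bytes(range(256)) * 16
    sample_path.write_bytes(payload)
    demo_matches = matches_expected(sample_path, hashlib.sha256(payload).hexdigest())
    demo_mismatch = matches_expected(sample_path, "0" * 64)

print(demo_matches, demo_mismatch)  # True False
```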

Main vs head scan evidence:

| File | Tensor type-30 count | Main result | PR head result |
| --- | --- | --- | --- |
| MTP/gemma-4-E4B-it-BF16-MTP.gguf | 23 / 49 tensors | exit 2, gguf_structure_validation_failed, 24 S902 checks, 23 Tensor Type Validation failures | exit 0, no scan outcome reasons, 1 S902 metadata info check |
| mmproj-gemma-4-12B-it-bf16.gguf | 2 / 11 tensors | exit 2, gguf_structure_validation_failed, 4 S902 checks, 2 Tensor Type Validation failures | exit 0, no scan outcome reasons, 2 S902 metadata info checks |
| gemma-4-12B-it-bf16.gguf | 329 / 667 tensors | exit 2, gguf_structure_validation_failed, 332 S902 checks, 329 Tensor Type Validation failures; aggregate elapsed 221.818s, peak RSS 237,228 KB | exit 0, no scan outcome reasons, 3 S902 metadata info checks; aggregate elapsed 177.325s, peak RSS 235,312 KB |

Interpretation: the main failures are the BF16 type-30 false-positive flood, not a real structural/layout defect. On the PR head, the same pinned files retain tensor coverage and exit cleanly; remaining S902 records are metadata value info checks, not structural inconclusive reasons.

Local validation:

  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_gguf_scanner.py -q -k "bf16 or unknown_tensor_type or truncated_tensor_dimensions or tensor_bounds" -> 7 passed
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_gguf_scanner.py -q -> 60 passed, 1 skipped (jinja2.sandbox unavailable)
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/test_gguf_sbom_integration.py -q -> 4 skipped on Python 3.12 allowlist
  • uv run ruff check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> pass
  • uv run ruff format --check modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> 419 files already formatted
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> success, 474 source files
  • git diff --check -> clean
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 17,369 passed, 1,292 skipped, 39 warnings in 17:15

CI/review poll at exact head: all required checks are passing or intentionally skipped; reviewDecision=REVIEW_REQUIRED, mergeStateStatus=BLOCKED only pending approval. No fresh Codex review was requested because the PR head did not change.

@mldangelo-oai

Contributor Author

@codex add this pinned GGUF BF16 case: yuxinlu1/gemma-4-12B-it-Claude-4.6-4.8-Opus-GGUF@19e33723755977da0b9b5482a88772aad4dfce03 reports valid tensor type 30 as unknown and marks MTP/gemma-4-12B-it-MTP-BF16.gguf inconclusive with gguf_structure_validation_failed. Retest this exact file with the rank-187 control.

@mldangelo-oai

Contributor Author

Exact pinned GGUF BF16 QA completed on current PR head 4f31472e480e129fa23bab5b09d902c713987702 against origin/main 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69. No code changes were needed; the worktree stayed clean and a no-op push returned Everything up-to-date.

Pinned files downloaded under /tmp/modelaudit-pr1632-pinned-qa-20260611T000000Z; SHA-256 matched Hugging Face LFS OIDs:

| Repo revision | File | Size (bytes) | SHA-256 |
| --- | --- | --- | --- |
| yuxinlu1/gemma-4-12B-it-Claude-4.6-4.8-Opus-GGUF@19e33723755977da0b9b5482a88772aad4dfce03 | MTP/gemma-4-12B-it-MTP-BF16.gguf | 861,520,128 | 20b0e5caf9152e816a56f92c702528bffc7a7c930f20c33cf6616ac216998037 |
| google/gemma-4-E4B-it-qat-q4_0-gguf@bb3b92e6f031fa438b409f898dd9f14f499a0cb0 | gemma-4-E4B-it-mmproj.gguf | 991,551,904 | c6398448d84a4836fdedf58f9775979e69ae0cc4dfdf4d697b5597693a555b12 |

Exact bounded scan command shape:

PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan --no-cache --scanners gguf --format json --output <out.json> --timeout 7200 --max-size 5GB <two downloaded .gguf files>

Before/after results:

| File | Tensor type-30 count | origin/main result | PR head result |
| --- | --- | --- | --- |
| MTP/gemma-4-12B-it-MTP-BF16.gguf | 23 / 49 tensors | exit 2, success=false, gguf_structure_validation_failed, 24 S902 issues, 23 unknown GGML type 30 | exit 0, success=true, no scan outcome reasons, 1 S902 metadata info check for general.license.link, 0 tensor-type validations |
| gemma-4-E4B-it-mmproj.gguf | 247 / 1,411 tensors | exit 2, success=false, gguf_structure_validation_failed, 247 S902 issues, 247 unknown GGML type 30 | exit 0, success=true, no scan outcome reasons, 0 issues/check failures, 0 tensor-type validations |

Fail-closed controls rechecked on this head:

  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_gguf_scanner.py -q -k "bf16 or unknown_tensor_type or truncated_tensor_dimensions or tensor_bounds_detects_uint64_wrap or tensor_information_byte_limit or default_tensor_count_limit_blocks_compact_resource_exhaustion or tensor_metadata_summary_is_capped_without_skipping_validation or truncated_metadata" -> 12 passed, 49 deselected
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest tests/scanners/test_gguf_scanner.py -q -> 60 passed, 1 skipped (jinja2.sandbox unavailable locally)
  • uv run ruff check modelaudit/scanners/gguf_scanner.py tests/scanners/test_gguf_scanner.py -> pass
  • uv run ruff format --check modelaudit/scanners/gguf_scanner.py tests/scanners/test_gguf_scanner.py -> pass
  • uv run mypy modelaudit/scanners/gguf_scanner.py tests/scanners/test_gguf_scanner.py tests/test_gguf_sbom_integration.py -> pass
  • git diff --check -> clean

Conclusion: the exact pinned BF16/type-30 files no longer produce unknown-type floods or inconclusive GGUF structure outcomes on the PR head. Unknown/future tensor types, malformed/truncated metadata and tensor tables, tensor dimension parsing, tensor bounds/uint64 overflow, and tensor resource limits remain covered and fail closed.

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Keep it up!


@chatgpt-codex-connector


Summary

  • Added _write_gguf_with_tensor_types, a compact GGUF fixture helper that builds files with an explicit sequence of tensor types and deterministic offsets/data sizing. tests/scanners/test_gguf_scanner.pyL126-L154
  • Added the pinned MTP/gemma-4-12B-it-MTP-BF16.gguf rank-187 BF16/type-30 regression control: 187 tensors of type 30, no Tensor Type Validation failures, no unknown ggml type issues, and aggregate exit code 0. tests/scanners/test_gguf_scanner.pyL1092-L1110
  • Added the paired rank-187-plus-unknown control to ensure type 30 support does not weaken fail-closed handling for truly unknown tensor types; the appended type 999 still yields gguf_structure_validation_failed and aggregate exit code 2. tests/scanners/test_gguf_scanner.pyL1113-L1136
  • Committed changes on the current branch: c0b2c1a test: add pinned GGUF BF16 rank control.
  • Created the requested PR via make_pr.
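To make the fixture idea concrete, here is a rough, independent sketch of a compact GGUF writer that takes an explicit sequence of tensor types. It is not the `_write_gguf_with_tensor_types` helper from the test suite: `build_gguf`, `read_tensor_types`, and `TYPE_INFO` are hypothetical, the file carries no metadata entries, and unknown type ids are sized with a placeholder of one byte per element.

```python
import struct

GGUF_ALIGNMENT = 32  # default alignment when general.alignment is absent
TYPE_INFO = {0: (1, 4), 1: (1, 2), 30: (1, 2)}  # type id -> (block_size, type_size)


def _gguf_string(text: str) -> bytes:
    raw = text.encode("utf-8")
    return struct.pack("<Q", len(raw)) + raw


def _align(value: int) -> int:
    return (value + GGUF_ALIGNMENT - 1) // GGUF_ALIGNMENT * GGUF_ALIGNMENT


def build_gguf(tensor_types: list[int], n_elements: int = 32) -> bytes:
    """Build a GGUF v3 image with no metadata and one 1-D tensor per type."""
    header = b"GGUF" + struct.pack("<IQQ", 3, len(tensor_types), 0)
    infos = b""
    offset = 0
    for index, tensor_type in enumerate(tensor_types):
        # Placeholder sizing lets the fixture carry an intentionally unknown type id.
        block_size, type_size = TYPE_INFO.get(tensor_type, (1, 1))
        infos += _gguf_string(f"t{index}")
        infos += struct.pack("<I", 1)            # n_dims
        infos += struct.pack("<Q", n_elements)   # dims[0]
        infos += struct.pack("<I", tensor_type)  # GGML type id
        infos += struct.pack("<Q", offset)       # offset within the data region
        offset = _align(offset + n_elements // block_size * type_size)
    prefix = header + infos
    padding = b"\x00" * (_align(len(prefix)) - len(prefix))
    return prefix + padding + b"\x00" * offset


def read_tensor_types(blob: bytes) -> list[int]:
    """Parse back only the tensor type ids, assuming zero metadata entries."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", blob, 0)
    if magic != b"GGUF" or version != 3 or n_kv != 0:
        raise ValueError("unsupported fixture layout")
    pos = struct.calcsize("<4sIQQ")
    types = []
    for _ in range(n_tensors):
        (name_len,) = struct.unpack_from("<Q", blob, pos)
        pos += 8 + name_len
        (n_dims,) = struct.unpack_from("<I", blob, pos)
        pos += 4 + 8 * n_dims
        (tensor_type,) = struct.unpack_from("<I", blob, pos)
        pos += 4 + 8  # skip the type id and the uint64 offset
        types.append(tensor_type)
    return types


fixture = build_gguf([30] * 187 + [999])
print(len(read_tensor_types(fixture)), read_tensor_types(fixture)[-1])  # 188 999
```

A paired control like the one described above would scan `build_gguf([30] * 187)` expecting a clean result and `build_gguf([30] * 187 + [999])` expecting the unknown-type outcome.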

Testing

  • ⚠️ uv run python - <<'PY' ... HfApi().model_info("yuxinlu1/gemma-4-12B-it-Claude-4.6-4.8-Opus-GGUF", revision="19e33723755977da0b9b5482a88772aad4dfce03", files_metadata=True) ... PY — blocked by environment network proxy with httpx.ProxyError: 403 Forbidden, so I could not re-download the live pinned HF artifact in this environment.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -q tests/scanners/test_gguf_scanner.py -k "pinned_gemma_mtp or bf16 or unknown_tensor_type or truncated_tensor_dimensions" — 7 passed, 56 deselected.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -q tests/scanners/test_gguf_scanner.py::test_gguf_pinned_gemma_mtp_bf16_rank187_control_is_supported tests/scanners/test_gguf_scanner.py::test_gguf_pinned_gemma_mtp_rank187_control_retains_unknown_type_failure tests/scanners/test_gguf_scanner.py::test_gguf_unknown_tensor_type_is_inconclusive tests/scanners/test_gguf_scanner.py::test_gguf_bf16_truncated_tensor_data_reports_bounds_check tests/scanners/test_gguf_scanner.py::test_gguf_truncated_tensor_dimensions_are_parse_inconclusive tests/scanners/test_gguf_scanner.py::test_gguf_scanner_tensor_bounds_detects_uint64_wrap tests/scanners/test_gguf_scanner.py::test_gguf_tensor_information_byte_limit_is_inconclusive — 7 passed.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -q tests/scanners/test_gguf_scanner.py — 62 passed, 1 skipped (jinja2.sandbox unavailable locally).
  • uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ — 419 files left unchanged.
  • uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ — All checks passed.
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ — Success, 474 source files.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 — failed in unrelated CLI symlink-output tests after 12,057 passed / 848 skipped; no GGUF failures observed before stop.

@mldangelo-oai

Contributor Author

Fresh pinned QA on exact origin/main 8d6c4864fe2ea833ceaef1b9803d225afb1e8d69 reproduced the BF16 type-30 failure on another current model:

  • Model: LiquidAI/LFM2.5-8B-A1B-GGUF
  • HF revision: dfd5fdcad7a1c0d31473fb4ca443b8befbacddf0
  • Selection rank: 244
  • Declared repository size: 65,895,100,644 bytes across 10 artifacts
  • Scanned file: LFM2.5-8B-A1B-BF16.gguf
  • Main result: 164 issues, dominated by S902 Tensor ... uses unknown GGML type 30 across BF16 tensors

Please add this pinned model as a second real-world QA case at the current PR head and prove that type 30 reaches normal tensor-size/bounds validation with zero unknown-type findings. Keep an unknown numeric type control fail-closed.

@mldangelo-oai

mldangelo-oai commented Jun 11, 2026

Contributor Author

Pinned real-model QA from Hugging Face rank 250 on exact main 8d6c4864:

  • HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive@45b6a334b4bcd1d7f37179df58b3b1d66a184e5d
  • 11 valid GGUF variants were inspected.
  • 289 informational findings report unknown GGML type 21.
  • 336 report unknown GGML type 23.
  • 11 report unknown GGML type 30.

These 636 findings (289 + 336 + 11) are compatibility false positives across the pinned repository, not evidence of malicious content. Please include this exact model/revision and all three types in end-to-end QA before merge. Full audit: /Users/mdangelo/modelaudit-hf-scan-swarm/outputs/hf-audit-r250-r252-20260611.md.

@mldangelo-oai

Contributor Author

Additional exact-main GGUF QA from Hugging Face rank 253:

  • Model: google/gemma-4-E2B-it-qat-q4_0-gguf@1894d1fc0a19d86697abd40483f5983c867df03f
  • Valid file: gemma-4-E2B-it-mmproj.gguf (986,833,312 bytes)
  • Main 8d6c4864 emits 247 informational unknown GGML type 30 findings and exits 2.

Please add this exact pinned model/file to the current-head QA alongside rank 250's type-21/23/30 matrix. Full audit: /Users/mdangelo/modelaudit-hf-scan-swarm/outputs/hf-audit-r253-r254-20260611.md.

@mldangelo-oai

Contributor Author

Pinned rank-250 GGUF follow-up completed on additive commit f08b2c4868b7a02bc75b014c0fe03c1eff990c51.

Upstream format check:

  • Current llama.cpp/ggml enum maps 21 -> GGML_TYPE_IQ3_S, 23 -> GGML_TYPE_IQ4_XS, and 30 -> GGML_TYPE_BF16.
  • Current block/type sizes used for bounds validation: IQ3_S=(256, 110), IQ4_XS=(256, 136), BF16=(1, 2).
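
As a rough illustration of how those (block_size, type_size) pairs turn into an expected byte count, here is a small sketch. The table name `TYPE_INFO_SKETCH` and the function `expected_tensor_bytes` are hypothetical and are not the scanner's `_GGML_TYPE_INFO` or its validation code. Only the three pairs quoted above are included, and the check that the first dimension is a multiple of the block size is an assumption about how block-quantized rows are laid out.

```python
import math

# Hypothetical table for illustration: ggml_type -> (name, block_size, type_size).
TYPE_INFO_SKETCH = {
    21: ("IQ3_S", 256, 110),
    23: ("IQ4_XS", 256, 136),
    30: ("BF16", 1, 2),
}


def expected_tensor_bytes(ggml_type, dims):
    """Return the payload size implied by a tensor's type and dimensions."""
    if ggml_type not in TYPE_INFO_SKETCH:
        # Unknown types have no defined size, so the caller cannot bounds-check.
        raise KeyError(f"unknown GGML type {ggml_type}")
    _name, block_size, type_size = TYPE_INFO_SKETCH[ggml_type]
    if not dims or dims[0] % block_size != 0:
        raise ValueError("first dimension is not a whole number of blocks")
    # elements / block_size blocks, each occupying type_size bytes
    return math.prod(dims) // block_size * type_size
```

For dims [2560, 512] this gives 5,120 blocks, so 563,200 bytes for IQ3_S and 696,320 bytes for IQ4_XS; for BF16 with dims [2560, 10752] it gives 55,050,240 bytes.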

Pinned header QA:

  • Repo: HauhauCS/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive
  • Requested/resolved revision: 45b6a334b4bcd1d7f37179df58b3b1d66a184e5d
  • Parsed all 11 model GGUF variants with 16,777,216-byte range reads; all had metadata_end=15776598, tensor_info_start=15776598, tensor_info_end=15819406, tensor_data_start=15819424.
  • Header tensor totals exactly matched the reproduced false-positive flood: type 21: 289, type 23: 336, type 30: 11.
| File | Size (bytes) | Type 21 | Type 23 | Type 30 |
| --- | ---: | ---: | ---: | ---: |
| Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-IQ3_M.gguf | 4,714,690,528 | 289 | 0 | 1 |
| Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-IQ4_XS.gguf | 5,070,950,368 | 0 | 336 | 1 |
| Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q2_K_P.gguf | 4,431,882,208 | 0 | 0 | 1 |
| Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q3_K_M.gguf | 4,850,391,008 | 0 | 0 | 1 |
| Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q3_K_P.gguf | 4,884,505,568 | 0 | 0 | 1 |
| Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q4_K_M.gguf | 5,335,285,728 | 0 | 0 | 1 |
| Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q4_K_P.gguf | 5,369,246,688 | 0 | 0 | 1 |
| Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q5_K_M.gguf | 5,762,908,128 | 0 | 0 | 1 |
| Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q5_K_P.gguf | 5,812,940,768 | 0 | 0 | 1 |
| Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q6_K_P.gguf | 6,249,794,528 | 0 | 0 | 1 |
| Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q8_K_P.gguf | 8,133,226,464 | 0 | 0 | 1 |
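
Per-type header totals like the ones above can be reproduced by walking the tensor-info records in a header buffer. The sketch below assumes the GGUF v3 record layout (uint64-length name, uint32 n_dims, uint64 dims, uint32 type, uint64 offset) and takes `tensor_info_start` and `tensor_count` as already known, as in the values reported above. The function name is hypothetical, and the sketch omits the name-length, dimension-count, and byte-budget limits a real scanner would enforce.

```python
import struct
from collections import Counter


def count_tensor_types(buf, tensor_info_start, tensor_count):
    """Tally declared GGML types across the tensor-info table.

    Returns (counts, end_offset). struct.error is raised if the buffer ends
    before the table does, which signals a truncated header read.
    """
    pos = tensor_info_start
    counts = Counter()
    for _ in range(tensor_count):
        (name_len,) = struct.unpack_from("<Q", buf, pos)
        pos += 8 + name_len                 # skip length prefix and name bytes
        (n_dims,) = struct.unpack_from("<I", buf, pos)
        pos += 4 + 8 * n_dims               # skip n_dims and each uint64 dim
        ggml_type, _offset = struct.unpack_from("<IQ", buf, pos)
        pos += 12                           # uint32 type + uint64 offset
        counts[ggml_type] += 1
    return counts, pos
```

The returned end offset corresponds to `tensor_info_end`, so it can be compared against the value observed in the range read.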

Representative tensor evidence from the pinned tensor tables:

  • Type 21 / IQ3_S: blk.0.attn_k.weight, index 6, dims [2560, 512], record offset 15776946, relative payload offset 2917675008, absolute range [2933494432, 2934057632), expected bytes 563200.
  • Type 23 / IQ4_XS: blk.0.attn_k.weight, index 6, dims [2560, 512], record offset 15776946, relative payload offset 2917675008, absolute range [2933494432, 2934190752), expected bytes 696320.
  • Type 30 / BF16: per_layer_model_proj.weight, index 1, dims [2560, 10752], record offset 15776648, relative payload offset 10240, absolute range [15829664, 70879904), expected bytes 55050240.
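
The absolute ranges quoted above follow from simple arithmetic: the data section starts at the tensor-info end rounded up to the alignment, and each tensor occupies `[data_start + relative_offset, data_start + relative_offset + expected_bytes)`. The sketch below shows that calculation with a file-size bounds check. The function names are hypothetical, and the 32-byte alignment is the GGUF default, assumed here because no `general.alignment` override is reported.

```python
UINT64_MAX = 2**64 - 1


def align_up(value, alignment=32):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def absolute_range(tensor_data_start, relative_offset, expected_bytes, file_size):
    """Return the half-open absolute byte range for one tensor payload."""
    start = tensor_data_start + relative_offset
    end = start + expected_bytes
    # Python integers do not wrap, so a uint64 overflow shows up as a large value.
    if end > UINT64_MAX:
        raise ValueError("tensor range exceeds uint64")
    if end > file_size:
        raise ValueError("tensor data extends past end of file")
    return start, end
```

With `tensor_info_end=15819406`, `align_up` yields 15819424, matching the reported `tensor_data_start`. Feeding in relative offset 10240 and 55,050,240 expected bytes reproduces the BF16 range `[15829664, 70879904)`.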

Exact pinned end-to-end scans:

  • Ran all 11 direct immutable file URLs with PROMPTFOO_DISABLE_TELEMETRY=1 uv run modelaudit scan --no-cache --scanners gguf --format json --timeout 7200 --max-size 10GB <https://huggingface.co/.../resolve/45b6.../<file>.gguf>.
  • Result: 11/11 exit 0, success=true, has_errors=false, zero Tensor Type Validation checks, zero unknown GGML type matches, zero warning/critical findings. The remaining 33 findings were INFO-only metadata checks.

Regression/validation:

  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -q tests/scanners/test_gguf_scanner.py -k "rank250 or bf16 or unknown_tensor_type or truncated_tensor_dimensions or tensor_bounds_detects_uint64_wrap or tensor_information_byte_limit or default_tensor_count_limit_blocks_compact_resource_exhaustion or tensor_metadata_summary_is_capped_without_skipping_validation or truncated_metadata" -> 18 passed, 49 deselected.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -q tests/scanners/test_gguf_scanner.py -> 66 passed, 1 skipped (jinja2.sandbox unavailable locally).
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -q tests/test_gguf_sbom_integration.py -> 4 skipped on the Python 3.12 allowlist.
  • uv run ruff format modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> 419 files left unchanged.
  • uv run ruff check --fix modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> pass.
  • uv run mypy modelaudit/ packages/modelaudit-picklescan/src packages/modelaudit-picklescan/tests tests/ -> pass, 474 source files.
  • git diff --check -> clean.
  • PROMPTFOO_DISABLE_TELEMETRY=1 uv run pytest -n auto -m "not slow and not integration" --maxfail=1 -> 17,375 passed, 1,292 skipped, 39 warnings.

Unknown future tensor types, malformed/truncated headers, impossible dimensions, overflowing tensor tables, unsupported encodings, and payload-offset violations remain covered by the GGUF fail-closed controls.
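
To make the fail-closed behavior for unknown types concrete, here is a toy sketch of the control logic. It is not the scanner's implementation: the `Verdict` class, the `classify_tensor_types` function, and the set of known types are hypothetical, and the exit codes 0 and 2 are taken from the regression controls reported earlier in this thread.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed subset of recognized types for illustration only.
KNOWN_TYPES = {21, 23, 30}


@dataclass
class Verdict:
    status: str
    exit_code: int
    reason: Optional[str] = None
    unknown_types: List[int] = field(default_factory=list)


def classify_tensor_types(tensor_types):
    """Fail closed: any unrecognized type makes the whole scan inconclusive."""
    unknown = sorted({t for t in tensor_types if t not in KNOWN_TYPES})
    if unknown:
        return Verdict(
            status="inconclusive",
            exit_code=2,
            reason="gguf_structure_validation_failed",
            unknown_types=unknown,
        )
    return Verdict(status="ok", exit_code=0)
```

Under this sketch, 187 type-30 tensors classify as `ok`, while appending a single type 999 flips the result to `inconclusive` with `gguf_structure_validation_failed`, mirroring the paired rank-187 controls.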

@codex review

@chatgpt-codex-connector


Codex Review: Didn't find any major issues. Breezy!

@mldangelo-oai mldangelo-oai enabled auto-merge (squash) June 11, 2026 06:43
@mldangelo-oai

Contributor Author

Pinned rank-262 GGUF compatibility QA for f08b2c4868b7a02bc75b014c0fe03c1eff990c51:

  • unsloth/gemma-4-E2B-it-qat-GGUF at immutable revision db01ae3ceeca98487bf3569814f832f5023cd48c
  • Main 8d6c4864... emits 331 S902 unknown-tensor-type findings.
  • Type 30 accounts for 247 findings in mmproj-BF16.gguf plus 23 in MTP/gemma-4-E2B-it-BF16-MTP.gguf.
  • Type 35 accounts for 61 findings in gemma-4-E2B-it-qat-UD-Q2_K_XL.gguf.
  • Those files become inconclusive with gguf_structure_validation_failed.

Please run the exact PR head against this pinned revision and verify both the BF16 type covered by this PR and any still-current type 35 mapping. Unknown future types should remain explicit/inconclusive, but standardized current types should not generate hundreds of false findings.

Full local evidence: modelaudit-hf-scan-swarm/outputs/hf-audit-r262-20260611.md.

@mldangelo-oai mldangelo-oai merged commit 05443d2 into main Jun 11, 2026
29 checks passed
@mldangelo-oai mldangelo-oai deleted the mdangelo/codex/hf-fp-t11-gguf-bf16-type30-20260610 branch June 11, 2026 15:52