
Fix 4-bit quantization for weight matrices not divisible by blocksize #1884

Merged
matthewdouglas merged 3 commits into bitsandbytes-foundation:main from Abdennacer-Badaoui:triton-enhancements on Mar 3, 2026

Conversation

@Abdennacer-Badaoui
Member

The Triton 4-bit quantization kernels assume the total number of elements is evenly divisible by blocksize. When this doesn't hold, the last block is partially filled with uninitialized data, which corrupts the absmax scaling factor for that block.

This PR fixes the issue by padding the input tensor with zeros to the next multiple of blocksize before entering the kernel. The padding is purely internal to quantize_4bit; the output tensor and dequantization path use the original shape, so callers are unaffected. Zero-padding has no impact on absmax accuracy, since max(abs(...)) is unchanged by additional zeros.

A roundtrip test is added to verify quantize/dequantize works correctly with non-divisible shapes and produces no NaN or Inf values.
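The padding idea can be sketched as follows. This is an illustrative NumPy version, not the actual patch: the real fix lives inside bitsandbytes' `quantize_4bit` and operates on torch tensors, and the helper name `pad_to_blocksize` here is hypothetical. It shows why zero padding is safe: the per-block max(abs(...)) over a block that mixes real values with trailing zeros equals the max over the real values alone.

```python
import numpy as np

def pad_to_blocksize(flat, blocksize=64):
    """Zero-pad a flattened array so its length is a multiple of blocksize.

    Hypothetical helper sketching the PR's approach; the original length
    is returned so callers can slice back to the unpadded shape.
    """
    n = flat.size
    remainder = n % blocksize
    if remainder == 0:
        return flat, n
    padded = np.zeros(n + blocksize - remainder, dtype=flat.dtype)
    padded[:n] = flat
    return padded, n

# 100 elements is not divisible by blocksize=64: without padding, the
# second block would read 28 uninitialized values when computing absmax.
x = np.arange(1, 101, dtype=np.float32)
padded, orig_n = pad_to_blocksize(x, blocksize=64)

# Per-block absmax over the padded buffer: the zeros in the final block
# never win the max, so the scaling factors match the unpadded data.
blocks = padded.reshape(-1, 64)
absmax = np.abs(blocks).max(axis=1)
```

After dequantization, slicing the result back to `orig_n` elements restores the caller-visible shape, which is why the padding stays invisible outside the function.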

@github-actions

github-actions bot commented Mar 3, 2026

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@matthewdouglas
Member


LGTM, thanks!

@matthewdouglas merged commit e63e29c into bitsandbytes-foundation:main on Mar 3, 2026
91 checks passed