
Added support to rotate in fp32 (optional) #885

Open
kinjalpatel27 wants to merge 6 commits into main from kinjal/rotate_float

Conversation

@kinjalpatel27 (Contributor) commented Feb 12, 2026

What does this PR do?

Type of change: New Feature

Overview:
This PR adds support for performing the RHT rotation in float32 when enabled via the quantization configuration. It also changes the rotate argument in the quantization configuration to accept either a bool (for backward compatibility) or a dict (which adds the float32 rotation option).
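For illustration, the two accepted forms look roughly like this (a minimal sketch; the enclosing quantizer config such as NVFP4_KV_ROTATE_CFG is omitted):

    # Illustrative only: the two accepted values for the "rotate" field.
    rotate_legacy = True                                  # bool form, backward compatible
    rotate_new = {"enable": True, "rotate_fp32": True}    # dict form added by this PR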

Usage

python hf_ptq.py --pyt_ckpt_path meta-llama/Llama-3.2-3B-Instruct --qformat nvfp4 --export_fmt hf --dataset cnn_dailymail --export_path test --trust_remote_code --inference_pipeline_parallel 1 --batch_size 1 --calib_size 4 --kv_cache_qformat nvfp4_rotate

Updated NVFP4_KV_ROTATE_CFG locally with "rotate": {"enable": True, "rotate_fp32": True}

...
model.layers.27.self_attn.k_bmm_quantizer                                        TensorQuantizer((2, 1) bit fake block_sizes={-1: 16, 'type': 'dynamic', 'scale_bits': (4, 3)}, amax=8.3750 rotated (fp32) calibrator=MaxCalibrator quant)
...

Updated NVFP4_KV_ROTATE_CFG locally with "rotate": {"enable": True, "rotate_fp32": False}

model.layers.27.self_attn.k_bmm_quantizer                                        TensorQuantizer((2, 1) bit fake block_sizes={-1: 16, 'type': 'dynamic', 'scale_bits': (4, 3)}, amax=8.3750 rotated calibrator=MaxCalibrator quant)

Testing

Updated unit test in tests/gpu/torch/quantization/test_hadamard.py
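A rough sketch of the parametrized test (the import path and test structure are assumed here, not taken from the actual test file; only normalized_hadamard_transform, the rotate_fp32 flag, and the atol=0.05 tolerance come from this PR):

    import pytest
    import torch

    # Assumed import path based on the file locations listed in this PR.
    from modelopt.torch.quantization.nn.functional import normalized_hadamard_transform

    @pytest.mark.parametrize("rotate_fp32", [False, True])
    def test_kv_rotate_sketch(rotate_fp32):
        x = torch.randn(8, 64, dtype=torch.bfloat16, device="cuda")
        x_h = normalized_hadamard_transform(x, rotate_fp32=rotate_fp32)
        # A normalized Hadamard rotation is orthonormal, so pairwise dot products
        # are preserved up to numerical error (larger for 16-bit floats).
        xxt = x.float() @ x.float().T
        xxt_h = x_h.float() @ x_h.float().T
        assert torch.allclose(xxt_h, xxt, atol=0.05)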

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No (updated existing test)
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: Yes

Additional Information

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for rotating inputs prior to quantization for RHT (Randomized Hadamard Transform).
    • Introduced granular rotation configuration options enabling FP32 casting for improved numerical stability during transforms.
  • Tests

    • Expanded test coverage for rotation functionality with parameterized FP32 casting scenarios.

@kinjalpatel27 kinjalpatel27 requested a review from a team as a code owner February 12, 2026 22:57
coderabbitai bot commented Feb 12, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This change introduces granular rotation configuration for the quantization preprocessing pipeline. The rotate parameter now accepts either a boolean or a dictionary with "enable" and "rotate_fp32" keys, enabling independent control of rotation functionality and floating-point precision casting during Hadamard transforms.
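As a rough illustration of that dual-typed field, here is a hypothetical helper (not part of this change) that reduces either form to the two flags the quantizer needs:

    # Hypothetical helper, not from the PR: reduces a bool-or-dict "rotate" value
    # to (enable, rotate_fp32) flags, mirroring the behavior described above.
    def parse_rotate(rotate):
        if isinstance(rotate, dict):
            return rotate.get("enable", False), rotate.get("rotate_fp32", False)
        return bool(rotate), False

    # parse_rotate(True)                                  -> (True, False)
    # parse_rotate({"enable": True, "rotate_fp32": True}) -> (True, True)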

Changes

  • Configuration Updates (CHANGELOG.rst, modelopt/torch/quantization/config.py): Updated the rotate field in QuantizerAttributeConfig to support bool | dict[str, bool], with documentation for the dict keys "enable" and "rotate_fp32". Changelog entry documents the RHT rotation-input capability.
  • Hadamard Transform Implementation (modelopt/torch/quantization/nn/functional.py, modelopt/torch/quantization/nn/modules/tensor_quantizer.py): Added an optional rotate_fp32 parameter to normalized_hadamard_transform() for conditional FP32 casting during transforms. Updated the tensor quantizer to extract rotation settings from the dict-based config and pass rotate_fp32 to the transform function.
  • Test Coverage (tests/gpu/torch/quantization/test_hadamard.py): Parameterized test_kv_rotate() to validate both the rotate_fp32=True and rotate_fp32=False execution paths, with tolerance adjustments per path.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

🚥 Pre-merge checks | ✅ 3 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 71.43%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title 'Added support to rotate in fp32 (optional)' directly reflects the main change in the changeset: extending rotation support to include an optional fp32 mode during quantization preprocessing.
  • Merge Conflict Detection (✅ Passed): No merge conflicts detected when merging into main.



codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 64.70588% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.73%. Comparing base (95511a0) to head (9920a1e).
⚠️ Report is 2 commits behind head on main.

Files with missing lines (patch % / missing lines):
  • modelopt/torch/quantization/nn/functional.py: 16.66%, 5 missing ⚠️
  • .../torch/quantization/nn/modules/tensor_quantizer.py: 90.00%, 1 missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #885   +/-   ##
=======================================
  Coverage   73.73%   73.73%           
=======================================
  Files         199      199           
  Lines       21165    21174    +9     
=======================================
+ Hits        15606    15613    +7     
- Misses       5559     5561    +2     

☔ View full report in Codecov by Sentry.

Inline comment on CHANGELOG.rst (diff context excerpt):

    - Add PTQ support for GLM-4.7, including loading MTP layer weights from a separate ``mtp.safetensors`` file and export as-is.
    - Add support for image-text data calibration in PTQ for Nemotron VL models.
    - Add PTQ support for Nemotron Parse.
    - Add support for rotating the input before quantization for RHT.
A Collaborator commented:

is this a 0.42 feature or should we create 0.43?

@modelopt-bot left a comment

Code Review: PR #885 - Added support to rotate in fp32 (optional)

Overview

This PR adds an optional FP32 precision mode for the Hadamard rotation transform used in RHT (Randomized Hadamard Transform) quantization methods such as QuaRot/SpinQuant. The change is backward compatible.


✅ Strengths

  • Backward Compatibility: Excellent. The rotate config now accepts bool | dict[str, bool]; existing boolean configs continue to work unchanged.
  • Code Organization: Clean property abstraction in tensor_quantizer.py; rotate_is_enabled and rotate_is_fp32 handle the type polymorphism well.
  • Test Coverage: test_kv_rotate is now parametrized for both rotate=True and rotate={"enable": True, "rotate_fp32": True}. Also added an FP32 test in test_hadamard_transform.
  • Documentation: Changelog updated, config docstrings are clear, and the PR description includes usage examples.
  • Typo Fix: Caught and fixed "ratation" → "rotation" in the config description.

⚠️ Points to Consider

1. Numerical Precision in Mixed Scenarios
In functional.py, the sqrt calculation uses a torch.float32 tensor, but if rotate_fp32=False and the input is FP16/BF16, the division happens in the input's dtype. This is existing behavior, but it is worth noting that the denominator's precision differs depending on the mode.
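For reference, a naive sketch of the cast-and-restore path under discussion (this is not the modelopt implementation, which uses a fused FastHadamardTransform kernel; the function name and matrix construction here are purely illustrative):

    import torch

    def normalized_hadamard_sketch(x: torch.Tensor, rotate_fp32: bool = False) -> torch.Tensor:
        """Naive, illustrative normalized Hadamard rotation with an optional FP32
        compute path. Assumes the last dimension is a power of two."""
        orig_dtype = x.dtype
        if rotate_fp32:
            x = x.float()  # do the rotation (and the sqrt division) in FP32
        n = x.shape[-1]
        # Build the Hadamard matrix recursively: H_{2k} = [[H_k, H_k], [H_k, -H_k]].
        h = torch.ones(1, 1, dtype=x.dtype, device=x.device)
        while h.shape[-1] < n:
            h = torch.cat([torch.cat([h, h], dim=-1), torch.cat([h, -h], dim=-1)], dim=0)
        out = (x @ h) / torch.sqrt(torch.tensor(float(n), dtype=x.dtype, device=x.device))
        return out.to(orig_dtype) if rotate_fp32 else out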

2. Property Logic in rotate_is_fp32

return (
    self._rotate.get("rotate_fp32", False)
    if isinstance(self._rotate, dict) and self.rotate_is_enabled
    else False
)

The check self.rotate_is_enabled inside rotate_is_fp32 is redundant since you already check isinstance(self._rotate, dict). If it's a dict with "enable": False, returning False either way is fine, but consider simplifying to just check isinstance.
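A possible simplification along those lines, sketched with the property and attribute names from the snippet above:

    @property
    def rotate_is_fp32(self):
        """Check if rotation should be computed in fp32."""
        # Sketch of the suggested simplification: drop the rotate_is_enabled check
        # and rely on the isinstance check alone.
        return self._rotate.get("rotate_fp32", False) if isinstance(self._rotate, dict) else False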

3. Test Assertions
Both the FP32 and non-FP32 tests use the same atol=0.05. This is reasonable given the comment about 16-bit float errors, but consider whether a tighter tolerance is expected for FP32 mode.


📝 Minor Nits

  • The CHANGELOG.rst entry is brief. Consider slightly more detail: "Add support for FP32 rotation computation in RHT quantization (QuaRot/SpinQuant) when specified in rotation config."

Verdict

LGTM — This is a solid, well-tested feature addition with proper backward compatibility. The code is clean and the documentation is thorough. Ready to merge after addressing any optional nits.

@modelopt-bot left a comment

See detailed feedback in inline comments below.

"""Check if rotate is enabled in quant config."""
return self._rotate.get("enable", False) if isinstance(self._rotate, dict) else self._rotate

@property


Nit: The rotate_is_enabled check inside this property is redundant given the isinstance(self._rotate, dict) check. If it's a dict with "enable": False, returning False either way is fine; consider simplifying to just the isinstance check.

Inline comment on modelopt/torch/quantization/nn/functional.py (diff context excerpt):

    return FastHadamardTransform.apply(inputs) / torch.sqrt(
    dtype = inputs.dtype


Note: When rotate_fp32=False and the input is FP16/BF16, the sqrt calculation uses torch.float32 but the division happens in the input dtype. This is existing behavior, but there's a subtle precision difference: in FP32 mode, the division is in FP32; otherwise it's in the input dtype. Worth documenting if this is intentional.

Inline comment on tests/gpu/torch/quantization/test_hadamard.py (diff context excerpt):

    xxt_h = x_h @ x_h.T
    # The numerical error can be large, especially for 16-bit floats.
    assert torch.allclose(xxt_h, xxt, atol=0.05)
    x_h_fp32 = normalized_hadamard_transform(x, rotate_fp32=True)


Consider whether a tighter atol should be used for FP32 mode, since it should have better numerical precision than 16-bit floats.
