
Added support to rotate in fp32 (optional) #885

Open
kinjalpatel27 wants to merge 6 commits into main from kinjal/rotate_float

Conversation

@kinjalpatel27 (Contributor) commented Feb 12, 2026

What does this PR do?

Type of change: New Feature

Overview:
This PR adds support for performing the RHT rotation in float32 when enabled via the quantization configuration. It also changes the rotate argument in the quantization configuration to accept either a bool (for backward compatibility) or a dict (which adds the float32 rotation option).
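For illustration, the two accepted forms look roughly like this (a minimal sketch; the enclosing quantizer config such as NVFP4_KV_ROTATE_CFG is omitted):

    # Illustrative only: the two accepted values for the "rotate" field.
    rotate_legacy = True                                  # bool form, backward compatible
    rotate_new = {"enable": True, "rotate_fp32": True}    # dict form added by this PR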

Usage

python hf_ptq.py --pyt_ckpt_path meta-llama/Llama-3.2-3B-Instruct --qformat nvfp4 --export_fmt hf --dataset cnn_dailymail --export_path test --trust_remote_code --inference_pipeline_parallel 1 --batch_size 1 --calib_size 4 --kv_cache_qformat nvfp4_rotate

Updated NVFP4_KV_ROTATE_CFG locally with "rotate": {"enable": True, "rotate_fp32": True}

...
model.layers.27.self_attn.k_bmm_quantizer                                        TensorQuantizer((2, 1) bit fake block_sizes={-1: 16, 'type': 'dynamic', 'scale_bits': (4, 3)}, amax=8.3750 rotated (fp32) calibrator=MaxCalibrator quant)
...

Updated NVFP4_KV_ROTATE_CFG locally with "rotate": {"enable": True, "rotate_fp32": False}

model.layers.27.self_attn.k_bmm_quantizer                                        TensorQuantizer((2, 1) bit fake block_sizes={-1: 16, 'type': 'dynamic', 'scale_bits': (4, 3)}, amax=8.3750 rotated calibrator=MaxCalibrator quant)

Testing

Updated unit test in tests/gpu/torch/quantization/test_hadamard.py
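A rough sketch of the parametrized test (the import path and test structure are assumed here, not taken from the actual test file; only normalized_hadamard_transform, the rotate_fp32 flag, and the atol=0.05 tolerance come from this PR):

    import pytest
    import torch

    # Assumed import path based on the file locations listed in this PR.
    from modelopt.torch.quantization.nn.functional import normalized_hadamard_transform

    @pytest.mark.parametrize("rotate_fp32", [False, True])
    def test_kv_rotate_sketch(rotate_fp32):
        x = torch.randn(8, 64, dtype=torch.bfloat16, device="cuda")
        x_h = normalized_hadamard_transform(x, rotate_fp32=rotate_fp32)
        # A normalized Hadamard rotation is orthonormal, so pairwise dot products
        # are preserved up to numerical error (larger for 16-bit floats).
        xxt = x.float() @ x.float().T
        xxt_h = x_h.float() @ x_h.float().T
        assert torch.allclose(xxt_h, xxt, atol=0.05)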

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No (updated existing test)
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: Yes

Additional Information

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for rotating inputs prior to quantization for RHT (Randomized Hadamard Transform).
    • Introduced granular rotation configuration options enabling FP32 casting for improved numerical stability during transforms.
  • Tests

    • Expanded test coverage for rotation functionality with parameterized FP32 casting scenarios.

@kinjalpatel27 kinjalpatel27 requested a review from a team as a code owner February 12, 2026 22:57
coderabbitai bot commented Feb 12, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

This change introduces granular rotation configuration for the quantization preprocessing pipeline. The rotate parameter now accepts either a boolean or a dictionary with "enable" and "rotate_fp32" keys, enabling independent control of rotation functionality and floating-point precision casting during Hadamard transforms.
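As a rough illustration of that dual-typed field, here is a hypothetical helper (not part of this change) that reduces either form to the two flags the quantizer needs:

    # Hypothetical helper, not from the PR: reduces a bool-or-dict "rotate" value
    # to (enable, rotate_fp32) flags, mirroring the behavior described above.
    def parse_rotate(rotate):
        if isinstance(rotate, dict):
            return rotate.get("enable", False), rotate.get("rotate_fp32", False)
        return bool(rotate), False

    # parse_rotate(True)                                  -> (True, False)
    # parse_rotate({"enable": True, "rotate_fp32": True}) -> (True, True)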

Changes

  • Configuration Updates (CHANGELOG.rst, modelopt/torch/quantization/config.py): Updated the rotate field in QuantizerAttributeConfig to support bool | dict[str, bool], with documentation for the dict keys "enable" and "rotate_fp32". Changelog entry documents the RHT rotation-input capability.
  • Hadamard Transform Implementation (modelopt/torch/quantization/nn/functional.py, modelopt/torch/quantization/nn/modules/tensor_quantizer.py): Added an optional rotate_fp32 parameter to normalized_hadamard_transform() for conditional FP32 casting during transforms. Updated the tensor quantizer to extract rotation settings from the dict-based config and pass rotate_fp32 to the transform function.
  • Test Coverage (tests/gpu/torch/quantization/test_hadamard.py): Parameterized test_kv_rotate() to validate both the rotate_fp32=True and rotate_fp32=False execution paths, with tolerance adjustments per path.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

🚥 Pre-merge checks | ✅ 3 passed | ❌ 1 failed

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 71.43%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)
  • Description Check (✅ Passed): Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title 'Added support to rotate in fp32 (optional)' directly reflects the main change in the changeset: extending rotation support to include an optional fp32 mode during quantization preprocessing.
  • Merge Conflict Detection (✅ Passed): No merge conflicts detected when merging into main.



codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 64.70588% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.73%. Comparing base (95511a0) to head (9920a1e).
⚠️ Report is 2 commits behind head on main.

Files with missing lines (patch % / missing lines):
  • modelopt/torch/quantization/nn/functional.py: 16.66%, 5 missing ⚠️
  • .../torch/quantization/nn/modules/tensor_quantizer.py: 90.00%, 1 missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #885   +/-   ##
=======================================
  Coverage   73.73%   73.73%           
=======================================
  Files         199      199           
  Lines       21165    21174    +9     
=======================================
+ Hits        15606    15613    +7     
- Misses       5559     5561    +2     

☔ View full report in Codecov by Sentry.

Inline comment on CHANGELOG.rst (diff context excerpt):

    - Add PTQ support for GLM-4.7, including loading MTP layer weights from a separate ``mtp.safetensors`` file and export as-is.
    - Add support for image-text data calibration in PTQ for Nemotron VL models.
    - Add PTQ support for Nemotron Parse.
    - Add support for rotating the input before quantization for RHT.
A Collaborator commented:

is this a 0.42 feature or should we create 0.43?

@modelopt-bot left a comment

Code Review: PR #885 - Added support to rotate in fp32 (optional)

Overview

This PR adds an optional FP32 precision mode for the Hadamard rotation transform used in RHT (Randomized Hadamard Transform) quantization methods such as QuaRot/SpinQuant. The change is backward compatible.


✅ Strengths

  • Backward Compatibility: Excellent. The rotate config now accepts bool | dict[str, bool]; existing boolean configs continue to work unchanged.
  • Code Organization: Clean property abstraction in tensor_quantizer.py; rotate_is_enabled and rotate_is_fp32 handle the type polymorphism well.
  • Test Coverage: test_kv_rotate is now parametrized for both rotate=True and rotate={"enable": True, "rotate_fp32": True}. Also added an FP32 test in test_hadamard_transform.
  • Documentation: Changelog updated, config docstrings are clear, and the PR description includes usage examples.
  • Typo Fix: Caught and fixed "ratation" → "rotation" in the config description.

⚠️ Points to Consider

1. Numerical Precision in Mixed Scenarios
In functional.py, the sqrt calculation uses a torch.float32 tensor, but if rotate_fp32=False and the input is FP16/BF16, the division happens in the input's dtype. This is existing behavior, but it is worth noting that the denominator's precision differs depending on the mode.
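For reference, a naive sketch of the cast-and-restore path under discussion (this is not the modelopt implementation, which uses a fused FastHadamardTransform kernel; the function name and matrix construction here are purely illustrative):

    import torch

    def normalized_hadamard_sketch(x: torch.Tensor, rotate_fp32: bool = False) -> torch.Tensor:
        """Naive, illustrative normalized Hadamard rotation with an optional FP32
        compute path. Assumes the last dimension is a power of two."""
        orig_dtype = x.dtype
        if rotate_fp32:
            x = x.float()  # do the rotation (and the sqrt division) in FP32
        n = x.shape[-1]
        # Build the Hadamard matrix recursively: H_{2k} = [[H_k, H_k], [H_k, -H_k]].
        h = torch.ones(1, 1, dtype=x.dtype, device=x.device)
        while h.shape[-1] < n:
            h = torch.cat([torch.cat([h, h], dim=-1), torch.cat([h, -h], dim=-1)], dim=0)
        out = (x @ h) / torch.sqrt(torch.tensor(float(n), dtype=x.dtype, device=x.device))
        return out.to(orig_dtype) if rotate_fp32 else out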

2. Property Logic in rotate_is_fp32

return (
    self._rotate.get("rotate_fp32", False)
    if isinstance(self._rotate, dict) and self.rotate_is_enabled
    else False
)

The check self.rotate_is_enabled inside rotate_is_fp32 is redundant since you already check isinstance(self._rotate, dict). If it's a dict with "enable": False, returning False either way is fine, but consider simplifying to just check isinstance.
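A possible simplification along those lines, sketched with the property and attribute names from the snippet above:

    @property
    def rotate_is_fp32(self):
        """Check if rotation should be computed in fp32."""
        # Sketch of the suggested simplification: drop the rotate_is_enabled check
        # and rely on the isinstance check alone.
        return self._rotate.get("rotate_fp32", False) if isinstance(self._rotate, dict) else False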

3. Test Assertions
Both the FP32 and non-FP32 tests use the same atol=0.05. This is reasonable given the comment about 16-bit float errors, but consider whether a tighter tolerance is expected for FP32 mode.


📝 Minor Nits

  • The CHANGELOG.rst entry is brief. Consider slightly more detail: "Add support for FP32 rotation computation in RHT quantization (QuaRot/SpinQuant) when specified in rotation config."

Verdict

LGTM — This is a solid, well-tested feature addition with proper backward compatibility. The code is clean and the documentation is thorough. Ready to merge after addressing any optional nits.

@modelopt-bot left a comment

See detailed feedback in inline comments below.

"""Check if rotate is enabled in quant config."""
return self._rotate.get("enable", False) if isinstance(self._rotate, dict) else self._rotate

@property


Nit: The rotate_is_enabled check inside this property is redundant given the isinstance(self._rotate, dict) check. If it's a dict with "enable": False, returning False either way is fine; consider simplifying to just the isinstance check.

Inline comment on modelopt/torch/quantization/nn/functional.py (diff context excerpt):

    return FastHadamardTransform.apply(inputs) / torch.sqrt(
    dtype = inputs.dtype


Note: When rotate_fp32=False and the input is FP16/BF16, the sqrt calculation uses torch.float32 but the division happens in the input dtype. This is existing behavior, but there's a subtle precision difference: in FP32 mode, the division is in FP32; otherwise it's in the input dtype. Worth documenting if this is intentional.

Inline comment on tests/gpu/torch/quantization/test_hadamard.py (diff context excerpt):

    xxt_h = x_h @ x_h.T
    # The numerical error can be large, especially for 16-bit floats.
    assert torch.allclose(xxt_h, xxt, atol=0.05)
    x_h_fp32 = normalized_hadamard_transform(x, rotate_fp32=True)


Consider whether a tighter atol should be used for FP32 mode, since it should have better numerical precision than 16-bit floats.
