Added support to rotate in fp32 (optional) #885
Conversation
📝 Walkthrough

This change introduces granular rotation configuration for the quantization preprocessing pipeline. The rotate parameter now accepts either a boolean or a dictionary with "enable" and "rotate_fp32" keys, enabling independent control of rotation functionality and floating-point precision casting during Hadamard transforms.
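As a quick illustration of the two accepted forms, here is a minimal sketch; the helper name and resolution logic below are assumptions based on this walkthrough, not the actual modelopt code:

```python
from typing import Union

RotateConfig = Union[bool, dict]  # the two forms the walkthrough describes

rotate_legacy: RotateConfig = True                                     # boolean form
rotate_granular: RotateConfig = {"enable": True, "rotate_fp32": True}  # dict form


def resolve_rotate(rotate: RotateConfig) -> tuple:
    """Hypothetical helper mirroring the described semantics: returns (enabled, use_fp32)."""
    if isinstance(rotate, dict):
        enabled = rotate.get("enable", False)
        return enabled, enabled and rotate.get("rotate_fp32", False)
    return bool(rotate), False  # boolean form: rotation only, default precision


assert resolve_rotate(True) == (True, False)
assert resolve_rotate({"enable": True, "rotate_fp32": True}) == (True, True)
assert resolve_rotate({"enable": False, "rotate_fp32": True}) == (False, False)
```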
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~15 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #885   +/-   ##
=======================================
  Coverage   73.73%   73.73%
=======================================
  Files         199      199
  Lines       21165    21174       +9
=======================================
+ Hits        15606    15613       +7
- Misses       5559     5561       +2

☔ View full report in Codecov by Sentry.
Signed-off-by: Kinjal Patel <[email protected]>
Force-pushed from 57f1766 to 044a4ab
Signed-off-by: Kinjal Patel <[email protected]>
- Add PTQ support for GLM-4.7, including loading MTP layer weights from a separate ``mtp.safetensors`` file and export as-is.
- Add support for image-text data calibration in PTQ for Nemotron VL models.
- Add PTQ support for Nemotron Parse.
- Add support for rotating the input before quantization for RHT.
Is this a 0.42 feature, or should we create 0.43?
modelopt-bot left a comment
Code Review: PR #885 - Added support to rotate in fp32 (optional)
Overview
This PR adds an optional FP32 precision mode for the Hadamard rotation transform used in RHT (Random Hadamard Transform) quantization methods like QuaRot/SpinQuant. The change is backward compatible.
✅ Strengths
| Aspect | Assessment |
|---|---|
| Backward Compatibility | Excellent. The rotate config now accepts bool | dict[str, bool] — existing boolean configs continue to work unchanged. |
| Code Organization | Clean property abstraction in tensor_quantizer.py — rotate_is_enabled and rotate_is_fp32 handle the type polymorphism well. |
| Test Coverage | test_kv_rotate is now parametrized for both rotate=True and rotate={"enable": True, "rotate_fp32": True}. Also added FP32 test in test_hadamard_transform. |
| Documentation | Changelog updated, config docstrings are clear, and the PR description includes usage examples. |
| Typo Fix | Caught and fixed "ratation" → "rotation" in the config description. |
⚠️ Points to Consider
1. Numerical Precision in Mixed Scenarios
In functional.py, the sqrt calculation uses a torch.float32 tensor, but if rotate_fp32=False and the input is FP16/BF16, the division happens in the input's dtype. This is existing behavior, but it is worth noting that the effective precision of the denominator differs between modes.
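For readers without the diff open, the following pure-PyTorch sketch mimics that behavior. It is not the modelopt implementation: FastHadamardTransform is replaced by a naive butterfly transform, and the cast-back behavior is an assumption based on the comment above.

```python
import math

import torch


def fwht(x: torch.Tensor) -> torch.Tensor:
    """Naive fast Walsh-Hadamard transform over the last dim (length must be a power of two)."""
    orig_shape = x.shape
    n, h = orig_shape[-1], 1
    while h < n:
        x = x.reshape(-1, n // (2 * h), 2, h)
        a, b = x[:, :, 0, :], x[:, :, 1, :]
        x = torch.stack([a + b, a - b], dim=2).reshape(-1, n)
        h *= 2
    return x.reshape(orig_shape)


def normalized_hadamard_sketch(inputs: torch.Tensor, rotate_fp32: bool = False) -> torch.Tensor:
    """Apply H x / sqrt(n); optionally compute (and divide) in FP32, then cast back."""
    orig_dtype = inputs.dtype
    if rotate_fp32:
        inputs = inputs.float()  # both the transform and the division happen in FP32
    out = fwht(inputs) / math.sqrt(inputs.shape[-1])  # otherwise the division stays in the input dtype
    return out.to(orig_dtype)


x = torch.randn(4, 64, dtype=torch.float16)
assert normalized_hadamard_sketch(x, rotate_fp32=True).dtype == torch.float16
```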
2. Property Logic in rotate_is_fp32
    return (
        self._rotate.get("rotate_fp32", False)
        if isinstance(self._rotate, dict) and self.rotate_is_enabled
        else False
    )

The check self.rotate_is_enabled inside rotate_is_fp32 is redundant since you already check isinstance(self._rotate, dict). If it's a dict with "enable": False, returning False either way is fine, but consider simplifying to just the isinstance check.
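A minimal sketch of the suggested simplification; the class here is an illustrative stand-in, not the actual TensorQuantizer:

```python
from typing import Union


class RotateConfigSketch:
    """Stand-in holding the same _rotate attribute discussed above."""

    def __init__(self, rotate: Union[bool, dict]):
        self._rotate = rotate

    @property
    def rotate_is_enabled(self) -> bool:
        return self._rotate.get("enable", False) if isinstance(self._rotate, dict) else self._rotate

    @property
    def rotate_is_fp32(self) -> bool:
        # Simplified: isinstance alone guards the .get(). A dict with "enable": False but
        # "rotate_fp32": True would now report FP32, which only matters if callers skip
        # the rotate_is_enabled check first.
        return isinstance(self._rotate, dict) and self._rotate.get("rotate_fp32", False)


assert RotateConfigSketch({"enable": True, "rotate_fp32": True}).rotate_is_fp32
assert not RotateConfigSketch(True).rotate_is_fp32
```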
3. Test Assertions
Both FP32 and non-FP32 tests use the same atol=0.05. This is reasonable given the comment about 16-bit float errors, but consider whether a tighter tolerance is expected for FP32 mode.
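If mode-dependent tolerances were desired, a hedged pytest sketch might look like the following; the test name, dimensions, and the tighter FP32 bound are assumptions, and the naive fwht stand-in from the earlier sketch is repeated so the block stays self-contained:

```python
import math

import pytest
import torch


def fwht(x: torch.Tensor) -> torch.Tensor:
    """Same naive Walsh-Hadamard stand-in as above (last dim must be a power of two)."""
    orig_shape = x.shape
    n, h = orig_shape[-1], 1
    while h < n:
        x = x.reshape(-1, n // (2 * h), 2, h)
        x = torch.stack([x[:, :, 0, :] + x[:, :, 1, :], x[:, :, 0, :] - x[:, :, 1, :]], dim=2)
        x = x.reshape(-1, n)
        h *= 2
    return x.reshape(orig_shape)


@pytest.mark.parametrize(
    ("rotate_fp32", "atol"),
    [(False, 0.05), (True, 1e-3)],  # 0.05 matches the existing test; 1e-3 is an assumed FP32 bound
)
def test_rotation_preserves_gram_matrix(rotate_fp32, atol):
    torch.manual_seed(0)
    x = torch.randn(4, 32, dtype=torch.float16)
    work = x.float() if rotate_fp32 else x
    x_h = fwht(work) / math.sqrt(work.shape[-1])
    # An orthonormal rotation should preserve X @ X.T up to rounding error in the chosen dtype.
    xxt = x.float() @ x.float().T
    xxt_h = x_h.float() @ x_h.float().T
    assert torch.allclose(xxt_h, xxt, atol=atol)
```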
📝 Minor Nits
- The CHANGELOG.rst entry is brief. Consider slightly more detail: "Add support for FP32 rotation computation in RHT quantization (QuaRot/SpinQuant) when specified in rotation config."
Verdict
LGTM — This is a solid, well-tested feature addition with proper backward compatibility. The code is clean and the documentation is thorough. Ready to merge after addressing any optional nits.
modelopt-bot left a comment
See detailed feedback in inline comments below.
| """Check if rotate is enabled in quant config.""" | ||
| return self._rotate.get("enable", False) if isinstance(self._rotate, dict) else self._rotate | ||
|
|
||
| @property |
There was a problem hiding this comment.
Nit: The self.rotate_is_enabled check inside this property is redundant since isinstance(self._rotate, dict) is already checked. If it is a dict with "enable": False, the property will correctly return False anyway. Consider simplifying to just the isinstance check.
    )

    return FastHadamardTransform.apply(inputs) / torch.sqrt(
    dtype = inputs.dtype
Note: When rotate_fp32=False and the input is FP16/BF16, the sqrt uses torch.float32 but the division happens in the input dtype. This is existing behavior, but there's a subtle precision difference: in FP32 mode, the division is in FP32; otherwise it is in the input dtype. Worth documenting if this is intentional.
    xxt_h = x_h @ x_h.T
    # The numerical error can be large, especially for 16-bit floats.
    assert torch.allclose(xxt_h, xxt, atol=0.05)
    x_h_fp32 = normalized_hadamard_transform(x, rotate_fp32=True)
Consider if a tighter atol should be used for FP32 mode, since it should have better numerical precision than 16-bit floats.
What does this PR do?
Type of change: New Feature
Overview:
This PR adds support for performing the RHT rotation in float32 when enabled by the quantization configuration. It also makes the rotate argument in the quantization configuration either a bool (for backward compatibility) or a dict (which adds the option for float32 rotation).
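As a rough sketch of how the dict form might be applied: the config name comes from the usage notes below, while the quant_cfg layout and the mtq.quantize entry point are assumptions, not verified against this PR.

```python
import copy

import modelopt.torch.quantization as mtq

# Assumes NVFP4_KV_ROTATE_CFG is importable as described in the usage notes below.
cfg = copy.deepcopy(mtq.NVFP4_KV_ROTATE_CFG)

# Switch every quantizer entry that already requests rotation over to the dict form with FP32 compute.
for qcfg in cfg["quant_cfg"].values():
    if isinstance(qcfg, dict) and qcfg.get("rotate"):
        qcfg["rotate"] = {"enable": True, "rotate_fp32": True}

# model = mtq.quantize(model, cfg, forward_loop)  # standard modelopt calibration entry point
```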
Usage

Updated NVFP4_KV_ROTATE_CFG locally with "rotate": {"enable": True, "rotate_fp32": True}
Updated NVFP4_KV_ROTATE_CFG locally with "rotate": {"enable": True, "rotate_fp32": False}

Testing
Updated unit test in tests/gpu/torch/quantization/test_hadamard.py

Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
Release Notes
New Features
Tests