
[minor] add a general FP8ScaleSweepCalibrator and its registry#1171

Open
Fridah-nv wants to merge 1 commit into main from fridah/general-fp8-sweep

Conversation


@Fridah-nv Fridah-nv commented Apr 2, 2026

What does this PR do?

Type of change: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Added FP8 scale sweep calibration option for weight quantizers across standard and Hessian-based calibration pipelines.
    • Introduced a public API for registering custom FP8 scale sweep calibrators per backend, enabling extensibility.
  • Refactor

    • Unified calibrator class structure to centralize FP8 scale sweep candidate generation logic.

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
@Fridah-nv Fridah-nv requested review from realAsma and sugunav14 April 2, 2026 23:14
@Fridah-nv Fridah-nv self-assigned this Apr 2, 2026
@Fridah-nv Fridah-nv requested a review from a team as a code owner April 2, 2026 23:14

coderabbitai bot commented Apr 2, 2026

📝 Walkthrough

The changes introduce a new FP8ScaleSweepCalibrator base class for FP8 calibration and extend calibration functions with a registration API allowing custom calibrators to be registered by backend. The NVFP4MSECalibrator is refactored to inherit from the new class.

Changes

FP8 Calibrator Hierarchy (modelopt/torch/quantization/calib/mse.py)
Introduced a new FP8ScaleSweepCalibrator class with FP8 E4M3 candidate generation (126 valid multipliers). Refactored NVFP4MSECalibrator to inherit from this new class instead of directly from MseCalibrator, centralizing candidate generation logic. Updated __all__ exports and adjusted docstrings.

Calibration Registration & Selection (modelopt/torch/quantization/model_calib.py)
Added a public register_fp8_sweep_calibrator() API and a private registry (_FP8_SWEEP_CALIBRATOR_REGISTRY) for backend-specific FP8-sweep calibrators. Extended mse_calibrate() and local_hessian_calibrate() to conditionally select calibrators from the registry when fp8_scale_sweep=True and the backend is registered, falling back to NVFP4MSECalibrator for NVFP4-static quantizers. Updated module __all__ exports.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 4 passed

Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled.
Title Check | ✅ Passed | The title accurately reflects the main changes: introduction of the FP8ScaleSweepCalibrator class and its registration API with a registry mechanism.
Docstring Coverage | ✅ Passed | Docstring coverage is 87.50%, which meets the required threshold of 80.00%.
Security Anti-Patterns | ✅ Passed | No critical security anti-patterns detected: torch.load with weights_only=False, numpy.load with allow_pickle=True, trust_remote_code=True, eval/exec on external input, and nosec comments are all absent from the modified files.




github-actions bot commented Apr 2, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1171/

Built to branch gh-pages at 2026-04-02 23:18 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (3)
modelopt/torch/quantization/model_calib.py (2)

63-80: Consider adding overwrite warning and type validation for consistency.

Two observations:

  1. Unlike register_quant_backend (which warns when overwriting an existing backend), this silently overwrites. For consistency and to help users debug accidental overwrites, consider adding a warning.
  2. The type hint is type rather than type[FP8ScaleSweepCalibrator]. Adding a runtime issubclass check would catch incorrect registrations early.
♻️ Suggested improvement
+import warnings
+from .calib import FP8ScaleSweepCalibrator
+
 def register_fp8_sweep_calibrator(backend: str, calibrator_cls: type) -> None:
     """Register a :class:`FP8ScaleSweepCalibrator` subclass for a quantization backend.
     ...
     """
+    if not issubclass(calibrator_cls, FP8ScaleSweepCalibrator):
+        raise TypeError(
+            f"calibrator_cls must be a subclass of FP8ScaleSweepCalibrator, got {calibrator_cls}"
+        )
+    if backend in _FP8_SWEEP_CALIBRATOR_REGISTRY:
+        warnings.warn(f"Overwriting existing FP8 sweep calibrator for backend: {backend}")
     _FP8_SWEEP_CALIBRATOR_REGISTRY[backend] = calibrator_cls

This aligns with the existing register_quant_backend pattern shown in tensor_quantizer.py:82-99.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/model_calib.py` around lines 63 - 80, The
register_fp8_sweep_calibrator function currently overwrites entries silently and
accepts a generic type; change it to mirror register_quant_backend by (1)
validating that calibrator_cls is a subclass of FP8ScaleSweepCalibrator via
issubclass(calibrator_cls, FP8ScaleSweepCalibrator) and raising a TypeError if
not, and (2) emitting a warning (e.g., using warnings.warn) when
_FP8_SWEEP_CALIBRATOR_REGISTRY already contains the backend key before
overwriting; update the function signature/type hint to accept
type[FP8ScaleSweepCalibrator] to reflect the enforced type.

651-682: Code is correct; consider extracting shared calibrator selection logic.

This block mirrors the calibrator selection in mse_calibrate() (lines 351-378). Both paths:

  1. Check is_nvfp4_static vs registry-backed backends
  2. Instantiate the appropriate calibrator class

A small helper (e.g., _create_fp8_sweep_calibrator(...)) could reduce duplication, but this is optional given the clarity of the current implementation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/model_calib.py` around lines 651 - 682, The
calibrator selection/instantiation logic is duplicated between this block and
mse_calibrate(); extract a helper (e.g., _create_fp8_sweep_calibrator) that
accepts (weight_quantizer, initial_amax, quant_func, error_func,
is_nvfp4_static) and returns the configured calibrator instance; inside the
helper, if is_nvfp4_static return NVFP4MSECalibrator configured with
amax=initial_amax, axis=weight_quantizer._calibrator._axis if present,
global_amax=weight_quantizer.global_amax and quant_func/error_func; otherwise
lookup calibrator_cls = _FP8_SWEEP_CALIBRATOR_REGISTRY[weight_quantizer.backend]
and instantiate with amax, axis, quant_func and error_func; replace the
duplicated instantiation code in both this block and mse_calibrate() to call the
new helper and assign to weight_quantizer._calibrator.
modelopt/torch/quantization/calib/mse.py (1)

174-185: Add a named constant for the FP8 E4M3 maximum value and document inherited unused parameters.

The magic number 448.0 represents the maximum representable value in FP8 E4M3 format and should be defined as a module-level constant for clarity and maintainability, following the pattern used elsewhere in the codebase (e.g., mxfp8_tensor.py).

Additionally, FP8ScaleSweepCalibrator inherits __init__ from MseCalibrator, which accepts step_size, start_multiplier, and stop_multiplier parameters. Since _generate_candidates is overridden in this class, these parameters are silently ignored. Consider either overriding __init__ to explicitly exclude these parameters or adding a note to the class docstring that these inherited parameters are not used.

♻️ Suggested improvement
+_FP8_E4M3_MAX = 448.0  # Maximum representable value in FP8 E4M3 format
+
 class FP8ScaleSweepCalibrator(MseCalibrator):
     """MSE calibrator that sweeps 126 valid FP8 E4M3 candidates of ``initial_amax``.

     Candidate amax values are ``initial_amax * candidate``
+
+    Note:
+        The ``step_size``, ``start_multiplier``, and ``stop_multiplier`` parameters
+        inherited from :class:`MseCalibrator` are not used by this calibrator.
     """

     def _generate_candidates(self, device: torch.device) -> torch.Tensor:
-        """Generate 126 valid FP8 E4M3 scale candidates."""
+        """Generate 126 valid FP8 E4M3 scale candidates (all nonzero finite values / max)."""
         uint8_values = torch.arange(0, 128, dtype=torch.uint8, device=device)
         fp8_values = uint8_values.view(torch.float8_e4m3fn).float()
         valid_mask = torch.isfinite(fp8_values) & (fp8_values > 0)
-        return fp8_values[valid_mask] / 448.0
+        return fp8_values[valid_mask] / _FP8_E4M3_MAX
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/calib/mse.py` around lines 174 - 185, Replace the
magic literal 448.0 with a module-level constant named FP8_E4M3_MAX (or similar)
and use that constant in FP8ScaleSweepCalibrator._generate_candidates so the
maximum representable FP8 E4M3 value is documented and reusable (matching the
pattern in mxfp8_tensor.py); also update the FP8ScaleSweepCalibrator class
docstring to note that the inherited MseCalibrator __init__ parameters
(step_size, start_multiplier, stop_multiplier) are intentionally unused by this
subclass, or alternatively override __init__ in FP8ScaleSweepCalibrator to
explicitly accept and ignore those params so their omission is explicit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6a31f4b3-e4ad-4edc-b8f6-48c7553b2e41

📥 Commits

Reviewing files that changed from the base of the PR and between 18ddcb7 and 7034789.

📒 Files selected for processing (2)
  • modelopt/torch/quantization/calib/mse.py
  • modelopt/torch/quantization/model_calib.py


codecov bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 33.33333% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.51%. Comparing base (c37c74f) to head (7034789).
⚠️ Report is 11 commits behind head on main.

Files with missing lines | Patch % | Lines
modelopt/torch/quantization/model_calib.py | 26.66% | 11 Missing ⚠️
modelopt/torch/quantization/calib/mse.py | 44.44% | 5 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (c37c74f) and HEAD (7034789): HEAD has 1 upload less than BASE (BASE: 1 upload, HEAD: 0 uploads).
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1171       +/-   ##
===========================================
- Coverage   70.20%   54.51%   -15.69%     
===========================================
  Files         230      348      +118     
  Lines       26098    39789    +13691     
===========================================
+ Hits        18322    21692     +3370     
- Misses       7776    18097    +10321     
Flag | Coverage Δ
unit | 54.51% <33.33%> (?)

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

