
[minor] add a general FP8ScaleSweepCalibrator and its registry#1171

Open
Fridah-nv wants to merge 1 commit into main from fridah/general-fp8-sweep

Conversation


@Fridah-nv Fridah-nv commented Apr 2, 2026

What does this PR do?

Type of change: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Added FP8 scale sweep calibration option for weight quantizers across standard and Hessian-based calibration pipelines.
    • Introduced a public API for registering custom FP8 scale sweep calibrators per backend, enabling extensibility.
  • Refactor

    • Unified calibrator class structure to centralize FP8 scale sweep candidate generation logic.

Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
@Fridah-nv Fridah-nv requested review from realAsma and sugunav14 April 2, 2026 23:14
@Fridah-nv Fridah-nv self-assigned this Apr 2, 2026
@Fridah-nv Fridah-nv requested a review from a team as a code owner April 2, 2026 23:14

coderabbitai bot commented Apr 2, 2026

📝 Walkthrough

The changes introduce a new FP8ScaleSweepCalibrator base class for FP8 calibration and extend calibration functions with a registration API allowing custom calibrators to be registered by backend. The NVFP4MSECalibrator is refactored to inherit from the new class.

Changes

FP8 Calibrator Hierarchy (modelopt/torch/quantization/calib/mse.py)
Introduced a new FP8ScaleSweepCalibrator class with FP8 E4M3 candidate generation (126 valid multipliers). Refactored NVFP4MSECalibrator to inherit from this new class instead of directly from MseCalibrator, centralizing candidate generation logic. Updated __all__ exports and adjusted docstrings.

Calibration Registration & Selection (modelopt/torch/quantization/model_calib.py)
Added a public register_fp8_sweep_calibrator() API and a private registry (_FP8_SWEEP_CALIBRATOR_REGISTRY) for backend-specific FP8-sweep calibrators. Extended mse_calibrate() and local_hessian_calibrate() to conditionally select calibrators from the registry when fp8_scale_sweep=True and the backend is registered, falling back to NVFP4MSECalibrator for NVFP4-static quantizers. Updated module __all__ exports.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 4 passed

Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled.
Title Check | ✅ Passed | The title accurately reflects the main changes: introduction of the FP8ScaleSweepCalibrator class and its registration API with a registry mechanism.
Docstring Coverage | ✅ Passed | Docstring coverage is 87.50%, which meets the required threshold of 80.00%.
Security Anti-Patterns | ✅ Passed | No critical security anti-patterns detected: torch.load with weights_only=False, numpy.load with allow_pickle=True, trust_remote_code=True, eval/exec on external input, and nosec comments are all absent from the modified files.




github-actions bot commented Apr 2, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1171/

Built to branch gh-pages at 2026-04-02 23:18 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (3)
modelopt/torch/quantization/model_calib.py (2)

63-80: Consider adding overwrite warning and type validation for consistency.

Two observations:

  1. Unlike register_quant_backend (which warns when overwriting an existing backend), this silently overwrites. For consistency and to help users debug accidental overwrites, consider adding a warning.
  2. The type hint is type rather than type[FP8ScaleSweepCalibrator]. Adding a runtime issubclass check would catch incorrect registrations early.
♻️ Suggested improvement
+import warnings
+from .calib import FP8ScaleSweepCalibrator
+
 def register_fp8_sweep_calibrator(backend: str, calibrator_cls: type) -> None:
     """Register a :class:`FP8ScaleSweepCalibrator` subclass for a quantization backend.
     ...
     """
+    if not issubclass(calibrator_cls, FP8ScaleSweepCalibrator):
+        raise TypeError(
+            f"calibrator_cls must be a subclass of FP8ScaleSweepCalibrator, got {calibrator_cls}"
+        )
+    if backend in _FP8_SWEEP_CALIBRATOR_REGISTRY:
+        warnings.warn(f"Overwriting existing FP8 sweep calibrator for backend: {backend}")
     _FP8_SWEEP_CALIBRATOR_REGISTRY[backend] = calibrator_cls

This aligns with the existing register_quant_backend pattern shown in tensor_quantizer.py:82-99.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/model_calib.py` around lines 63 - 80, The
register_fp8_sweep_calibrator function currently overwrites entries silently and
accepts a generic type; change it to mirror register_quant_backend by (1)
validating that calibrator_cls is a subclass of FP8ScaleSweepCalibrator via
issubclass(calibrator_cls, FP8ScaleSweepCalibrator) and raising a TypeError if
not, and (2) emitting a warning (e.g., using warnings.warn) when
_FP8_SWEEP_CALIBRATOR_REGISTRY already contains the backend key before
overwriting; update the function signature/type hint to accept
type[FP8ScaleSweepCalibrator] to reflect the enforced type.

651-682: Code is correct; consider extracting shared calibrator selection logic.

This block mirrors the calibrator selection in mse_calibrate() (lines 351-378). Both paths:

  1. Check is_nvfp4_static vs registry-backed backends
  2. Instantiate the appropriate calibrator class

A small helper (e.g., _create_fp8_sweep_calibrator(...)) could reduce duplication, but this is optional given the clarity of the current implementation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/model_calib.py` around lines 651 - 682, The
calibrator selection/instantiation logic is duplicated between this block and
mse_calibrate(); extract a helper (e.g., _create_fp8_sweep_calibrator) that
accepts (weight_quantizer, initial_amax, quant_func, error_func,
is_nvfp4_static) and returns the configured calibrator instance; inside the
helper, if is_nvfp4_static return NVFP4MSECalibrator configured with
amax=initial_amax, axis=weight_quantizer._calibrator._axis if present,
global_amax=weight_quantizer.global_amax and quant_func/error_func; otherwise
lookup calibrator_cls = _FP8_SWEEP_CALIBRATOR_REGISTRY[weight_quantizer.backend]
and instantiate with amax, axis, quant_func and error_func; replace the
duplicated instantiation code in both this block and mse_calibrate() to call the
new helper and assign to weight_quantizer._calibrator.
modelopt/torch/quantization/calib/mse.py (1)

174-185: Add a named constant for the FP8 E4M3 maximum value and document inherited unused parameters.

The magic number 448.0 represents the maximum representable value in FP8 E4M3 format and should be defined as a module-level constant for clarity and maintainability, following the pattern used elsewhere in the codebase (e.g., mxfp8_tensor.py).

Additionally, FP8ScaleSweepCalibrator inherits __init__ from MseCalibrator, which accepts step_size, start_multiplier, and stop_multiplier parameters. Since _generate_candidates is overridden in this class, these parameters are silently ignored. Consider either overriding __init__ to explicitly exclude these parameters or adding a note to the class docstring that these inherited parameters are not used.

♻️ Suggested improvement
+_FP8_E4M3_MAX = 448.0  # Maximum representable value in FP8 E4M3 format
+
 class FP8ScaleSweepCalibrator(MseCalibrator):
     """MSE calibrator that sweeps 126 valid FP8 E4M3 candidates of ``initial_amax``.

     Candidate amax values are ``initial_amax * candidate``
+
+    Note:
+        The ``step_size``, ``start_multiplier``, and ``stop_multiplier`` parameters
+        inherited from :class:`MseCalibrator` are not used by this calibrator.
     """

     def _generate_candidates(self, device: torch.device) -> torch.Tensor:
-        """Generate 126 valid FP8 E4M3 scale candidates."""
+        """Generate 126 valid FP8 E4M3 scale candidates (all nonzero finite values / max)."""
         uint8_values = torch.arange(0, 128, dtype=torch.uint8, device=device)
         fp8_values = uint8_values.view(torch.float8_e4m3fn).float()
         valid_mask = torch.isfinite(fp8_values) & (fp8_values > 0)
-        return fp8_values[valid_mask] / 448.0
+        return fp8_values[valid_mask] / _FP8_E4M3_MAX
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/calib/mse.py` around lines 174 - 185, Replace the
magic literal 448.0 with a module-level constant named FP8_E4M3_MAX (or similar)
and use that constant in FP8ScaleSweepCalibrator._generate_candidates so the
maximum representable FP8 E4M3 value is documented and reusable (matching the
pattern in mxfp8_tensor.py); also update the FP8ScaleSweepCalibrator class
docstring to note that the inherited MseCalibrator __init__ parameters
(step_size, start_multiplier, stop_multiplier) are intentionally unused by this
subclass, or alternatively override __init__ in FP8ScaleSweepCalibrator to
explicitly accept and ignore those params so their omission is explicit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6a31f4b3-e4ad-4edc-b8f6-48c7553b2e41

📥 Commits

Reviewing files that changed from the base of the PR and between 18ddcb7 and 7034789.

📒 Files selected for processing (2)
  • modelopt/torch/quantization/calib/mse.py
  • modelopt/torch/quantization/model_calib.py


codecov bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 33.33333% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.51%. Comparing base (c37c74f) to head (7034789).
⚠️ Report is 11 commits behind head on main.

Files with missing lines | Patch % | Lines
modelopt/torch/quantization/model_calib.py | 26.66% | 11 Missing ⚠️
modelopt/torch/quantization/calib/mse.py | 44.44% | 5 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (c37c74f) and HEAD (7034789): HEAD has 1 upload less than BASE (BASE: 1 upload, HEAD: 0 uploads).
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1171       +/-   ##
===========================================
- Coverage   70.20%   54.51%   -15.69%     
===========================================
  Files         230      348      +118     
  Lines       26098    39789    +13691     
===========================================
+ Hits        18322    21692     +3370     
- Misses       7776    18097    +10321     
Flag | Coverage Δ
unit | 54.51% <33.33%> (?)

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.

