
[1/N] Refactor llm_qat example: YAML configs + ModelOptArgParser#1172

Open
realAsma wants to merge 1 commit into main from asma/new-qat-1

Conversation

@realAsma
Contributor

@realAsma realAsma commented Apr 2, 2026

Summary

  • New ModelOptArgParser & ModelOptHFArguments in modelopt.torch.opt.plugins.transformers:
    an HfArgumentParser subclass that adds --config <yaml> support (YAML defaults with CLI
    overrides) and --generate_docs to auto-generate an ARGUMENTS.md reference from dataclass
    metadata.
  • Replace launch.sh with declarative YAML configs and a simplified train.py that uses
    ModelOptArgParser for argument parsing.
  • Dataset blending system: New YAML-based dataset config schema (configs/dataset/) with
    dataset_utils.py supporting multi-source blending, weighted sampling, and train/eval splits.
  • Standalone quantize.py for PTQ/export workflows without requiring full training setup.
  • Config restructure: Organized into configs/accelerate/, configs/dataset/, and
    configs/training/ subdirectories.
  • Pre-commit hook auto-regenerates ARGUMENTS.md when the parser or train.py changes.
  • README rewrite with new launch patterns, backend configuration table, and argument reference.
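The YAML-defaults-with-CLI-overrides pattern can be sketched with plain argparse (a minimal illustration, not the actual ModelOptArgParser implementation; it assumes PyYAML is installed and uses made-up flags):

```python
import argparse
import yaml  # PyYAML; only safe_load is used, as required by SECURITY.md


def parse_with_yaml_defaults(argv):
    """Parse argv, treating --config <yaml> as a source of defaults
    that explicit CLI flags override."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=1e-5)   # illustrative flag
    parser.add_argument("--epochs", type=int, default=1)    # illustrative flag

    # Strip --config <path> from argv before the real parse.
    if "--config" in argv:
        idx = argv.index("--config")
        if idx + 1 >= len(argv):
            parser.error("--config requires a path argument")
        with open(argv[idx + 1]) as f:
            defaults = yaml.safe_load(f) or {}
        parser.set_defaults(**defaults)  # YAML values become the new defaults
        argv = argv[:idx] + argv[idx + 2:]

    # Flags given explicitly on the CLI still win over the YAML defaults.
    return parser.parse_args(argv)
```

The key design point is that YAML values are installed as parser *defaults* rather than parsed values, so argparse's normal precedence gives CLI flags the last word.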

Future: Unified dataset config

The dataset blending config (configs/dataset/) is designed to be framework-agnostic. In the future,
this config schema can be unified across all training frameworks and promoted to
modelopt_recipes/dataset/ for shared use across examples.
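As a rough illustration of what weighted multi-source blending involves, here is a hypothetical sketch (the function name and sampling strategy are illustrative, not the configs/dataset/ schema or dataset_utils.py code):

```python
import random


def blend_datasets(sources, weights, num_samples, seed=0):
    """Draw num_samples examples from several datasets, choosing the
    source of each draw in proportion to its normalized weight."""
    assert len(sources) == len(weights) and all(w > 0 for w in weights)
    rng = random.Random(seed)  # seeded for reproducible blends
    total = float(sum(weights))
    probs = [w / total for w in weights]
    blended = []
    for _ in range(num_samples):
        src = rng.choices(sources, weights=probs, k=1)[0]
        blended.append(src[rng.randrange(len(src))])  # sample with replacement
    return blended
```

A real implementation would stream shards and respect rank-aware splits; this only shows the weighted-choice core.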

Test plan

  • pytest tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
  • pytest tests/examples/llm_qat/test_llm_qat.py
  • pre-commit run --all-files
  • Manual: accelerate launch ... train.py --config configs/training/qat_nvfp4.yaml on 2xGPU

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Recipe-driven PTQ workflow, separate quantize and train scripts, QAD support, and YAML train configs
    • Dataset blending system with YAML-driven tokenization and caching
  • Documentation

    • Auto-generated Argument Reference and updated READMEs/notebook walkthroughs
  • Tests

    • Updated integration tests to a two-step quantize→train flow and new parser unit tests
  • Chores

    • Removed legacy launcher/config and defaulted to the newer FSDP2 workflow

@realAsma realAsma requested review from a team as code owners April 2, 2026 23:32
@coderabbitai
Contributor

coderabbitai bot commented Apr 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Splits the monolithic llm_qat launcher into modular quantize.py and train.py, adds ModelOptArgParser with --generate_docs and YAML --config support, centralizes dataclass argument definitions, moves to recipe-based quantization, and adds YAML-driven dataset blending. Several configs, docs, and tests are updated; FSDP1 support is removed.
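The calibration step of the quantize flow hinges on a `forward_loop` callable that `mtq.quantize` invokes with the model being calibrated; a minimal sketch of that shape (the dummy dataloader and the `make_forward_loop` helper are stand-ins, not the PR's actual quantize.py code):

```python
def make_forward_loop(dataloader, max_batches=None):
    """Build a forward_loop of the shape PTQ calibration expects:
    a callable that takes the model and pushes calibration batches
    through it so quantizer observers can record statistics."""
    def forward_loop(model):
        for i, batch in enumerate(dataloader):
            if max_batches is not None and i >= max_batches:
                break
            model(batch)  # forward pass only; no loss or backward needed
    return forward_loop


# Hypothetical usage with modelopt:
#   mtq.quantize(model, config, forward_loop=make_forward_loop(calib_loader))
```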

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Argument Parser & API**<br>`modelopt/torch/opt/plugins/transformers.py` | Adds ModelOptHFArguments, ModelOptArgParser (YAML `--config`, `--generate_docs`, docs generator), and updates `__all__`. |
| **Quantization Trainer API**<br>`modelopt/torch/quantization/plugins/transformers_trainer.py` | Makes QuantizationArguments inherit ModelOptHFArguments, adds a `recipe` field, deprecates `quant_cfg` with a runtime warning. |
| **Shared CLI Dataclasses**<br>`examples/llm_qat/arguments.py` | New centralized dataclass modules: ModelArguments, DataArguments, TrainingArguments, QuantizeArguments built on ModelOptHFArguments. |
| **PTQ Script**<br>`examples/llm_qat/quantize.py` | New standalone PTQ script: loads the recipe, builds a calibration DataLoader, runs mtq.quantize/mtq.compress, saves the quantized model. |
| **Trainer Script**<br>`examples/llm_qat/train.py` | Refactored to use ModelOptArgParser + shared arguments; removes inline quant args; dataset construction is now YAML-driven; distill/runtime warnings adjusted. |
| **Dataset blending & utils**<br>`examples/llm_qat/dataset_utils.py`, `examples/llm_qat/configs/dataset/*` | New YAML-config-driven dataset blending, streaming/rank-aware loading, tokenization factories, caching, and docs/examples. |
| **Utils refactor**<br>`examples/llm_qat/utils.py` | Removes the legacy Daring-Anteater path; `make_supervised_data_module()` now accepts a dataset YAML path and delegates to dataset_utils. |
| **Launcher removal / config changes**<br>`examples/llm_qat/launch.sh`, `examples/llm_qat/accelerate_config/fsdp1.yaml` | Deletes the monolithic launcher and removes the FSDP1 config; FSDP2 becomes the default. |
| **LLamaFactory & examples**<br>`examples/llm_qat/llama_factory/*`, `examples/llm_qat/simple_qat_train.py` | Switches the quant arg from `quant_cfg` to `recipe`, adds a recipe-to-config resolver, updates defaults/docs and launch helper behavior. |
| **Configs & templates**<br>`examples/llm_qat/configs/train/*.yaml` | Adds training config templates for PTQ eval, QAT, QAD, and fine-tune. |
| **Docs & READMEs**<br>`examples/llm_qat/README.md`, `examples/llm_qad/README.md`, `examples/vllm_serve/README.md`, `examples/llm_qat/ARGUMENTS.md` | Rewrites the README around the QAT/QAD flow (quantize.py → train.py → export.py), adds the auto-generated argument reference, minor README redirects. |
| **Notebook update**<br>`examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb` | Adjusts the QAT workflow text to reference the new scripts. |
| **Pre-commit**<br>`.pre-commit-config.yaml` | Adds a local hook generate-arguments-md to regenerate ARGUMENTS.md; updates the license hook exclusion to train.py. |
| **Tests**<br>`tests/examples/llm_qat/test_llm_qat.py`, `tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py` | Refactors tests to the PTQ→train split, updates accelerate config selection, removes fsdp1 parametrization; adds unit tests for ModelOptArgParser covering `--generate_docs` and YAML `--config` behavior. |
| **Large new module**<br>`examples/llm_qat/dataset_utils.py` | Extensive new dataset blending/tokenization implementation (see above) with distributed cache/merge logic. |
| **Recipe additions**<br>`modelopt_recipes/general/ptq/*` | Adds new PTQ recipes (e.g., int4 blockwise weight-only) referenced by `recipe` fields. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI as CLI
    participant Parser as ModelOptArgParser
    participant Config as RecipeLoader/Config
    participant Tokenizer as Tokenizer
    participant Data as DataLoader/BlendBuilder
    participant Model as Model (HF)
    participant MTQ as ModelOpt MTQ
    participant FS as Filesystem

    CLI->>Parser: invoke (args / --config / --generate_docs)
    alt --generate_docs
        Parser->>FS: write ARGUMENTS.md
        Parser-->>CLI: exit(0)
    else quantize flow
        Parser->>Config: load recipe
        Parser->>Model: load pretrained (+ tokenizer)
        Parser->>Data: build calibration dataset
        Data->>Tokenizer: tokenize batches
        Parser->>Model: move to device
        Model->>MTQ: forward_loop over calibration batches
        MTQ->>Model: apply quantization
        MTQ->>FS: print summary
        alt compress
            MTQ->>Model: compress
        end
        Model->>FS: save quantized model + tokenizer
    end
```

```mermaid
sequenceDiagram
    participant CLI as CLI
    participant Parser as ModelOptArgParser
    participant Data as BlendBuilder
    participant Trainer as HFTrainer (QAT/QAD)
    participant Teacher as Teacher Model (optional)
    participant FS as Filesystem

    CLI->>Parser: parse training args & dataset_config
    Parser->>Data: build train & eval datasets (distributed/cache)
    Parser->>Trainer: construct trainer (distill config if requested)
    alt distillation
        Trainer->>Teacher: load teacher checkpoint
        Trainer->>Trainer: distillation training loop
    else standard QAT/QAD
        Trainer->>Trainer: training loop
    end
    Trainer->>FS: save checkpoint(s)
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3 passed | ❌ 1 failed

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.37%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title '[1/N] Refactor llm_qat example: YAML configs + ModelOptArgParser' clearly summarizes the main changes: refactoring the llm_qat example to use YAML configurations and introducing a new ModelOptArgParser. |
| Security Anti-Patterns | ✅ Passed | The pull request adheres to the security coding practices outlined in SECURITY.md: model/tokenizer loading uses safe defaults without hardcoding trust_remote_code=True, all YAML parsing uses yaml.safe_load(), no dangerous patterns are introduced, and no new risky pip dependencies are added. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

github-actions bot commented Apr 2, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1172/

Built to branch gh-pages at 2026-04-06 19:44 UTC.
Preview will be ready when the GitHub Pages deployment is complete.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py (1)

70-79: Consider adding error-handling tests.

The empty YAML test is good. Consider adding tests for:

  • Non-existent --config file path (should raise appropriate error)
  • Malformed YAML syntax (should raise parse error, not crash)
Example test for a missing config file:

```python
def test_missing_config_file(self, tmp_path):
    parser = ModelOptArgParser((_ModelArgs, _TrainArgs))
    with pytest.raises(FileNotFoundError):
        parser.parse_args_into_dataclasses(
            args=["--config", str(tmp_path / "nonexistent.yaml")]
        )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py` around lines 70 -
79, Add two new unit tests to
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py that exercise error
handling in ModelOptArgParser.parse_args_into_dataclasses: one test (e.g.,
test_missing_config_file) should call parser.parse_args_into_dataclasses with
args=["--config", str(tmp_path/"nonexistent.yaml")] and assert it raises
FileNotFoundError using pytest.raises; the other test (e.g.,
test_malformed_yaml_config) should write malformed YAML to a temp file and
assert parser.parse_args_into_dataclasses raises the YAML parsing exception
(yaml.YAMLError or the parser-specific parse error) using pytest.raises;
reference ModelOptArgParser, parse_args_into_dataclasses, and use
tmp_path/pytest.raises to locate where to add these tests.
examples/llm_qat/main.py (1)

197-198: Consider validating quant_cfg against known configurations.

Using getattr(mtq, quant_args.quant_cfg) with a user-provided string could potentially access unintended module attributes. While this is a CLI argument (not untrusted network input), adding validation against known configs would be more defensive.

🛡️ Optional: Add validation against known configs

```diff
     if quant_args.quant_cfg is not None:
+        if not hasattr(mtq, quant_args.quant_cfg) or not quant_args.quant_cfg.endswith("_CFG"):
+            raise ValueError(
+                f"Unknown quant_cfg: {quant_args.quant_cfg}. "
+                "See modelopt.torch.quantization.config for available options."
+            )
         quant_args.quant_cfg = getattr(mtq, quant_args.quant_cfg)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/main.py` around lines 197 - 198, The current line uses
getattr(mtq, quant_args.quant_cfg) directly which may resolve unintended
attributes; validate quant_args.quant_cfg against an explicit whitelist of
allowed config names before calling getattr. Update the logic around
quant_args.quant_cfg: define the set/list of permitted config names (e.g., known
keys in mtq), check that quant_args.quant_cfg is in that list, and only then
call getattr(mtq, quant_args.quant_cfg); if not allowed, raise a clear error or
exit with a helpful message. Ensure you reference the same symbol names
(quant_args.quant_cfg and mtq) so the check is applied right where the getattr
is used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.pre-commit-config.yaml:
- Around line 130-142: The pre-commit hook runs examples/llm_qat/main.py but
heavy module-level imports (torch, transformers, modelopt modules) run before
the argument parser (parse_args_into_dataclasses()) and train(), causing
slowness and CUDA failures; fix by lazy-loading those heavy imports (move
torch/transformers/modelopt imports inside train() or into helper functions
called only when not generating docs) or add a tiny dedicated script that
imports only the argument dataclasses for the --generate_docs flow, or check
sys.argv for "--generate_docs" at module import time and skip heavy imports;
reference the examples/llm_qat/main.py entrypoint, the train() function, and
parse_args_into_dataclasses() when making the change.

In `@examples/llm_qat/ARGUMENTS.md`:
- Line 35: Update the metadata help string for
TrainingArguments.model_max_length in examples/llm_qat/main.py to use the
hyphenated form "right-padded" instead of "right padded"; locate the
TrainingArguments definition or the argument registration for model_max_length
(symbol: TrainingArguments.model_max_length or the argument name
"--model_max_length") and modify its help/metadata string accordingly so the
generated ARGUMENTS.md shows "right-padded".

In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 73-85: The "--config" handling may access args[idx + 1] when
"--config" is last and raise IndexError; update the block that detects
"--config" to check that idx + 1 < len(args) before reading args[idx + 1], and
if missing raise a clear error (e.g., raise ValueError or use the argparse error
mechanism) with a message like "Missing value for --config" so callers get a
friendly error instead of an IndexError; ensure the rest of the flow (setting
config_path, loading YAML, and calling self.set_defaults(**config)) only runs
when a valid path is present.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9730ae6e-e2ad-4302-b952-ad714e48536f

📥 Commits

Reviewing files that changed from the base of the PR and between 18ddcb7 and e764ad9.

📒 Files selected for processing (13)
  • .pre-commit-config.yaml
  • examples/llm_qat/ARGUMENTS.md
  • examples/llm_qat/README.md
  • examples/llm_qat/configs/finetune.yaml
  • examples/llm_qat/configs/ptq_eval.yaml
  • examples/llm_qat/configs/qad_nvfp4.yaml
  • examples/llm_qat/configs/qat_nvfp4.yaml
  • examples/llm_qat/launch.sh
  • examples/llm_qat/main.py
  • modelopt/torch/opt/plugins/transformers.py
  • modelopt/torch/quantization/plugins/transformers_trainer.py
  • tests/examples/llm_qat/test_llm_qat.py
  • tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
💤 Files with no reviewable changes (1)
  • examples/llm_qat/launch.sh

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--cache_dir` | `str` | `None` | |
| `--model_max_length` | `int` | `4096` | Maximum sequence length. Sequences will be right padded (and possibly truncated). |

⚠️ Potential issue | 🟡 Minor

Minor grammar: "right padded" should be "right-padded".

Since this file is auto-generated, the fix should be applied to the source metadata help string in TrainingArguments.model_max_length (in examples/llm_qat/main.py).

🧰 Tools
🪛 LanguageTool

[grammar] ~35-~35: Use a hyphen to join words.
Context: ...sequence length. Sequences will be right padded (and possibly truncated). | | `--...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/ARGUMENTS.md` at line 35, Update the metadata help string
for TrainingArguments.model_max_length in examples/llm_qat/main.py to use the
hyphenated form "right-padded" instead of "right padded"; locate the
TrainingArguments definition or the argument registration for model_max_length
(symbol: TrainingArguments.model_max_length or the argument name
"--model_max_length") and modify its help/metadata string accordingly so the
generated ARGUMENTS.md shows "right-padded".

Comment on lines +73 to +85
```python
        # --config <yaml_file>: load YAML as defaults, CLI args override
        if "--config" in args:
            idx = args.index("--config")
            config_path = args[idx + 1]
            args = args[:idx] + args[idx + 2 :]  # strip --config <path> from argv
            import yaml

            with open(config_path) as f:
                config = yaml.safe_load(f)
            if config:
                self.set_defaults(**config)

        return super().parse_args_into_dataclasses(args=args, **kwargs)
```

⚠️ Potential issue | 🟡 Minor

Missing validation for --config without a value.

If --config is passed as the last argument without a path, args[idx + 1] will raise an IndexError. Consider adding bounds checking similar to the --generate_docs handling.

🛡️ Proposed fix to add validation

```diff
         # --config <yaml_file>: load YAML as defaults, CLI args override
         if "--config" in args:
             idx = args.index("--config")
+            if idx + 1 >= len(args):
+                raise ValueError("--config requires a path argument")
             config_path = args[idx + 1]
             args = args[:idx] + args[idx + 2 :]  # strip --config <path> from argv
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change:

```diff
 # --config <yaml_file>: load YAML as defaults, CLI args override
 if "--config" in args:
     idx = args.index("--config")
+    if idx + 1 >= len(args):
+        raise ValueError("--config requires a path argument")
     config_path = args[idx + 1]
     args = args[:idx] + args[idx + 2 :]  # strip --config <path> from argv
     import yaml

     with open(config_path) as f:
         config = yaml.safe_load(f)
     if config:
         self.set_defaults(**config)

 return super().parse_args_into_dataclasses(args=args, **kwargs)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` around lines 73 - 85, The
"--config" handling may access args[idx + 1] when "--config" is last and raise
IndexError; update the block that detects "--config" to check that idx + 1 <
len(args) before reading args[idx + 1], and if missing raise a clear error
(e.g., raise ValueError or use the argparse error mechanism) with a message like
"Missing value for --config" so callers get a friendly error instead of an
IndexError; ensure the rest of the flow (setting config_path, loading YAML, and
calling self.set_defaults(**config)) only runs when a valid path is present.

@codecov

codecov bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 81.95489% with 24 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.41%. Comparing base (df80a0f) to head (7f82fe2).
⚠️ Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| `modelopt/torch/opt/plugins/transformers.py` | 89.10% | 11 Missing ⚠️ |
| `...torch/quantization/plugins/transformers_trainer.py` | 50.00% | 11 Missing ⚠️ |
| `modelopt/torch/distill/plugins/huggingface.py` | 80.00% | 2 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (df80a0f) and HEAD (7f82fe2).

HEAD has 33 fewer uploads than BASE.

| Flag | BASE (df80a0f) | HEAD (7f82fe2) |
|---|---|---|
| unit | 4 | 1 |
| gpu | 9 | 1 |
| examples | 27 | 5 |
Additional details and impacted files
```
@@             Coverage Diff             @@
##             main    #1172       +/-   ##
===========================================
- Coverage   74.77%   64.41%   -10.36%
===========================================
  Files         351      351
  Lines       40072    40192      +120
===========================================
- Hits        29964    25890     -4074
- Misses      10108    14302     +4194
```
| Flag | Coverage Δ |
|---|---|
| examples | 40.81% <33.08%> (+0.59%) ⬆️ |
| gpu | 18.77% <27.06%> (-38.33%) ⬇️ |
| unit | 54.82% <78.94%> (+0.07%) ⬆️ |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@realAsma realAsma requested a review from a team as a code owner April 3, 2026 12:53

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/llm_qat/main.py (1)

77-79: ⚠️ Potential issue | 🟡 Minor

Grammar: "right padded" should be "right-padded".

This is the source of the grammar issue in the generated ARGUMENTS.md.

📝 Proposed fix

```diff
     model_max_length: int = field(
         default=4096,
         metadata={
             "help": (
-                "Maximum sequence length. Sequences will be right padded (and possibly truncated)."
+                "Maximum sequence length. Sequences will be right-padded (and possibly truncated)."
             )
         },
     )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/main.py` around lines 77 - 79, Update the help string that
currently reads "Maximum sequence length. Sequences will be right padded (and
possibly truncated)." to use a hyphenated adjective: change it to "Maximum
sequence length. Sequences will be right-padded (and possibly truncated)."
Locate the string in examples/llm_qat/main.py where the argument's "help" value
for "Maximum sequence length" is defined (the dict entry with key "help" or the
parser.add_argument call providing that help text) and replace the phrase "right
padded" with "right-padded".
🧹 Nitpick comments (3)
modelopt/torch/opt/plugins/transformers.py (2)

80-83: Missing error handling for YAML file operations.

If the config file doesn't exist or contains invalid YAML, the error messages won't be user-friendly. Consider wrapping in try/except with clearer error context.

🛡️ Proposed improvement

```diff
-            with open(config_path) as f:
-                config = yaml.safe_load(f)
-            if config:
-                self.set_defaults(**config)
+            try:
+                with open(config_path) as f:
+                    config = yaml.safe_load(f)
+            except FileNotFoundError:
+                raise ValueError(f"Config file not found: {config_path}")
+            except yaml.YAMLError as e:
+                raise ValueError(f"Invalid YAML in config file {config_path}: {e}")
+            if config:
+                self.set_defaults(**config)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` around lines 80 - 83, Wrap the
config file read and YAML parsing around a try/except to provide clear,
user-friendly errors: catch FileNotFoundError to indicate the provided
config_path is missing, catch yaml.YAMLError to indicate the file contains
invalid YAML, and catch a generic Exception for any other I/O issues; in the try
block keep the existing open(config_path) -> yaml.safe_load flow and call
self.set_defaults(**config) when config is truthy, and in each except produce a
clear message or re-raise a new exception that mentions config_path and the
original error to aid debugging.

78-83: YAML import inside function body — consider moving to module level.

The import yaml statement is inside the conditional block. While this works, it's unconventional and may cause slight latency on first use. Moving it to the top-level imports would be cleaner.

♻️ Suggested refactor

```diff
 import dataclasses
 import sys
 import types
 from contextlib import contextmanager
 from pathlib import Path

+import yaml
 import torch
 import transformers
```

Then remove the `import yaml` from line 78.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` around lines 78 - 83, The local
"import yaml" inside the block that reads config_path should be moved to the
module top-level and the inline import removed; update the module imports to
include "import yaml" and then keep the body as: open(config_path),
yaml.safe_load, and if config: self.set_defaults(**config) so that yaml is only
imported once at module import time and the inline import is deleted.
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py (1)

36-120: Consider adding edge case tests for error handling.

The current tests cover happy paths well. Consider adding tests for:

  • --config with non-existent file path
  • --config with invalid YAML syntax
  • --config as last argument without a value
💡 Example additional tests

```python
def test_config_file_not_found(self, tmp_path):
    parser = ModelOptArgParser((_ModelArgs, _TrainArgs))
    with pytest.raises(FileNotFoundError):
        parser.parse_args_into_dataclasses(
            args=["--config", str(tmp_path / "nonexistent.yaml")]
        )

def test_config_missing_value(self):
    parser = ModelOptArgParser((_ModelArgs, _TrainArgs))
    with pytest.raises((IndexError, ValueError)):
        parser.parse_args_into_dataclasses(args=["--config"])
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py` around lines 36 -
120, Add edge-case unit tests to
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py covering --config error
handling: create a test function (e.g., test_config_file_not_found) that
constructs ModelOptArgParser((_ModelArgs, _TrainArgs)) and calls
parse_args_into_dataclasses with args=["--config",
str(tmp_path/"nonexistent.yaml")] asserting FileNotFoundError; add a test (e.g.,
test_config_invalid_yaml) that writes malformed YAML to a tmp file and asserts
parse_args_into_dataclasses raises yaml.YAMLError or ValueError; add a test
(e.g., test_config_missing_value) that calls parse_args_into_dataclasses with
args=["--config"] and asserts an IndexError or ValueError is raised. Use the
existing ModelOptArgParser and parse_args_into_dataclasses symbols to locate
where to add these tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_qat/ARGUMENTS.md`:
- Around line 28-37: The generated TrainingArguments docs omit parameters whose
names are filtered out even when their defaults differ from the HF base; update
the doc generation to surface overridden defaults: in the function
_generate_docs locate the logic that filters out fields by name and change it to
also compare each field's default to the corresponding attribute on
transformers.TrainingArguments (e.g., learning_rate, eval_strategy, do_eval) and
include any field whose default value differs, or alternatively add an explicit
"Overridden defaults" section to ARGUMENTS.md listing each overridden argument
and its non-HF default so users can see values like learning_rate=1e-4.
- Around line 9-11: The generated docs drop union/Optional info (e.g. `str |
None` becoming `str`) because _format_type currently simplifies annotations;
update _format_type so it preserves union/Optional annotations (use
typing.get_origin/get_args or equivalent) and format Union types by joining
inner types with " | " (so a source annotation `str | None` yields `str |
None`), then ensure _generate_docs uses this updated _format_type when rendering
argument types like quant_cfg and other optional fields.

In `@examples/llm_qat/main.py`:
- Around line 64-69: ModelArguments.model_name_or_path lacks metadata helper
text; add a metadata={"help": "..."} to that field so the generated ARGUMENTS.md
includes a description. Edit the ModelArguments class and update the
model_name_or_path declaration to include a clear help string (e.g., "The name
or path of the pretrained model to use.") in the metadata dict, mirroring the
style used for teacher_model.

In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 116-122: The docstring claims "additional/overridden arguments"
but the current filter (own_fields = [f for f in dataclasses.fields(dc) if
f.name not in hf_training_fields]) only keeps truly new fields; update the
behavior by either (A) changing the text emitted in the lines.append call to
"Only additional arguments are shown below" to remove "overridden", or (B)
change the filtering logic to detect overrides: lookup the corresponding field
in the HuggingFace TrainingArguments (hf_training_fields) and compare default
values (and types if desired) with the dataclass field from
dataclasses.fields(dc), and include the field if the default differs (treating
missing defaults as different); modify the code around is_hf_subclass /
own_fields and the lines.append text accordingly so overridden fields are
included when defaults differ.

---

Outside diff comments:
In `@examples/llm_qat/main.py`:
- Around line 77-79: Update the help string that currently reads "Maximum
sequence length. Sequences will be right padded (and possibly truncated)." to
use a hyphenated adjective: change it to "Maximum sequence length. Sequences
will be right-padded (and possibly truncated)." Locate the string in
examples/llm_qat/main.py where the argument's "help" value for "Maximum sequence
length" is defined (the dict entry with key "help" or the parser.add_argument
call providing that help text) and replace the phrase "right padded" with
"right-padded".

---

Nitpick comments:
In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 80-83: Wrap the config file read and YAML parsing around a
try/except to provide clear, user-friendly errors: catch FileNotFoundError to
indicate the provided config_path is missing, catch yaml.YAMLError to indicate
the file contains invalid YAML, and catch a generic Exception for any other I/O
issues; in the try block keep the existing open(config_path) -> yaml.safe_load
flow and call self.set_defaults(**config) when config is truthy, and in each
except produce a clear message or re-raise a new exception that mentions
config_path and the original error to aid debugging.
- Around line 78-83: The local "import yaml" inside the block that reads
config_path should be moved to the module top-level and the inline import
removed; update the module imports to include "import yaml" and then keep the
body as: open(config_path), yaml.safe_load, and if config:
self.set_defaults(**config) so that yaml is only imported once at module import
time and the inline import is deleted.

In `@tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py`:
- Around line 36-120: Add edge-case unit tests to
tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py covering --config error
handling: create a test function (e.g., test_config_file_not_found) that
constructs ModelOptArgParser((_ModelArgs, _TrainArgs)) and calls
parse_args_into_dataclasses with args=["--config",
str(tmp_path/"nonexistent.yaml")] asserting FileNotFoundError; add a test (e.g.,
test_config_invalid_yaml) that writes malformed YAML to a tmp file and asserts
parse_args_into_dataclasses raises yaml.YAMLError or ValueError; add a test
(e.g., test_config_missing_value) that calls parse_args_into_dataclasses with
args=["--config"] and asserts an IndexError or ValueError is raised. Use the
existing ModelOptArgParser and parse_args_into_dataclasses symbols to locate
where to add these tests.
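The error-handling behavior these tests should pin down can be sketched with a minimal stand-in for the `--config` flow (a hypothetical helper for illustration; the real `ModelOptArgParser` subclasses `HfArgumentParser` and its internals may differ):

```python
import argparse

import yaml


def parse_with_config(args: list[str]) -> argparse.Namespace:
    """Minimal stand-in for ModelOptArgParser's --config handling (illustrative only)."""
    # First pass: pull out --config without touching the other arguments.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--config", type=str, default=None)
    known, remaining = pre.parse_known_args(args)

    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=5e-5)
    if known.config is not None:
        # Raises FileNotFoundError for a missing file, yaml.YAMLError for malformed YAML.
        with open(known.config) as f:
            defaults = yaml.safe_load(f)
        parser.set_defaults(**(defaults or {}))
    return parser.parse_args(remaining)
```

CLI values still win over YAML defaults because `set_defaults` only changes defaults; an explicit `--learning_rate` on the command line overrides the config value, which is the precedence the PR description advertises.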
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bf32b98f-0804-4249-87e1-ce58d13d15e2

📥 Commits

Reviewing files that changed from the base of the PR and between e764ad9 and 66e02a7.

📒 Files selected for processing (14)
  • .pre-commit-config.yaml
  • examples/llm_qad/README.md
  • examples/llm_qat/ARGUMENTS.md
  • examples/llm_qat/README.md
  • examples/llm_qat/configs/finetune.yaml
  • examples/llm_qat/configs/ptq_eval.yaml
  • examples/llm_qat/configs/qad_nvfp4.yaml
  • examples/llm_qat/configs/qat_nvfp4.yaml
  • examples/llm_qat/launch.sh
  • examples/llm_qat/main.py
  • modelopt/torch/opt/plugins/transformers.py
  • modelopt/torch/quantization/plugins/transformers_trainer.py
  • tests/examples/llm_qat/test_llm_qat.py
  • tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
💤 Files with no reviewable changes (1)
  • examples/llm_qat/launch.sh
✅ Files skipped from review due to trivial changes (5)
  • examples/llm_qad/README.md
  • examples/llm_qat/configs/qat_nvfp4.yaml
  • examples/llm_qat/configs/ptq_eval.yaml
  • examples/llm_qat/configs/finetune.yaml
  • examples/llm_qat/configs/qad_nvfp4.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • .pre-commit-config.yaml
  • tests/examples/llm_qat/test_llm_qat.py

Comment on lines +9 to +11
| `--quant_cfg` | `str` | `None` | Specify the quantization format for PTQ/QAT. if specified, PTQ/QAT will be enabled with the specified quantization format |
| `--calib_size` | `int` | `512` | Specify the calibration size for quantization. The calibration dataset is used to setup the quantization scale parameters for PTQ/QAT. |
| `--compress` | `bool` | `False` | Whether to compress the model weights after quantization for QLoRA. This is useful for reducing the model size. |

⚠️ Potential issue | 🟡 Minor

Type display shows str but should be str | None for optional fields.

The --quant_cfg argument has default None but type shows as `str`. The _format_type method in the generator should preserve the full union type. Similar issue applies to other optional fields.

This stems from how the type hint is being formatted in _generate_docs. The source class has quant_cfg: str | None, so the generated docs should reflect that.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/ARGUMENTS.md` around lines 9 - 11, The generated docs drop
union/Optional info (e.g. `str | None` becoming `str`) because _format_type
currently simplifies annotations; update _format_type so it preserves
union/Optional annotations (use typing.get_origin/get_args or equivalent) and
format Union types by joining inner types with " | " (so a source annotation
`str | None` yields `str | None`), then ensure _generate_docs uses this updated
_format_type when rendering argument types like quant_cfg and other optional
fields.

Comment on lines +28 to +37
## TrainingArguments

Extends [HuggingFace TrainingArguments](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments). Only additional/overridden arguments are shown below.

| Argument | Type | Default | Description |
|----------|------|---------|-------------|
| `--cache_dir` | `str` | `None` | |
| `--model_max_length` | `int` | `4096` | Maximum sequence length. Sequences will be right padded (and possibly truncated). |
| `--lora` | `bool` | `False` | Whether to add LoRA (Low-Rank Adaptation) adapter before training. When using real quantization, the LoRA adapter must be set, as quantized weights will be frozen during training. |
| `--distill` | `bool` | `False` | Select if training with distillation. |

⚠️ Potential issue | 🟡 Minor

TrainingArguments section omits overridden defaults.

Fields like learning_rate, eval_strategy, do_eval, etc. that override base class defaults are not shown because _generate_docs filters them out by name. Users won't see that this example uses learning_rate=1e-4 instead of the HF default.

Consider documenting the overridden defaults somewhere, or adjusting the generator to include fields with changed defaults.

🧰 Tools
🪛 LanguageTool

[grammar] ~35-~35: Use a hyphen to join words.
Context: ...sequence length. Sequences will be right padded (and possibly truncated). | | `--...

(QB_NEW_EN_HYPHEN)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/ARGUMENTS.md` around lines 28 - 37, The generated
TrainingArguments docs omit parameters whose names are filtered out even when
their defaults differ from the HF base; update the doc generation to surface
overridden defaults: in the function _generate_docs locate the logic that
filters out fields by name and change it to also compare each field's default to
the corresponding attribute on transformers.TrainingArguments (e.g.,
learning_rate, eval_strategy, do_eval) and include any field whose default value
differs, or alternatively add an explicit "Overridden defaults" section to
ARGUMENTS.md listing each overridden argument and its non-HF default so users
can see values like learning_rate=1e-4.

Comment on lines 64 to 69
class ModelArguments(ModelOptHFArguments):
model_name_or_path: str = field(default="meta-llama/Llama-2-7b-hf")
teacher_model: str | None = field(
default=None,
metadata={"help": ("The name or path of the teacher model to use for distillation.")},
)

⚠️ Potential issue | 🟡 Minor

ModelArguments missing metadata for model_name_or_path.

The model_name_or_path field has no metadata={"help": ...}, resulting in an empty description in the generated ARGUMENTS.md (line 25 shows empty description).

📝 Proposed fix
 class ModelArguments(ModelOptHFArguments):
-    model_name_or_path: str = field(default="meta-llama/Llama-2-7b-hf")
+    model_name_or_path: str = field(
+        default="meta-llama/Llama-2-7b-hf",
+        metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models."},
+    )
     teacher_model: str | None = field(
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class ModelArguments(ModelOptHFArguments):
model_name_or_path: str = field(default="meta-llama/Llama-2-7b-hf")
teacher_model: str | None = field(
default=None,
metadata={"help": ("The name or path of the teacher model to use for distillation.")},
)
class ModelArguments(ModelOptHFArguments):
model_name_or_path: str = field(
default="meta-llama/Llama-2-7b-hf",
metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models."},
)
teacher_model: str | None = field(
default=None,
metadata={"help": ("The name or path of the teacher model to use for distillation.")},
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/main.py` around lines 64 - 69,
ModelArguments.model_name_or_path lacks metadata helper text; add a
metadata={"help": "..."} to that field so the generated ARGUMENTS.md includes a
description. Edit the ModelArguments class and update the model_name_or_path
declaration to include a clear help string (e.g., "The name or path of the
pretrained model to use.") in the metadata dict, mirroring the style used for
teacher_model.

Comment on lines +116 to +122
if is_hf_subclass:
own_fields = [f for f in dataclasses.fields(dc) if f.name not in hf_training_fields]
lines.append(
"Extends [HuggingFace TrainingArguments]"
"(https://huggingface.co/docs/transformers/main_classes/trainer#transformers.TrainingArguments)."
" Only additional/overridden arguments are shown below."
)

⚠️ Potential issue | 🟡 Minor

Documentation says "overridden arguments" but filter only shows truly new fields.

The docstring at line 121 says "Only additional/overridden arguments are shown below", but the filter at line 117 excludes any field whose name exists in the base TrainingArguments. Fields that override defaults (like learning_rate, eval_strategy in examples/llm_qat/main.py) won't appear in generated docs, which contradicts the stated behavior.

Consider either:

  1. Update the doc text to say "Only additional arguments" (removing "overridden"), or
  2. Change the logic to compare default values and include fields where the default differs from the base class.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` around lines 116 - 122, The
docstring claims "additional/overridden arguments" but the current filter
(own_fields = [f for f in dataclasses.fields(dc) if f.name not in
hf_training_fields]) only keeps truly new fields; update the behavior by either
(A) changing the text emitted in the lines.append call to "Only additional
arguments are shown below" to remove "overridden", or (B) change the filtering
logic to detect overrides: lookup the corresponding field in the HuggingFace
TrainingArguments (hf_training_fields) and compare default values (and types if
desired) with the dataclass field from dataclasses.fields(dc), and include the
field if the default differs (treating missing defaults as different); modify
the code around is_hf_subclass / own_fields and the lines.append text
accordingly so overridden fields are included when defaults differ.

@realAsma realAsma requested a review from a team as a code owner April 5, 2026 21:29
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
examples/llm_qat/ARGUMENTS.md (1)

10-10: ⚠️ Potential issue | 🟡 Minor

Minor docs grammar: use “right-padded.”

Line 10 still says “right padded”. Since this is auto-generated, update the originating help string so regenerated docs use “right-padded”.

As per coding guidelines, **/*.{py,cpp,h,hpp,md}: All code (Python, C++, Markdown, etc.) must adhere to coding standards as checked by pre-commit hooks.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/ARGUMENTS.md` at line 10, Update the help string for the CLI
option that defines --model_max_length (the add_argument call for
'--model_max_length' / the "model_max_length" help text) to use the
grammatically correct phrase "right-padded." instead of "right padded" so
regenerated docs reflect the change; locate the add_argument or option
definition that sets the default/help for model_max_length and change the help
string to include "right-padded." and re-run doc generation/pre-commit.
🧹 Nitpick comments (4)
examples/llm_qat/train.py (2)

137-141: Same trust_remote_code concern applies to teacher model loading.

The teacher model loading should also use the configurable trust_remote_code parameter for consistency.

♻️ Suggested fix
         teacher_model = transformers.AutoModelForCausalLM.from_pretrained(
             model_args.teacher_model,
             cache_dir=training_args.cache_dir,
             torch_dtype=torch.bfloat16,
+            trust_remote_code=model_args.trust_remote_code,
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/train.py` around lines 137 - 141, The teacher model is being
loaded via transformers.AutoModelForCausalLM.from_pretrained without using the
configurable trust_remote_code flag; update the teacher_model loading call (the
call that assigns to teacher_model and uses model_args.teacher_model) to pass
the same trust_remote_code parameter used for the student/model loading (i.e.,
forward the configured trust_remote_code boolean into from_pretrained) so
teacher model remote code is handled consistently.

96-104: Consider exposing trust_remote_code as a configurable argument.

The model and tokenizer loading accepts user-provided model_name_or_path but doesn't expose trust_remote_code as a configurable parameter. While defaulting to False is correct for security, some models require trust_remote_code=True to function. Per coding guidelines, this should be caller-configurable rather than implicitly relying on the default.

♻️ Suggested approach

Add trust_remote_code: bool = False to ModelArguments in arguments.py, then pass it to the model and tokenizer loading calls:

     model = transformers.AutoModelForCausalLM.from_pretrained(
         model_args.model_name_or_path,
         cache_dir=training_args.cache_dir,
         torch_dtype=torch.bfloat16,
+        trust_remote_code=model_args.trust_remote_code,
     )
     ...
     tokenizer = transformers.AutoTokenizer.from_pretrained(
-        model_args.model_name_or_path, model_max_length=model_args.model_max_length
+        model_args.model_name_or_path,
+        model_max_length=model_args.model_max_length,
+        trust_remote_code=model_args.trust_remote_code,
     )

As per coding guidelines, "Do not hardcode trust_remote_code=True when loading Hugging Face Transformers models. Let the caller decide via a parameter; default to False."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/train.py` around lines 96 - 104, The loader calls in
train.py (transformers.AutoModelForCausalLM.from_pretrained and
transformers.AutoTokenizer.from_pretrained) currently don't expose
trust_remote_code; add a new boolean field trust_remote_code: bool = False to
the ModelArguments dataclass (arguments.py) and then pass
model_args.trust_remote_code into both from_pretrained calls (e.g.,
trust_remote_code=model_args.trust_remote_code) so callers can opt-in while
retaining a safe default.
examples/llm_qat/utils.py (1)

43-43: Consider moving import to module level for clarity.

The runtime import inside the function works but is unconventional. If there's no circular import concern, moving it to the module level would improve readability and enable better static analysis.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/utils.py` at line 43, The file currently does a runtime
import of build_blend_dataset and load_blend_config from dataset_utils inside a
function; move the import statement for build_blend_dataset and
load_blend_config to the top of examples/llm_qat/utils.py (module-level) and
delete the in-function import so callers use the module-level imports instead,
ensuring there are no circular-import issues before changing.
examples/llm_qat/dataset_utils.py (1)

721-723: Consider making trust_remote_code explicit for defense-in-depth.

While trust_remote_code defaults to False in HuggingFace Transformers, the coding guidelines recommend making this explicit or caller-configurable. For a CLI tool that accepts arbitrary tokenizer paths, consider adding an explicit trust_remote_code=False or exposing it as a CLI flag.

♻️ Suggested improvement
     tokenizer = transformers.AutoTokenizer.from_pretrained(
-        args.tokenizer, model_max_length=args.max_length
+        args.tokenizer, model_max_length=args.max_length, trust_remote_code=False
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/dataset_utils.py` around lines 721 - 723, The AutoTokenizer
instantiation uses transformers.AutoTokenizer.from_pretrained without explicitly
setting trust_remote_code, which should be made explicit for defense-in-depth;
update the call in dataset_utils.py where tokenizer is created
(transformers.AutoTokenizer.from_pretrained, referencing args.tokenizer and
args.max_length) to pass trust_remote_code=False or expose a new CLI flag (e.g.,
args.trust_remote_code) and pass that value to from_pretrained so callers can
opt in explicitly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/llm_qat/llama_factory/launch_llamafactory.sh`:
- Line 143: The --use_fsdp2 flag is parsed and shown in help but has no effect
because default config selection no longer reads USE_FSDP2; either remove the
flag from the help and argument parsing or make it effective by wiring it into
config selection and validation. Fix by updating the argument parsing block that
sets USE_FSDP2 (the case branch handling "--use_fsdp2") and the help echo line
to either delete that option, or if you keep it, validate the provided value
(true/false), set the USE_FSDP2 variable accordingly, and use USE_FSDP2 when
choosing the default config (where default config selection currently ignores
USE_FSDP2); if keeping the flag also emit a clear warning/error when an
unsupported value is provided.

In `@examples/llm_qat/quantize.py`:
- Around line 89-93: The calibration forward_loop currently runs forward passes
with gradients enabled and without evaluation mode; wrap the loop in
torch.inference_mode() and call model.eval() before iterating to disable
autograd and dropout/batchnorm updates (e.g., in forward_loop use model.eval()
and with torch.inference_mode(): for batch in calib_dataloader ...); also
consider restoring the original training state after calibration if the caller
expects the model's training flag unchanged.

In `@examples/llm_qat/README.md`:
- Around line 58-67: Update the README examples to use flag-style boolean
arguments to match ARGUMENTS.md: remove explicit True/False values and use the
flags `--distill` and `--lora` where currently written as `--distill True` or
`--lora True` (for example update the train invocation block containing
`--distill True` and any other occurrences such as the `--lora True` example
around line 225) so the CLI examples consistently use `--distill` and `--lora`
flag syntax.

---

Duplicate comments:
In `@examples/llm_qat/ARGUMENTS.md`:
- Line 10: Update the help string for the CLI option that defines
--model_max_length (the add_argument call for '--model_max_length' / the
"model_max_length" help text) to use the grammatically correct phrase
"right-padded." instead of "right padded" so regenerated docs reflect the
change; locate the add_argument or option definition that sets the default/help
for model_max_length and change the help string to include "right-padded." and
re-run doc generation/pre-commit.

---

Nitpick comments:
In `@examples/llm_qat/dataset_utils.py`:
- Around line 721-723: The AutoTokenizer instantiation uses
transformers.AutoTokenizer.from_pretrained without explicitly setting
trust_remote_code, which should be made explicit for defense-in-depth; update
the call in dataset_utils.py where tokenizer is created
(transformers.AutoTokenizer.from_pretrained, referencing args.tokenizer and
args.max_length) to pass trust_remote_code=False or expose a new CLI flag (e.g.,
args.trust_remote_code) and pass that value to from_pretrained so callers can
opt in explicitly.

In `@examples/llm_qat/train.py`:
- Around line 137-141: The teacher model is being loaded via
transformers.AutoModelForCausalLM.from_pretrained without using the configurable
trust_remote_code flag; update the teacher_model loading call (the call that
assigns to teacher_model and uses model_args.teacher_model) to pass the same
trust_remote_code parameter used for the student/model loading (i.e., forward
the configured trust_remote_code boolean into from_pretrained) so teacher model
remote code is handled consistently.
- Around line 96-104: The loader calls in train.py
(transformers.AutoModelForCausalLM.from_pretrained and
transformers.AutoTokenizer.from_pretrained) currently don't expose
trust_remote_code; add a new boolean field trust_remote_code: bool = False to
the ModelArguments dataclass (arguments.py) and then pass
model_args.trust_remote_code into both from_pretrained calls (e.g.,
trust_remote_code=model_args.trust_remote_code) so callers can opt-in while
retaining a safe default.

In `@examples/llm_qat/utils.py`:
- Line 43: The file currently does a runtime import of build_blend_dataset and
load_blend_config from dataset_utils inside a function; move the import
statement for build_blend_dataset and load_blend_config to the top of
examples/llm_qat/utils.py (module-level) and delete the in-function import so
callers use the module-level imports instead, ensuring there are no
circular-import issues before changing.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4e285767-bcf3-4de5-8296-f9658a97092c

📥 Commits

Reviewing files that changed from the base of the PR and between 66e02a7 and be7ba74.

📒 Files selected for processing (33)
  • .pre-commit-config.yaml
  • examples/llm_qad/README.md
  • examples/llm_qat/ARGUMENTS.md
  • examples/llm_qat/README.md
  • examples/llm_qat/accelerate_config/fsdp1.yaml
  • examples/llm_qat/arguments.py
  • examples/llm_qat/configs/accelerate/ddp.yaml
  • examples/llm_qat/configs/accelerate/deepspeed.yaml
  • examples/llm_qat/configs/accelerate/fsdp2.yaml
  • examples/llm_qat/configs/dataset/README.md
  • examples/llm_qat/configs/dataset/blend.yaml
  • examples/llm_qat/configs/dataset/blend_example.yaml
  • examples/llm_qat/configs/dataset/blend_test.yaml
  • examples/llm_qat/configs/training/finetune.yaml
  • examples/llm_qat/configs/training/ptq_eval.yaml
  • examples/llm_qat/configs/training/qad_nvfp4.yaml
  • examples/llm_qat/configs/training/qat_nvfp4.yaml
  • examples/llm_qat/dataset_utils.py
  • examples/llm_qat/launch.sh
  • examples/llm_qat/llama_factory/README.md
  • examples/llm_qat/llama_factory/launch_llamafactory.sh
  • examples/llm_qat/llama_factory/llama_config.yaml
  • examples/llm_qat/llama_factory/llama_factory.py
  • examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
  • examples/llm_qat/quantize.py
  • examples/llm_qat/simple_qat_train.py
  • examples/llm_qat/train.py
  • examples/llm_qat/utils.py
  • examples/vllm_serve/README.md
  • modelopt/torch/opt/plugins/transformers.py
  • modelopt/torch/quantization/plugins/transformers_trainer.py
  • tests/examples/llm_qat/test_llm_qat.py
  • tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
💤 Files with no reviewable changes (2)
  • examples/llm_qat/launch.sh
  • examples/llm_qat/accelerate_config/fsdp1.yaml
✅ Files skipped from review due to trivial changes (14)
  • examples/llm_qad/README.md
  • examples/vllm_serve/README.md
  • examples/llm_qat/llama_factory/llama_config.yaml
  • examples/llm_qat/configs/training/ptq_eval.yaml
  • examples/llm_qat/configs/training/finetune.yaml
  • examples/llm_qat/configs/training/qat_nvfp4.yaml
  • examples/llm_qat/configs/training/qad_nvfp4.yaml
  • examples/llm_qat/configs/dataset/blend_test.yaml
  • examples/llm_qat/configs/dataset/blend.yaml
  • tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
  • examples/llm_qat/configs/dataset/README.md
  • examples/llm_qat/configs/dataset/blend_example.yaml
  • .pre-commit-config.yaml
  • examples/llm_qat/arguments.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • modelopt/torch/quantization/plugins/transformers_trainer.py
  • modelopt/torch/opt/plugins/transformers.py

echo " <config_file.yaml> YAML config file for llama_factory"
echo " --accelerate_config Accelerate config file (optional)"
echo " --use_fsdp2 Use FSDP2 instead of FSDP1 (default: false)"
echo " --use_fsdp2 Use FSDP2 (default: true)"

⚠️ Potential issue | 🟡 Minor

--use_fsdp2 is currently a no-op.

The flag is parsed and documented, but default config selection no longer uses USE_FSDP2, so --use_fsdp2=false has no effect. Please either remove this flag from the interface or explicitly error/warn when it is provided.

♻️ Suggested cleanup
-    echo "Usage: $0 <config_file.yaml> [--accelerate_config <accelerate_config.yaml>] [--use_fsdp2 true|false]"
+    echo "Usage: $0 <config_file.yaml> [--accelerate_config <accelerate_config.yaml>]"

-    echo "  --use_fsdp2           Use FSDP2 (default: true)"
@@
-    USE_FSDP2="true"
@@
-            --use_fsdp2*)
-            if [[ "$1" != *=* ]]; then shift; fi
-            USE_FSDP2="${1#*=}"
-            ;;

Also applies to: 187-198, 239-244

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/llama_factory/launch_llamafactory.sh` at line 143, The
--use_fsdp2 flag is parsed and shown in help but has no effect because default
config selection no longer reads USE_FSDP2; either remove the flag from the help
and argument parsing or make it effective by wiring it into config selection and
validation. Fix by updating the argument parsing block that sets USE_FSDP2 (the
case branch handling "--use_fsdp2") and the help echo line to either delete that
option, or if you keep it, validate the provided value (true/false), set the
USE_FSDP2 variable accordingly, and use USE_FSDP2 when choosing the default
config (where default config selection currently ignores USE_FSDP2); if keeping
the flag also emit a clear warning/error when an unsupported value is provided.

Comment on lines +89 to +93
def forward_loop(model):
for batch in calib_dataloader:
batch = {k: v.to(device) for k, v in batch.items()}
model(**batch)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify calibration forward loop currently lacks inference_mode/no_grad guards.
rg -n -C2 'def forward_loop|model\(\*\*batch\)|inference_mode|no_grad' examples/llm_qat/quantize.py

Repository: NVIDIA/Model-Optimizer

Length of output: 286


🏁 Script executed:

# Get broader context around forward_loop and its usage
rg -B10 -A10 'def forward_loop' examples/llm_qat/quantize.py

# Check if model.eval() is called elsewhere in the file
rg -n 'model\.eval\(\)' examples/llm_qat/quantize.py

# Check the full context around line 87-95
sed -n '80,100p' examples/llm_qat/quantize.py

Repository: NVIDIA/Model-Optimizer

Length of output: 1370


🏁 Script executed:

# Check torch imports in the file
head -30 examples/llm_qat/quantize.py | grep -E '^import|^from'

# Check all usages of mtq.quantize to see if others use inference_mode
rg -B5 -A5 'mtq\.quantize' examples/llm_qat/

# Verify torch.inference_mode is available (should be)
rg 'torch\.inference_mode' examples/

Repository: NVIDIA/Model-Optimizer

Length of output: 4298


🏁 Script executed:

# Get full imports section
head -50 examples/llm_qat/quantize.py

# Check the calibrate function in simple_qat_train.py
rg -B10 -A10 'def calibrate' examples/llm_qat/simple_qat_train.py

Repository: NVIDIA/Model-Optimizer

Length of output: 2695


Add inference mode and eval mode to calibration loop.

The forward_loop function (lines 89–93) executes forward passes without torch.inference_mode() or model evaluation mode, causing unnecessary autograd graph construction and memory overhead during PTQ calibration.

Suggested fix
     def forward_loop(model):
-        for batch in calib_dataloader:
-            batch = {k: v.to(device) for k, v in batch.items()}
-            model(**batch)
+        model.eval()
+        with torch.inference_mode():
+            for batch in calib_dataloader:
+                batch = {k: v.to(device) for k, v in batch.items()}
+                model(**batch)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/quantize.py` around lines 89 - 93, The calibration
forward_loop currently runs forward passes with gradients enabled and without
evaluation mode; wrap the loop in torch.inference_mode() and call model.eval()
before iterating to disable autograd and dropout/batchnorm updates (e.g., in
forward_loop use model.eval() and with torch.inference_mode(): for batch in
calib_dataloader ...); also consider restoring the original training state after
calibration if the caller expects the model's training flag unchanged.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
examples/llm_qat/dataset_utils.py (1)

669-671: Expose trust_remote_code as a CLI parameter.

While trust_remote_code defaults to False (safe), the coding guidelines recommend letting callers configure this parameter. For CLI tools, users should be able to opt-in when loading tokenizers that require custom code.

♻️ Suggested improvement
     parser.add_argument("--tokenizer", required=True, help="Tokenizer name or path")
     parser.add_argument("--max_length", type=int, default=4096, help="Max sequence length")
+    parser.add_argument(
+        "--trust_remote_code",
+        action="store_true",
+        default=False,
+        help="Allow executing custom code from the tokenizer repository (use with caution)",
+    )
     args = parser.parse_args()

     config = load_blend_config(args.config)

     tokenizer = transformers.AutoTokenizer.from_pretrained(
-        args.tokenizer, model_max_length=args.max_length
+        args.tokenizer,
+        model_max_length=args.max_length,
+        trust_remote_code=args.trust_remote_code,
     )

As per coding guidelines: "Do not hardcode trust_remote_code=True when loading Hugging Face Transformers models. Let the caller decide via a parameter; default to False."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/dataset_utils.py` around lines 669 - 671, Add a CLI flag to
expose Hugging Face's trust_remote_code instead of hardcoding it: update the
argument parser to accept something like --trust-remote-code (default False) and
pass that flag into the tokenizer loading call
(transformers.AutoTokenizer.from_pretrained) using args.trust_remote_code, so
the tokenizer creation (tokenizer =
transformers.AutoTokenizer.from_pretrained(args.tokenizer,
model_max_length=args.max_length, trust_remote_code=args.trust_remote_code))
respects the caller's choice.
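The opt-in flag suggested above can be sketched with a minimal stdlib-only parser. This is an illustrative sketch: build_parser is a hypothetical helper, and the actual tokenizer call (transformers.AutoTokenizer.from_pretrained) is omitted so the sketch stays self-contained; in the real script args.trust_remote_code would be forwarded to that call.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """CLI parser exposing trust_remote_code as an explicit opt-in (default False)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--tokenizer", required=True, help="Tokenizer name or path")
    parser.add_argument("--max_length", type=int, default=4096, help="Max sequence length")
    parser.add_argument(
        "--trust_remote_code",
        action="store_true",
        default=False,
        help="Allow executing custom code from the tokenizer repository (use with caution)",
    )
    return parser
```

With this shape, `--trust_remote_code` stays False unless the user passes the flag, matching the guideline that the caller decides.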
tests/examples/llm_qat/test_llm_qat.py (2)

82-82: Normalize tiny_llama_path before appending it to extra_cmd_args.

_run_quantize() is typed as taking list[str], and the other filesystem arguments in these calls are already wrapped with str(...). Converting tiny_llama_path here keeps the argv contract consistent and avoids surprises if the fixture returns a Path.

🧩 Proposed fix
-            "--model_name_or_path", tiny_llama_path,
+            "--model_name_or_path", str(tiny_llama_path),

Apply the same change in all three _run_quantize() call sites.

Also applies to: 105-105, 130-130

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/llm_qat/test_llm_qat.py` at line 82, The test passes
tiny_llama_path directly into the argv list which can be a Path object; update
all three _run_quantize(...) call sites to convert tiny_llama_path to a string
(e.g., str(tiny_llama_path)) when adding it to extra_cmd_args so the list[str]
contract for _run_quantize and consistency with other filesystem args is
preserved; locate the calls that pass "--model_name_or_path", tiny_llama_path
and replace tiny_llama_path with its stringified form.

35-35: Avoid relying on PATH for the Python and Accelerate entry points.

Using bare python and accelerate can pick a different environment than the one running pytest, which makes these integration tests brittle in CI and local virtualenvs. If run_example_command() does not already normalize the launcher, prefer invoking both steps from the active interpreter.

Also applies to: 52-54

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/llm_qat/test_llm_qat.py` at line 35, The test currently
invokes the example scripts via the bare strings "python" and "accelerate" which
can resolve to the wrong interpreter; update the test to launch both steps with
the active Python interpreter instead: replace occurrences of "python" with the
running interpreter (use sys.executable) and replace "accelerate" with a module
invocation via the interpreter (i.e. sys.executable with "-m accelerate" or
equivalent) when calling run_example_command(); update both the quantize step
(strings "python", "quantize.py") and the train/accelerate step (strings
"accelerate", "launch" / related args for "train.py") so the commands use the
active interpreter rather than relying on PATH.
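A hedged sketch of the launcher normalization described above. launcher_cmd is a hypothetical helper (not part of the test suite), and it assumes accelerate's launch subcommand can be run as a module via `python -m accelerate.commands.launch`; the key point is pinning everything to sys.executable instead of PATH lookup.

```python
import sys

def launcher_cmd(script: str, extra: list[str], use_accelerate: bool = False) -> list[str]:
    """Build an argv that always runs under the interpreter executing pytest."""
    if use_accelerate:
        # `accelerate launch` via module invocation, pinned to the active interpreter
        base = [sys.executable, "-m", "accelerate.commands.launch"]
    else:
        base = [sys.executable]
    return [*base, script, *extra]
```

The returned argv can be passed straight to run_example_command() (or subprocess.run) so both the quantize and train steps resolve to the same environment.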
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7d1f6438-abe5-4f9c-94c4-93bd1e217408

📥 Commits

Reviewing files that changed from the base of the PR and between be7ba74 and 7b581c7.

📒 Files selected for processing (33)
  • .pre-commit-config.yaml
  • examples/llm_qad/README.md
  • examples/llm_qat/ARGUMENTS.md
  • examples/llm_qat/README.md
  • examples/llm_qat/accelerate_config/fsdp1.yaml
  • examples/llm_qat/arguments.py
  • examples/llm_qat/configs/accelerate/ddp.yaml
  • examples/llm_qat/configs/accelerate/deepspeed.yaml
  • examples/llm_qat/configs/accelerate/fsdp2.yaml
  • examples/llm_qat/configs/dataset/README.md
  • examples/llm_qat/configs/dataset/blend.yaml
  • examples/llm_qat/configs/dataset/blend_example.yaml
  • examples/llm_qat/configs/dataset/blend_test.yaml
  • examples/llm_qat/configs/train/finetune.yaml
  • examples/llm_qat/configs/train/ptq_eval.yaml
  • examples/llm_qat/configs/train/qad_nvfp4.yaml
  • examples/llm_qat/configs/train/qat_nvfp4.yaml
  • examples/llm_qat/dataset_utils.py
  • examples/llm_qat/launch.sh
  • examples/llm_qat/llama_factory/README.md
  • examples/llm_qat/llama_factory/launch_llamafactory.sh
  • examples/llm_qat/llama_factory/llama_config.yaml
  • examples/llm_qat/llama_factory/llama_factory.py
  • examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
  • examples/llm_qat/quantize.py
  • examples/llm_qat/simple_qat_train.py
  • examples/llm_qat/train.py
  • examples/llm_qat/utils.py
  • examples/vllm_serve/README.md
  • modelopt/torch/opt/plugins/transformers.py
  • modelopt/torch/quantization/plugins/transformers_trainer.py
  • tests/examples/llm_qat/test_llm_qat.py
  • tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
💤 Files with no reviewable changes (2)
  • examples/llm_qat/accelerate_config/fsdp1.yaml
  • examples/llm_qat/launch.sh
✅ Files skipped from review due to trivial changes (13)
  • examples/llm_qat/configs/train/ptq_eval.yaml
  • examples/vllm_serve/README.md
  • examples/llm_qat/llama_factory/llama_config.yaml
  • examples/llm_qat/configs/train/finetune.yaml
  • examples/llm_qad/README.md
  • examples/llm_qat/configs/train/qat_nvfp4.yaml
  • .pre-commit-config.yaml
  • examples/llm_qat/configs/dataset/blend.yaml
  • examples/llm_qat/configs/train/qad_nvfp4.yaml
  • examples/llm_qat/configs/dataset/blend_test.yaml
  • examples/llm_qat/configs/dataset/blend_example.yaml
  • examples/llm_qat/llama_factory/README.md
  • examples/llm_qat/configs/dataset/README.md
🚧 Files skipped from review as they are similar to previous changes (5)
  • examples/llm_qat/llama_factory/llama_factory.py
  • tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
  • examples/llm_qat/utils.py
  • examples/llm_qat/arguments.py
  • examples/llm_qat/train.py

Comment on lines +224 to +231
        except Exception as e:
            logger.warning("Failed to tokenize sample: %s. Skipping.", e)
            pad_id = tokenizer.pad_token_id or tokenizer.eos_token_id or 0
            return {
                "input_ids": [pad_id] * max_length,
                "attention_mask": [0] * max_length,
                "labels": [IGNORE_TOKEN_ID] * max_length,
            }
Contributor

⚠️ Potential issue | 🟡 Minor

Broad exception handling may hide underlying issues.

Catching all Exception types could mask unexpected errors like MemoryError or tokenizer misconfiguration issues that should surface to the developer. (KeyboardInterrupt derives from BaseException and is not caught here, but the clause is still far broader than the expected tokenization failures.)

Consider narrowing to expected tokenization failures:

🛠️ Suggested refinement
-        except Exception as e:
-            logger.warning("Failed to tokenize sample: %s. Skipping.", e)
+        except (ValueError, TypeError, KeyError) as e:
+            logger.warning("Failed to tokenize sample: %s. Skipping.", e)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        except Exception as e:
+        except (ValueError, TypeError, KeyError) as e:
             logger.warning("Failed to tokenize sample: %s. Skipping.", e)
             pad_id = tokenizer.pad_token_id or tokenizer.eos_token_id or 0
             return {
                 "input_ids": [pad_id] * max_length,
                 "attention_mask": [0] * max_length,
                 "labels": [IGNORE_TOKEN_ID] * max_length,
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/dataset_utils.py` around lines 224 - 231, The current broad
except Exception in the tokenization fallback hides unexpected errors; change it
to catch only expected tokenization-related errors (e.g., ValueError,
UnicodeEncodeError, and any specific tokenizer error class your tokenizer
exposes) around the tokenizer invocation, keep the logger.warning and the
fallback pad/label construction using pad_id, max_length, and IGNORE_TOKEN_ID,
and for any other exceptions re-raise them so they surface (i.e., replace
"except Exception as e" with a narrowed except tuple and add a bare raise in an
outer except to propagate unexpected errors).
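The narrowed-except pattern the prompt describes can be sketched as below. tokenize_with_fallback and its tokenize parameter are hypothetical stand-ins for the example's dataset mapper and tokenizer; only tokenization-shaped errors are swallowed, while everything else propagates.

```python
import logging

logger = logging.getLogger(__name__)
IGNORE_TOKEN_ID = -100  # label-masking convention assumed from the example

def tokenize_with_fallback(tokenize, text: str, pad_id: int, max_length: int) -> dict:
    """Run `tokenize`; on expected failures return an all-padding sample.

    ValueError/TypeError/KeyError are treated as per-sample tokenization failures;
    anything else (e.g. MemoryError, programming bugs) surfaces to the caller.
    """
    try:
        return tokenize(text)
    except (ValueError, TypeError, KeyError) as e:
        logger.warning("Failed to tokenize sample (%s): %s. Skipping.", type(e).__name__, e)
        return {
            "input_ids": [pad_id] * max_length,
            "attention_mask": [0] * max_length,
            "labels": [IGNORE_TOKEN_ID] * max_length,
        }
```

Logging the exception class name alongside the message keeps the skipped-sample warnings debuggable without widening the except clause.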

@realAsma realAsma requested a review from a team as a code owner April 6, 2026 13:34
@realAsma realAsma requested a review from h-guo18 April 6, 2026 13:34
Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (5)
examples/llm_qat/ARGUMENTS.md (2)

10-10: ⚠️ Potential issue | 🟡 Minor

Grammar: "right padded" should be hyphenated.

Since this file is auto-generated, fix the source help text in the model_max_length field metadata in examples/llm_qat/arguments.py to use "right-padded".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/ARGUMENTS.md` at line 10, The generated docs show "right
padded" unhyphenated; update the source help text for the model_max_length
argument in examples/llm_qat/arguments.py (the metadata for the model_max_length
argument/flag) to use "right-padded" instead of "right padded" so the
auto-generated ARGUMENTS.md will render the hyphenated form; edit the
help/message string in the model_max_length field definition where its
help/description is set.

11-11: ⚠️ Potential issue | 🟡 Minor

Type display drops Optional/Union annotation.

The --teacher_model field has default None but type shows as `str` instead of `str | None`. This stems from _format_type in the generator not preserving union types.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/ARGUMENTS.md` at line 11, The rendered arg type drops
Optional/Union (so `--teacher_model` shows `str` instead of `str | None`)
because _format_type in the generator flattens or ignores union/Optional
metadata; update _format_type to detect typing.Union/typing.Optional (use
typing.get_origin and typing.get_args) and reconstruct the type string by
joining inner argument type names with " | " (ensuring NoneType is rendered as
"None"), then return that composed union string; locate the generator function
named _format_type and modify its branch that handles non-primitive/annotated
types to handle get_origin==Union and format get_args accordingly.
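The union-preserving formatter the prompt asks for can be sketched as follows. format_type is a hypothetical stand-in for the generator's _format_type; the actual function likely handles more annotation shapes, but the Union branch via typing.get_origin/typing.get_args is the piece being requested.

```python
import typing

def format_type(tp) -> str:
    """Render a type annotation, preserving Optional/Union as 'a | b'."""
    if tp is type(None):
        return "None"
    origin = typing.get_origin(tp)
    if origin is typing.Union:
        # Optional[X] is Union[X, None], so NoneType renders as "None" here
        return " | ".join(format_type(arg) for arg in typing.get_args(tp))
    return getattr(tp, "__name__", str(tp))
```

With this, a field annotated `str | None` (e.g. `--teacher_model`) renders as `str | None` in the generated ARGUMENTS.md instead of collapsing to `str`.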
modelopt/torch/opt/plugins/transformers.py (2)

73-85: ⚠️ Potential issue | 🟡 Minor

Missing bounds check for --config argument.

If --config is passed as the last argument without a path, args[idx + 1] will raise an IndexError. Add validation before accessing the next element.

🛡️ Proposed fix
         # --config <yaml_file>: load YAML as defaults, CLI args override
         if "--config" in args:
             idx = args.index("--config")
+            if idx + 1 >= len(args):
+                raise ValueError("--config requires a path argument")
             config_path = args[idx + 1]
             args = args[:idx] + args[idx + 2 :]  # strip --config <path> from argv
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` around lines 73 - 85, The code
accesses args[idx + 1] without checking bounds when parsing the "--config" flag;
add a guard after finding idx to ensure idx + 1 < len(args) and if not raise a
clear exception (e.g., ValueError("Missing path after --config")) or call
argparse/error handling, so you don't get IndexError. Update the block that
computes config_path and strips args (references: args, idx, config_path,
yaml.safe_load, self.set_defaults, and super().parse_args_into_dataclasses) to
validate the presence of the path before opening the file and only proceed to
open/configure if the check passes.

116-122: ⚠️ Potential issue | 🟡 Minor

Documentation claims "overridden" but filter excludes overridden fields.

The docstring says "additional/overridden arguments" but the filter at line 117 excludes any field whose name exists in the base TrainingArguments. Fields that override defaults won't appear in generated docs.

Either update the text to say "Only additional arguments" or adjust the logic to include fields with changed defaults.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` around lines 116 - 122, The doc
claims "additional/overridden arguments" but the current filter (own_fields = [f
for f in dataclasses.fields(dc) if f.name not in hf_training_fields]) drops any
field that exists on HuggingFace TrainingArguments, so overridden fields are
omitted; fix by either (A) updating the text to "Only additional arguments are
shown below" or (B) include overridden fields by changing the filter to keep
fields whose names are in hf_training_fields only when their defaults differ
from the base TrainingArguments defaults (e.g., build a map of base_fields =
{f.name: f for f in dataclasses.fields(transformers.TrainingArguments)} and
include f if f.name not in hf_training_fields OR base_fields.get(f.name) has a
different default/default_factory/value than f), and then update lines and the
comment accordingly (refer to is_hf_subclass, dataclasses.fields(dc),
hf_training_fields, and the lines list).
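Option (B) above can be sketched with plain dataclasses. doc_fields, BaseArgs, and QATArgs are illustrative names standing in for the generator filter and for transformers.TrainingArguments and its subclass; the sketch only compares `default` and would need extending for `default_factory` fields.

```python
import dataclasses

def doc_fields(dc_cls, base_cls) -> list[str]:
    """Names of fields that are new in dc_cls or override a base-class default."""
    base = {f.name: f for f in dataclasses.fields(base_cls)}
    names = []
    for f in dataclasses.fields(dc_cls):
        b = base.get(f.name)
        if b is None or b.default != f.default:
            names.append(f.name)  # new field, or same name with a changed default
    return names

@dataclasses.dataclass
class BaseArgs:
    learning_rate: float = 1e-3
    num_epochs: int = 1

@dataclasses.dataclass
class QATArgs(BaseArgs):
    learning_rate: float = 5e-4  # overridden default -> should appear in docs
    distill: bool = False        # new field -> should appear in docs
```

Here `num_epochs` is inherited unchanged and stays out of the generated reference, while both the overridden and the new field are documented, matching the "additional/overridden" wording.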
examples/llm_qat/dataset_utils.py (1)

224-231: ⚠️ Potential issue | 🟡 Minor

Broad exception handling may mask underlying issues.

Catching all Exception types could hide unexpected errors like MemoryError or tokenizer misconfiguration. Consider narrowing to expected tokenization failures.

♻️ Suggested refinement
-        except Exception as e:
+        except (ValueError, TypeError, KeyError) as e:
             logger.warning("Failed to tokenize sample: %s. Skipping.", e)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/dataset_utils.py` around lines 224 - 231, The current
blanket except Exception in the tokenization fallback (around
tokenizer.pad_token_id, pad_id and IGNORE_TOKEN_ID) is too broad; restrict it to
expected tokenization errors (e.g., ValueError, TypeError, UnicodeEncodeError
and any tokenizer-specific error class from your tokenizer library such as
TokenizerError) so you don't swallow system-level errors like MemoryError, and
keep the same fallback return. Also include the exception class/name in the
logger message to aid debugging (e.g., log type(e) and message) while leaving
the pad_id/labels fallback logic unchanged.
🧹 Nitpick comments (5)
examples/llm_qat/train.py (1)

134-141: Consider raising ValueError instead of bare assert for better error handling.

The assert will be stripped in optimized mode (python -O). For user-facing validation, prefer an explicit check with ValueError.

♻️ Proposed fix
     if training_args.distill:
-        assert model_args.teacher_model is not None, "Teacher model is required for distillation."
+        if model_args.teacher_model is None:
+            raise ValueError("--teacher_model is required when --distill is enabled.")
 
         teacher_model = transformers.AutoModelForCausalLM.from_pretrained(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/train.py` around lines 134 - 141, Replace the bare assert
that checks for the teacher model when training_args.distill is true with an
explicit runtime check that raises ValueError if model_args.teacher_model is
None; update the logic around training_args.distill and model_args.teacher_model
(before calling transformers.AutoModelForCausalLM.from_pretrained) to raise
ValueError("Teacher model is required for distillation.") so the error is always
raised in user-facing validation and not stripped in optimized mode, then
proceed to load teacher_model as before.
examples/llm_qat/dataset_utils.py (1)

150-156: Specify explicit encoding when opening YAML config file.

Use encoding="utf-8" for consistent cross-platform behavior.

♻️ Proposed fix
 def load_blend_config(config_path: str) -> BlendConfig:
     """Parse a dataset blend YAML file into a :class:`BlendConfig`."""
-    with open(config_path) as f:
+    with open(config_path, encoding="utf-8") as f:
         raw = yaml.safe_load(f)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_qat/dataset_utils.py` around lines 150 - 156, The function
load_blend_config currently opens the YAML file without an explicit encoding
which can cause cross-platform inconsistencies; update the open call in
load_blend_config to specify encoding="utf-8" when opening config_path so
yaml.safe_load reads a UTF-8 stream consistently (keep the rest of the logic
that constructs DatasetSourceConfig and returns BlendConfig unchanged).
modelopt/torch/opt/plugins/transformers.py (2)

80-81: Specify explicit encoding when opening files.

Use encoding="utf-8" for consistent cross-platform behavior when reading YAML config files.

♻️ Proposed fix
-            with open(config_path) as f:
+            with open(config_path, encoding="utf-8") as f:
                 config = yaml.safe_load(f)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` around lines 80 - 81, The file
opens the YAML config with open(config_path) which can be platform-dependent;
update the open call that reads the config (the with open(...) as f: block where
config = yaml.safe_load(f) is executed) to explicitly specify encoding="utf-8"
so the YAML is read consistently across platforms, i.e. change the with open
usage to include encoding="utf-8".

149-149: Specify encoding when writing generated docs.

For consistency with the read path suggested above, use explicit UTF-8 encoding when writing the generated markdown file.

♻️ Proposed fix
-        Path(output_path).write_text("\n".join(lines) + "\n")
+        Path(output_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/opt/plugins/transformers.py` at line 149, The write to disk
uses Path(output_path).write_text("\n".join(lines) + "\n") without an explicit
encoding; change the call to pass encoding="utf-8" so the generated markdown is
written using UTF-8 (i.e. update the Path(...).write_text invocation that writes
the joined lines from the lines variable for output_path to include
encoding="utf-8").
tests/examples/llm_qat/test_llm_qat.py (1)

43-68: --fsdp_transformer_layer_cls_to_wrap may be ignored for non-FSDP backends.

This argument is FSDP-specific but is passed to all backends including ddp and deepspeed. While HfArgumentParser typically ignores unknown/irrelevant args, consider conditionally adding it only for FSDP backends for clarity.

♻️ Proposed fix
 def _run_train(extra_cmd_args: list[str], backend: str = "fsdp2"):
     config_file = BACKEND_CONFIGS[backend]
     gradient_args = (
         ["--gradient_checkpointing", "True"]
         if backend in GRADIENT_CHECKPOINTING_BACKENDS
         else []
     )
+    fsdp_args = (
+        ["--fsdp_transformer_layer_cls_to_wrap", "LlamaDecoderLayer"]
+        if backend.startswith("fsdp")
+        else []
+    )
     run_example_command(
         [
             "accelerate", "launch",
             "--config-file", config_file,
             "train.py",
             "--dataset_config", "configs/dataset/blend_test.yaml",
-            "--fsdp_transformer_layer_cls_to_wrap", "LlamaDecoderLayer",
             "--num_train_epochs", "0.3",
             "--learning_rate", "1e-5",
             "--save_steps", "5",
             "--eval_steps", "5",
             "--train_samples", "256",
             "--eval_samples", "64",
+            *fsdp_args,
             *gradient_args,
             *extra_cmd_args,
         ],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/llm_qat/test_llm_qat.py` around lines 43 - 68, The test helper
_run_train currently always passes the FSDP-only flag
"--fsdp_transformer_layer_cls_to_wrap" to run_example_command, which can be
confusing for non-FSDP backends; change _run_train to append that flag only when
the selected backend is an FSDP backend (e.g., check backend membership in an
FSDP_BACKENDS set or reuse GRADIENT_CHECKPOINTING_BACKENDS if it denotes FSDP
backends) so that run_example_command is invoked without FSDP-specific args for
ddp/deepspeed backends; update references in _run_train where the argument list
is built (alongside BACKEND_CONFIGS, GRADIENT_CHECKPOINTING_BACKENDS, and
run_example_command) to conditionally include the
"--fsdp_transformer_layer_cls_to_wrap", "LlamaDecoderLayer" pair.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@examples/llm_qat/ARGUMENTS.md`:
- Line 10: The generated docs show "right padded" unhyphenated; update the
source help text for the model_max_length argument in
examples/llm_qat/arguments.py (the metadata for the model_max_length
argument/flag) to use "right-padded" instead of "right padded" so the
auto-generated ARGUMENTS.md will render the hyphenated form; edit the
help/message string in the model_max_length field definition where its
help/description is set.
- Line 11: The rendered arg type drops Optional/Union (so `--teacher_model`
shows `str` instead of `str | None`) because _format_type in the generator
flattens or ignores union/Optional metadata; update _format_type to detect
typing.Union/typing.Optional (use typing.get_origin and typing.get_args) and
reconstruct the type string by joining inner argument type names with " | "
(ensuring NoneType is rendered as "None"), then return that composed union
string; locate the generator function named _format_type and modify its branch
that handles non-primitive/annotated types to handle get_origin==Union and
format get_args accordingly.

In `@examples/llm_qat/dataset_utils.py`:
- Around line 224-231: The current blanket except Exception in the tokenization
fallback (around tokenizer.pad_token_id, pad_id and IGNORE_TOKEN_ID) is too
broad; restrict it to expected tokenization errors (e.g., ValueError, TypeError,
UnicodeEncodeError and any tokenizer-specific error class from your tokenizer
library such as TokenizerError) so you don't swallow system-level errors like
MemoryError, and keep the same fallback return. Also include the exception
class/name in the logger message to aid debugging (e.g., log type(e) and
message) while leaving the pad_id/labels fallback logic unchanged.

In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 73-85: The code accesses args[idx + 1] without checking bounds
when parsing the "--config" flag; add a guard after finding idx to ensure idx +
1 < len(args) and if not raise a clear exception (e.g., ValueError("Missing path
after --config")) or call argparse/error handling, so you don't get IndexError.
Update the block that computes config_path and strips args (references: args,
idx, config_path, yaml.safe_load, self.set_defaults, and
super().parse_args_into_dataclasses) to validate the presence of the path before
opening the file and only proceed to open/configure if the check passes.
- Around line 116-122: The doc claims "additional/overridden arguments" but the
current filter (own_fields = [f for f in dataclasses.fields(dc) if f.name not in
hf_training_fields]) drops any field that exists on HuggingFace
TrainingArguments, so overridden fields are omitted; fix by either (A) updating
the text to "Only additional arguments are shown below" or (B) include
overridden fields by changing the filter to keep fields whose names are in
hf_training_fields only when their defaults differ from the base
TrainingArguments defaults (e.g., build a map of base_fields = {f.name: f for f
in dataclasses.fields(transformers.TrainingArguments)} and include f if f.name
not in hf_training_fields OR base_fields.get(f.name) has a different
default/default_factory/value than f), and then update lines and the comment
accordingly (refer to is_hf_subclass, dataclasses.fields(dc),
hf_training_fields, and the lines list).

---

Nitpick comments:
In `@examples/llm_qat/dataset_utils.py`:
- Around line 150-156: The function load_blend_config currently opens the YAML
file without an explicit encoding which can cause cross-platform
inconsistencies; update the open call in load_blend_config to specify
encoding="utf-8" when opening config_path so yaml.safe_load reads a UTF-8 stream
consistently (keep the rest of the logic that constructs DatasetSourceConfig and
returns BlendConfig unchanged).

In `@examples/llm_qat/train.py`:
- Around line 134-141: Replace the bare assert that checks for the teacher model
when training_args.distill is true with an explicit runtime check that raises
ValueError if model_args.teacher_model is None; update the logic around
training_args.distill and model_args.teacher_model (before calling
transformers.AutoModelForCausalLM.from_pretrained) to raise ValueError("Teacher
model is required for distillation.") so the error is always raised in
user-facing validation and not stripped in optimized mode, then proceed to load
teacher_model as before.

In `@modelopt/torch/opt/plugins/transformers.py`:
- Around line 80-81: The file opens the YAML config with open(config_path) which
can be platform-dependent; update the open call that reads the config (the with
open(...) as f: block where config = yaml.safe_load(f) is executed) to
explicitly specify encoding="utf-8" so the YAML is read consistently across
platforms, i.e. change the with open usage to include encoding="utf-8".
- Line 149: The write to disk uses Path(output_path).write_text("\n".join(lines)
+ "\n") without an explicit encoding; change the call to pass encoding="utf-8"
so the generated markdown is written using UTF-8 (i.e. update the
Path(...).write_text invocation that writes the joined lines from the lines
variable for output_path to include encoding="utf-8").

In `@tests/examples/llm_qat/test_llm_qat.py`:
- Around lines 43-68: The test helper `_run_train` currently always passes the
FSDP-only flag `--fsdp_transformer_layer_cls_to_wrap` to `run_example_command`,
which is confusing for non-FSDP backends. Change `_run_train` to append that
flag only when the selected backend is an FSDP backend (e.g., check backend
membership in an `FSDP_BACKENDS` set, or reuse `GRADIENT_CHECKPOINTING_BACKENDS`
if it denotes FSDP backends) so that `run_example_command` is invoked without
FSDP-specific args for the ddp/deepspeed backends. Update the place in
`_run_train` where the argument list is built (alongside `BACKEND_CONFIGS`,
`GRADIENT_CHECKPOINTING_BACKENDS`, and `run_example_command`) to conditionally
include the `"--fsdp_transformer_layer_cls_to_wrap", "LlamaDecoderLayer"` pair.
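The conditional argument building can be sketched as below. The backend names and the `FSDP_BACKENDS` set are assumptions based on the review comment; the real helper also threads through the config path and other flags.

```python
# Hypothetical set of backends that accept FSDP-specific flags.
FSDP_BACKENDS = {"fsdp1", "fsdp2"}


def build_train_args(backend: str) -> list:
    """Build the argument list for one backend (illustrative stand-in
    for the list construction inside _run_train)."""
    args = ["--config", f"configs/accelerate/{backend}.yaml"]
    # Only FSDP backends understand this flag; ddp and deepspeed
    # runs should not receive it.
    if backend in FSDP_BACKENDS:
        args += ["--fsdp_transformer_layer_cls_to_wrap", "LlamaDecoderLayer"]
    return args
```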

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3c9bac8a-7059-4aa8-aa2f-adab7aefd0d7

📥 Commits

Reviewing files that changed from the base of the PR and between 7b581c7 and 3b4863c.

📒 Files selected for processing (34)
  • .pre-commit-config.yaml
  • examples/llm_qad/README.md
  • examples/llm_qat/ARGUMENTS.md
  • examples/llm_qat/README.md
  • examples/llm_qat/accelerate_config/fsdp1.yaml
  • examples/llm_qat/arguments.py
  • examples/llm_qat/configs/accelerate/ddp.yaml
  • examples/llm_qat/configs/accelerate/deepspeed.yaml
  • examples/llm_qat/configs/accelerate/fsdp2.yaml
  • examples/llm_qat/configs/dataset/README.md
  • examples/llm_qat/configs/dataset/blend.yaml
  • examples/llm_qat/configs/dataset/blend_example.yaml
  • examples/llm_qat/configs/dataset/blend_test.yaml
  • examples/llm_qat/configs/train/finetune.yaml
  • examples/llm_qat/configs/train/ptq_eval.yaml
  • examples/llm_qat/configs/train/qad_nvfp4.yaml
  • examples/llm_qat/configs/train/qat_nvfp4.yaml
  • examples/llm_qat/dataset_utils.py
  • examples/llm_qat/launch.sh
  • examples/llm_qat/llama_factory/README.md
  • examples/llm_qat/llama_factory/launch_llamafactory.sh
  • examples/llm_qat/llama_factory/llama_config.yaml
  • examples/llm_qat/llama_factory/llama_factory.py
  • examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
  • examples/llm_qat/quantize.py
  • examples/llm_qat/simple_qat_train.py
  • examples/llm_qat/train.py
  • examples/llm_qat/utils.py
  • examples/vllm_serve/README.md
  • modelopt/torch/opt/plugins/transformers.py
  • modelopt/torch/quantization/plugins/transformers_trainer.py
  • modelopt_recipes/general/ptq/int4_blockwise_weight_only.yml
  • tests/examples/llm_qat/test_llm_qat.py
  • tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
💤 Files with no reviewable changes (2)
  • examples/llm_qat/accelerate_config/fsdp1.yaml
  • examples/llm_qat/launch.sh
✅ Files skipped from review due to trivial changes (15)
  • examples/llm_qat/configs/train/ptq_eval.yaml
  • examples/vllm_serve/README.md
  • examples/llm_qad/README.md
  • examples/llm_qat/llama_factory/llama_config.yaml
  • examples/llm_qat/configs/dataset/blend_test.yaml
  • examples/llm_qat/configs/train/finetune.yaml
  • modelopt_recipes/general/ptq/int4_blockwise_weight_only.yml
  • examples/llm_qat/notebooks/QAT_QAD_Walkthrough.ipynb
  • examples/llm_qat/configs/train/qad_nvfp4.yaml
  • examples/llm_qat/llama_factory/launch_llamafactory.sh
  • examples/llm_qat/configs/dataset/blend.yaml
  • tests/unit/torch/opt/plugins/test_modelopt_arg_parser.py
  • examples/llm_qat/configs/dataset/blend_example.yaml
  • examples/llm_qat/configs/dataset/README.md
  • examples/llm_qat/arguments.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • examples/llm_qat/configs/train/qat_nvfp4.yaml
  • examples/llm_qat/llama_factory/README.md
  • examples/llm_qat/simple_qat_train.py
  • examples/llm_qat/llama_factory/llama_factory.py
  • .pre-commit-config.yaml
  • modelopt/torch/quantization/plugins/transformers_trainer.py
  • examples/llm_qat/quantize.py

…elOpt argument parser

- Add qlora_nvfp4.yaml config and expand qat/qad/finetune YAML configs with full parameter sets
- Remove ptq_eval.yaml (superseded by unified quantize.py flow)
- Add DistillArguments class in modelopt/torch/distill with distill,
  teacher_model, criterion fields and to_distill_kwargs() helper
- Move --distill from TrainingArguments to DistillArguments
- Remove TrainModelArguments from train.py, use DistillArguments instead
- Reorder transformers.py: patching on top, arguments middle, training bottom
- Include QuantizeArguments in train.py parser for ARGUMENTS.md generation
- Rename output_dir -> quantize_output_dir in QuantizeArguments to avoid
  conflict with HF TrainingArguments.output_dir
- Regenerate ARGUMENTS.md with all argument classes
- Group distillation args in qad_nvfp4.yaml config
- Update README.md, tests, and quantize.py for renamed field
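The `DistillArguments` class described above can be sketched roughly as follows. The field defaults, metadata strings, and the exact shape of `to_distill_kwargs()` are assumptions; only the field names (`distill`, `teacher_model`, `criterion`) and the helper name come from the commit message.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DistillArguments:
    """Distillation-specific CLI arguments, separated from TrainingArguments."""

    distill: bool = field(
        default=False, metadata={"help": "Enable knowledge distillation."}
    )
    teacher_model: Optional[str] = field(
        default=None, metadata={"help": "HF name or path of the teacher model."}
    )
    criterion: str = field(
        default="logits_kl", metadata={"help": "Distillation loss criterion."}
    )

    def to_distill_kwargs(self) -> dict:
        # Expose only the fields the distillation setup consumes.
        return {"teacher_model": self.teacher_model, "criterion": self.criterion}
```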

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Signed-off-by: realAsma <[email protected]>
@realAsma realAsma requested a review from a team as a code owner April 6, 2026 19:40