Integrate Automated QDQ placement tool - part 4.3#843
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. Use the following commands to manage reviews:
📝 Walkthrough
This PR adds comprehensive documentation for the Automated Q/DQ Placement Optimization workflow and exposes a public …
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ Passed checks (4 passed)
Force-pushed from 76c395b to 916687c
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@docs/source/guides/9_qdq_placement.rst`:
- Around line 279-281: The conditional checking
autotuner.current_profile_pattern_schemes being None is counter-intuitive as
written; update the example around the if statement that references
autotuner.current_profile_pattern_schemes to either (a) add a one-line
clarifying comment explaining the API semantics (e.g., that the API returns None
when a region is already profiled/complete) or (b) change the condition if it
was reversed after confirming the API behavior; ensure the comment references
autotuner.current_profile_pattern_schemes and the subsequent print(" Already
profiled, skipping") / continue so readers understand why None means "already
profiled."
- Line 516: Replace the inconsistent pattern-cache filename reference
`./bert_base_run/pattern_cache.yaml` with the correct
`autotuner_state_pattern_cache.yaml` in the `--pattern-cache` example so the
flag matches the output structure; update the single occurrence of
`--pattern-cache ./bert_base_run/pattern_cache.yaml` to `--pattern-cache
./bert_base_run/autotuner_state_pattern_cache.yaml`.
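The first comment above asks for a clarifying comment around the `None` check. A minimal, self-contained sketch of what the clarified loop might look like is below. Note the `Autotuner` class here is a hypothetical stub standing in for the real API; the assumption, taken from the review comment itself, is that `current_profile_pattern_schemes` returns `None` once a region is already profiled:

```python
class Autotuner:
    """Hypothetical stand-in for the real autotuner API (illustration only)."""

    def __init__(self, already_profiled):
        self._already_profiled = set(already_profiled)
        self._current_region = None

    def select_region(self, region):
        self._current_region = region

    @property
    def current_profile_pattern_schemes(self):
        # Assumption per the review comment: the API returns None when the
        # currently selected region is already profiled/complete.
        if self._current_region in self._already_profiled:
            return None
        return ["scheme_a", "scheme_b"]  # placeholder scheme names


def profile_regions(autotuner, regions):
    """Profile each region, skipping the ones the autotuner reports as done."""
    profiled = []
    for region in regions:
        autotuner.select_region(region)
        # None means this region was already profiled, so skip it.
        if autotuner.current_profile_pattern_schemes is None:
            print("  Already profiled, skipping")
            continue
        profiled.append(region)
    return profiled
```

With the one-line comment in place, a reader no longer has to guess why `None` triggers the skip rather than the profiling work.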
🧹 Nitpick comments (1)
docs/source/guides/9_qdq_placement.rst (1)
Line 457: Clarify scheme selection guidance. The guidance states "For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes." The rationale behind this recommendation isn't immediately clear and could benefit from explanation: specifically, whether "big regions" refers to region size (number of nodes) or computational complexity.
💡 Suggested clarification:
-For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes.
+The optimal scheme count depends on your model's structure:
+
+* **Many small regions** (e.g., 100+ patterns): Use fewer schemes (20-30) per region to keep total optimization time reasonable, as you'll be testing many unique patterns.
+* **Few large regions** (e.g., <20 patterns): Use more schemes (50-100) per region to thoroughly explore each pattern's optimization space.
Suggestion to rename this as …
Pull request overview
This PR adds comprehensive user guide documentation for the Automated Q/DQ Placement Optimization tool, which automatically optimizes Quantize/Dequantize node placement in ONNX models for TensorRT deployment. This is part 4.3 of a larger integration effort.
Changes:
- Adds a 911-line comprehensive guide covering the autotuner tool from quick start to advanced usage
- Documents both CLI and Python API usage patterns
- Includes troubleshooting, best practices, FAQs, and multiple examples
Force-pushed from 916687c to 1a57ad7
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/source/guides/9_qdq_placement.rst`:
- Around line 569-575: The example should guard against build_serialized_network
returning None before writing; after calling builder.create_builder_config() and
engine = builder.build_serialized_network(network, config) check if engine is
None and handle the failure (e.g., raise a RuntimeError or log an error and
exit) instead of directly calling f.write(engine); update the snippet around
build_serialized_network, the engine variable, and the file write so the code
only opens/writes "model.engine" when engine is a non-None bytes object.
- Around line 40-67: The output directory examples are inconsistent: the first
command states the default output is ./autotuner_output while the output tree
shows results/; update the docs so both examples and the output-tree use the
same directory name (either change the first command's default to ./results or
change the tree to ./autotuner_output) and, in the command examples around
python -m modelopt.onnx.quantization.autotune and the --output_dir ./results
example, add a short parenthetical note clarifying that --output_dir overrides
the default to avoid confusion.
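The first inline comment above asks for a guard before writing the serialized engine. A rough sketch of that fix is below; this is not the guide's actual snippet, and `StubBuilder` is a hypothetical stand-in for TensorRT's builder (whose `build_serialized_network` does return `None` on failure), so only the `None` check itself is being illustrated:

```python
class StubBuilder:
    """Hypothetical stand-in for a TensorRT builder, for illustration only.

    Like the real builder, build_serialized_network returns None on failure.
    """

    def __init__(self, fail=False):
        self._fail = fail

    def build_serialized_network(self, network, config):
        return None if self._fail else b"serialized-engine-bytes"


def build_and_save(builder, network, config, path="model.engine"):
    """Build the engine and write it only when the build actually succeeded."""
    engine = builder.build_serialized_network(network, config)
    # Guard: a failed build yields None, and writing None to the file would
    # otherwise raise a confusing TypeError far from the real cause.
    if engine is None:
        raise RuntimeError("Engine build failed; check the builder log for details")
    with open(path, "wb") as f:
        f.write(engine)
    return path
```

The same shape applies to the doc's snippet: check `engine is None` between `build_serialized_network(...)` and the `open("model.engine", "wb")` write.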
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 916687c60d4c7d8ec941866e4849954c68834fa3 and e4890ec9acfdc45787f2809ac3e584f92e259742.
📒 Files selected for processing (1)
docs/source/guides/9_qdq_placement.rst
Force-pushed from 69477be to a88df46
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
Force-pushed from 99f0c95 to d4445a7
Head branch was pushed to by a user without write access
@gcunhase I fixed the doc error, please check. Thanks!
@willg-nv doc is still failing |
Signed-off-by: Will Guo <willg@nvidia.com>
Head branch was pushed to by a user without write access
Force-pushed from 3eeceb1 to dce6635
Fixed the doc issue again. Please check.

/ok to test dce6635
Signed-off-by: Gwena Cunha <4861122+gcunhase@users.noreply.github.com>
/ok to test 6a0da14 |
Signed-off-by: Will Guo <willg@nvidia.com>
Head branch was pushed to by a user without write access
@gcunhase this test issue was caused by a rebase. I have fixed it; please re-run the test.
/ok to test 7d43076 |
/ok to test b5eeb49 |
What does this PR do?
This PR uploads the user guide for the Automated QDQ Placement tool. The tool automatically searches for QDQ insertion points that yield better performance.
Overview: ?
Usage
# Add a code snippet demonstrating how to use this
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
Documentation
New Features