
Integrate Automated QDQ placement tool - part 4.3 #843

Merged
gcunhase merged 11 commits into NVIDIA:main from willg-nv:dev-willg-integrate-auto-qdq-placement-part4.3
Mar 10, 2026

Conversation

@willg-nv
Contributor

@willg-nv willg-nv commented Feb 3, 2026

What does this PR do?

This PR uploads the user guide for the Automated QDQ placement tool. The tool automatically searches for QDQ insertion points that yield better performance.

Overview: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Summary by CodeRabbit

  • Documentation

    • Added comprehensive guide for Automated Q/DQ Placement Optimization workflow, including quick start instructions, advanced usage patterns, configuration options, best practices, and troubleshooting.
  • New Features

    • Exposed public API for CLI parser programmatic access.

@copy-pr-bot

copy-pr-bot bot commented Feb 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Feb 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

This PR adds comprehensive documentation for the Automated Q/DQ Placement Optimization workflow and exposes a public get_parser() function in the autotune CLI module to enable programmatic access to the argument parser, replacing the previous internal-only _get_autotune_parser() reference.

Changes

Documentation — docs/source/guides/9_autoqdq.rst
New comprehensive guide covering the Automated Q/DQ Placement Optimization workflow, including Quick Start, How It Works, Advanced Usage, Configuration, Best Practices, Troubleshooting, Architecture, Limitations, FAQs, and Examples.

CLI Parser Refactoring — modelopt/onnx/quantization/autotune/__main__.py
Introduces a public get_parser() wrapper function that returns the existing internal _get_parser(), enabling external programmatic access to the CLI argument parser.
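The wrapper pattern described here can be sketched with plain argparse; this is an illustrative stand-in, not the real modelopt parser — the option name and default below are assumptions:

```python
import argparse


def _get_parser() -> argparse.ArgumentParser:
    """Internal parser construction (private by naming convention)."""
    parser = argparse.ArgumentParser(prog="autotune")
    # Hypothetical option; the real modelopt CLI defines its own flags.
    parser.add_argument("--output_dir", default="./autotuner_output")
    return parser


def get_parser() -> argparse.ArgumentParser:
    """Public wrapper so external code can access the CLI parser programmatically."""
    return _get_parser()


args = get_parser().parse_args(["--output_dir", "./results"])
print(args.output_dir)  # → ./results
```

The wrapper keeps the private builder free to change internally while giving downstream tooling a stable public entry point.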

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Suggested reviewers

  • gcunhase
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)

  • Description Check — ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed. The title refers to integrating an Automated QDQ placement tool and is directly related to the main changes: documentation file addition and CLI parser wrapper.
  • Docstring Coverage — ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Security Anti-Patterns — ✅ Passed. No security anti-patterns detected in PR changes. Documentation added as RST file with no executable code. Python code change only adds a public wrapper function get_parser() without introducing torch.load, numpy.load, trust_remote_code, eval/exec, or # nosec issues.




Comment @coderabbitai help to get the list of available commands and usage tips.

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.3 branch from 76c395b to 916687c on February 3, 2026 02:53
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@docs/source/guides/9_qdq_placement.rst`:
- Around line 279-281: The conditional checking
autotuner.current_profile_pattern_schemes being None is counter-intuitive as
written; update the example around the if statement that references
autotuner.current_profile_pattern_schemes to either (a) add a one-line
clarifying comment explaining the API semantics (e.g., that the API returns None
when a region is already profiled/complete) or (b) change the condition if it
was reversed after confirming the API behavior; ensure the comment references
autotuner.current_profile_pattern_schemes and the subsequent print("  Already
profiled, skipping") / continue so readers understand why None means "already
profiled."
- Line 516: Replace the inconsistent pattern-cache filename reference
`./bert_base_run/pattern_cache.yaml` with the correct
`autotuner_state_pattern_cache.yaml` in the `--pattern-cache` example so the
flag matches the output structure; update the single occurrence of
`--pattern-cache ./bert_base_run/pattern_cache.yaml` to `--pattern-cache
./bert_base_run/autotuner_state_pattern_cache.yaml`.
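The first fix above — explaining why None means "already profiled" — can be sketched with a stand-in class. The Autotuner below is a mock, not the real modelopt API; only the property name and skip message come from the review comment:

```python
class Autotuner:
    """Mock illustrating the API semantics the review asks to document."""

    def __init__(self, profiled: bool):
        self._profiled = profiled

    @property
    def current_profile_pattern_schemes(self):
        # Assumed behavior per the review: returns None once a region
        # is already profiled/complete, otherwise the schemes to test.
        return None if self._profiled else ["scheme_a", "scheme_b"]


autotuner = Autotuner(profiled=True)
# None here means "nothing left to profile", not an error condition.
if autotuner.current_profile_pattern_schemes is None:
    print("  Already profiled, skipping")
```

The one-line comment before the check is the clarification the review requests, so readers don't misread the None test as a failure path.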
🧹 Nitpick comments (1)
docs/source/guides/9_qdq_placement.rst (1)

457-457: Clarify scheme selection guidance.

The guidance states "For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes." The rationale behind this recommendation isn't immediately clear and could benefit from explanation—specifically, whether "big regions" refers to region size (number of nodes) or computational complexity.

💡 Suggested clarification
-For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes.
+The optimal scheme count depends on your model's structure:
+
+* **Many small regions** (e.g., 100+ patterns): Use fewer schemes (20-30) per region to keep total optimization time reasonable, as you'll be testing many unique patterns.
+* **Few large regions** (e.g., <20 patterns): Use more schemes (50-100) per region to thoroughly explore each pattern's optimization space.

@gcunhase
Contributor

Suggestion to rename this as *_onnx_autotuner or *_onnx_autoqdq as per #841 (comment).

Contributor

Copilot AI left a comment


Pull request overview

This PR adds comprehensive user guide documentation for the Automated Q/DQ Placement Optimization tool, which automatically optimizes Quantize/Dequantize node placement in ONNX models for TensorRT deployment. This is part 4.3 of a larger integration effort.

Changes:

  • Adds a 911-line comprehensive guide covering the autotuner tool from quick start to advanced usage
  • Documents both CLI and Python API usage patterns
  • Includes troubleshooting, best practices, FAQs, and multiple examples


@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.3 branch from 916687c to 1a57ad7 on March 2, 2026 10:12
@cjluo-nv cjluo-nv requested a review from kevalmorabia97 March 2, 2026 16:25
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/guides/9_qdq_placement.rst`:
- Around line 569-575: The example should guard against build_serialized_network
returning None before writing; after calling builder.create_builder_config() and
engine = builder.build_serialized_network(network, config) check if engine is
None and handle the failure (e.g., raise a RuntimeError or log an error and
exit) instead of directly calling f.write(engine); update the snippet around
build_serialized_network, the engine variable, and the file write so the code
only opens/writes "model.engine" when engine is a non-None bytes object.
- Around line 40-67: The output directory examples are inconsistent: the first
command states the default output is ./autotuner_output while the output tree
shows results/; update the docs so both examples and the output-tree use the
same directory name (either change the first command's default to ./results or
change the tree to ./autotuner_output) and, in the command examples around
python -m modelopt.onnx.quantization.autotune and the --output_dir ./results
example, add a short parenthetical note clarifying that --output_dir overrides
the default to avoid confusion.
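The first fix above — guarding against a None engine — amounts to the pattern below. This is a generic sketch: TensorRT is not imported here, and the None-on-failure behavior of build_serialized_network is taken from the review comment, not verified against the library:

```python
from pathlib import Path
from typing import Optional


def write_engine(engine: Optional[bytes], path: str) -> None:
    """Write a serialized engine to disk only when the build actually succeeded."""
    # Per the review: build_serialized_network returns None on failure,
    # so guard before opening/writing the output file.
    if engine is None:
        raise RuntimeError("Engine build failed: build_serialized_network returned None")
    Path(path).write_bytes(engine)
```

In the doc snippet, the result of builder.build_serialized_network(network, config) would pass through a check like this before the write to "model.engine".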

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 916687c60d4c7d8ec941866e4849954c68834fa3 and e4890ec9acfdc45787f2809ac3e584f92e259742.

📒 Files selected for processing (1)
  • docs/source/guides/9_qdq_placement.rst

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.3 branch 2 times, most recently from 69477be to a88df46 on March 3, 2026 03:23
@kevalmorabia97 kevalmorabia97 requested a review from gcunhase March 3, 2026 06:54
@cjluo-nv cjluo-nv requested a review from Copilot March 3, 2026 19:12
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.



@gcunhase
Contributor

gcunhase commented Mar 3, 2026

> Suggestion to rename this as *_onnx_autotuner or *_onnx_autoqdq as per #841 (comment).

@willg-nv ^

@willg-nv willg-nv requested a review from a team as a code owner March 4, 2026 02:16
@willg-nv willg-nv requested a review from galagam March 4, 2026 02:16
@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.3 branch from 99f0c95 to d4445a7 on March 4, 2026 10:38
@gcunhase gcunhase enabled auto-merge (squash) March 6, 2026 14:11
auto-merge was automatically disabled March 6, 2026 14:54

Head branch was pushed to by a user without write access

@willg-nv
Contributor Author

willg-nv commented Mar 6, 2026

@gcunhase I fixed the doc error, please check. Thanks!

@gcunhase gcunhase enabled auto-merge (squash) March 6, 2026 16:30
@gcunhase
Contributor

gcunhase commented Mar 6, 2026

@willg-nv doc is still failing

willg-nv added 8 commits March 9, 2026 05:27
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
Signed-off-by: Will Guo <willg@nvidia.com>
auto-merge was automatically disabled March 9, 2026 05:56

Head branch was pushed to by a user without write access

@willg-nv willg-nv force-pushed the dev-willg-integrate-auto-qdq-placement-part4.3 branch from 3eeceb1 to dce6635 on March 9, 2026 05:56
@willg-nv
Contributor Author

willg-nv commented Mar 9, 2026

> @willg-nv doc is still failing

Fixed the doc issue again, please check.

@gcunhase
Contributor

gcunhase commented Mar 9, 2026

/ok to test dce6635

@gcunhase gcunhase enabled auto-merge (squash) March 9, 2026 14:47
Signed-off-by: Gwena Cunha <4861122+gcunhase@users.noreply.github.com>
@gcunhase
Contributor

gcunhase commented Mar 9, 2026

/ok to test 6a0da14

Signed-off-by: Will Guo <willg@nvidia.com>
auto-merge was automatically disabled March 10, 2026 01:48

Head branch was pushed to by a user without write access

@willg-nv
Contributor Author

@gcunhase this test issue was caused by the rebase. I have fixed it, please re-run the test.

@gcunhase
Contributor

/ok to test 7d43076

@gcunhase
Contributor

/ok to test b5eeb49

@gcunhase gcunhase enabled auto-merge (squash) March 10, 2026 17:56
@gcunhase gcunhase merged commit 695c8e8 into NVIDIA:main Mar 10, 2026
40 checks passed