Integrate Automated QDQ placement tool - part 4.3#843
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. Use the following commands to manage reviews:
📝 Walkthrough
This PR adds comprehensive documentation for the Automated Q/DQ Placement Optimization workflow and exposes a public …
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks: ✅ Passed checks (4 passed)
Force-pushed from 76c395b to 916687c
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@docs/source/guides/9_qdq_placement.rst`:
- Around line 279-281: The conditional checking
autotuner.current_profile_pattern_schemes being None is counter-intuitive as
written; update the example around the if statement that references
autotuner.current_profile_pattern_schemes to either (a) add a one-line
clarifying comment explaining the API semantics (e.g., that the API returns None
when a region is already profiled/complete) or (b) change the condition if it
was reversed after confirming the API behavior; ensure the comment references
autotuner.current_profile_pattern_schemes and the subsequent print(" Already
profiled, skipping") / continue so readers understand why None means "already
profiled."
- Line 516: Replace the inconsistent pattern-cache filename reference
`./bert_base_run/pattern_cache.yaml` with the correct
`autotuner_state_pattern_cache.yaml` in the `--pattern-cache` example so the
flag matches the output structure; update the single occurrence of
`--pattern-cache ./bert_base_run/pattern_cache.yaml` to `--pattern-cache
./bert_base_run/autotuner_state_pattern_cache.yaml`.
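The first comment above asks for a clarifying comment around the `None` check. A minimal, self-contained sketch of what the clarified loop might look like is below. Note the `Autotuner` class here is a hypothetical stub standing in for the real API; the assumption, taken from the review comment itself, is that `current_profile_pattern_schemes` returns `None` once a region is already profiled:

```python
class Autotuner:
    """Hypothetical stand-in for the real autotuner API (illustration only)."""

    def __init__(self, already_profiled):
        self._already_profiled = set(already_profiled)
        self._current_region = None

    def select_region(self, region):
        self._current_region = region

    @property
    def current_profile_pattern_schemes(self):
        # Assumption per the review comment: the API returns None when the
        # currently selected region is already profiled/complete.
        if self._current_region in self._already_profiled:
            return None
        return ["scheme_a", "scheme_b"]  # placeholder scheme names


def profile_regions(autotuner, regions):
    """Profile each region, skipping the ones the autotuner reports as done."""
    profiled = []
    for region in regions:
        autotuner.select_region(region)
        # None means this region was already profiled, so skip it.
        if autotuner.current_profile_pattern_schemes is None:
            print("  Already profiled, skipping")
            continue
        profiled.append(region)
    return profiled
```

With the one-line comment in place, a reader no longer has to guess why `None` triggers the skip rather than the profiling work.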
🧹 Nitpick comments (1)
docs/source/guides/9_qdq_placement.rst (1)
Line 457: Clarify scheme selection guidance. The guidance states "For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes." The rationale behind this recommendation isn't immediately clear and could benefit from explanation: specifically, whether "big regions" refers to region size (number of nodes) or computational complexity.
💡 Suggested clarification:
-For models with many small regions, start with fewer schemes. For models with many big regions, start with more schemes.
+The optimal scheme count depends on your model's structure:
+
+* **Many small regions** (e.g., 100+ patterns): Use fewer schemes (20-30) per region to keep total optimization time reasonable, as you'll be testing many unique patterns.
+* **Few large regions** (e.g., <20 patterns): Use more schemes (50-100) per region to thoroughly explore each pattern's optimization space.
Suggestion to rename this as …
Pull request overview
This PR adds comprehensive user guide documentation for the Automated Q/DQ Placement Optimization tool, which automatically optimizes Quantize/Dequantize node placement in ONNX models for TensorRT deployment. This is part 4.3 of a larger integration effort.
Changes:
- Adds a 911-line comprehensive guide covering the autotuner tool from quick start to advanced usage
- Documents both CLI and Python API usage patterns
- Includes troubleshooting, best practices, FAQs, and multiple examples
Force-pushed from 916687c to 1a57ad7
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/source/guides/9_qdq_placement.rst`:
- Around line 569-575: The example should guard against build_serialized_network
returning None before writing; after calling builder.create_builder_config() and
engine = builder.build_serialized_network(network, config) check if engine is
None and handle the failure (e.g., raise a RuntimeError or log an error and
exit) instead of directly calling f.write(engine); update the snippet around
build_serialized_network, the engine variable, and the file write so the code
only opens/writes "model.engine" when engine is a non-None bytes object.
- Around line 40-67: The output directory examples are inconsistent: the first
command states the default output is ./autotuner_output while the output tree
shows results/; update the docs so both examples and the output-tree use the
same directory name (either change the first command's default to ./results or
change the tree to ./autotuner_output) and, in the command examples around
python -m modelopt.onnx.quantization.autotune and the --output_dir ./results
example, add a short parenthetical note clarifying that --output_dir overrides
the default to avoid confusion.
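The first inline comment above asks for a guard before writing the serialized engine. A rough sketch of that fix is below; this is not the guide's actual snippet, and `StubBuilder` is a hypothetical stand-in for TensorRT's builder (whose `build_serialized_network` does return `None` on failure), so only the `None` check itself is being illustrated:

```python
class StubBuilder:
    """Hypothetical stand-in for a TensorRT builder, for illustration only.

    Like the real builder, build_serialized_network returns None on failure.
    """

    def __init__(self, fail=False):
        self._fail = fail

    def build_serialized_network(self, network, config):
        return None if self._fail else b"serialized-engine-bytes"


def build_and_save(builder, network, config, path="model.engine"):
    """Build the engine and write it only when the build actually succeeded."""
    engine = builder.build_serialized_network(network, config)
    # Guard: a failed build yields None, and writing None to the file would
    # otherwise raise a confusing TypeError far from the real cause.
    if engine is None:
        raise RuntimeError("Engine build failed; check the builder log for details")
    with open(path, "wb") as f:
        f.write(engine)
    return path
```

The same shape applies to the doc's snippet: check `engine is None` between `build_serialized_network(...)` and the `open("model.engine", "wb")` write.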
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📥 Commits
Reviewing files that changed from the base of the PR and between 916687c60d4c7d8ec941866e4849954c68834fa3 and e4890ec9acfdc45787f2809ac3e584f92e259742.
📒 Files selected for processing (1)
docs/source/guides/9_qdq_placement.rst
Force-pushed from 69477be to a88df46
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
Force-pushed from 99f0c95 to d4445a7
Head branch was pushed to by a user without write access
@gcunhase I fixed the doc error, please check. Thanks!
@willg-nv doc is still failing |
Signed-off-by: Will Guo <willg@nvidia.com>
Head branch was pushed to by a user without write access
Force-pushed from 3eeceb1 to dce6635
Fixed the doc issue again. Please check.

/ok to test dce6635
Signed-off-by: Gwena Cunha <4861122+gcunhase@users.noreply.github.com>
/ok to test 6a0da14 |
Signed-off-by: Will Guo <willg@nvidia.com>
Head branch was pushed to by a user without write access
@gcunhase this test issue was caused by a rebase. I have fixed it; please re-run the test.
/ok to test 7d43076 |
/ok to test b5eeb49 |
What does this PR do?
This PR uploads the user guide for the Automated QDQ Placement tool. The tool automatically searches for QDQ insertion points that yield better performance.
Overview: ?
Usage
# Add a code snippet demonstrating how to use this
Testing
Before your PR is "Ready for review"
Additional Information
Summary by CodeRabbit
Documentation
New Features