
Reduce QP overheads#1140

Open
rg20 wants to merge 15 commits into NVIDIA:main from rg20:qp_profiling

Conversation

@rg20
Contributor

@rg20 rg20 commented Apr 23, 2026

Description

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

@rg20 rg20 requested a review from a team as a code owner April 23, 2026 21:21
@rg20 rg20 requested review from kaatish and nguidotti April 23, 2026 21:21
@copy-pr-bot

copy-pr-bot Bot commented Apr 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@rg20 rg20 requested review from chris-maes and removed request for kaatish and nguidotti April 23, 2026 21:21
@rg20 rg20 added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Apr 23, 2026
@rg20 rg20 added this to the 26.06 milestone Apr 23, 2026
@coderabbitai

coderabbitai Bot commented Apr 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds native free-variable support for QP barrier solves, adjusts related barrier math and control flow, records free-variable indices in presolve metadata, conditions dense-column detection on Q presence/diagonality, times cuDSS initialization, removes one pre-barrier diagnostic, and introduces a ratio-based GMRES stop/update rule.

Changes

Cohort / File(s) Summary
Barrier: native free-variable support
cpp/src/barrier/barrier.cu
Adds n_free_vars and GPU flag vector d_is_free_ in iteration_data_t, initializes from presolve_info.free_variable_indices. Forces z=0 for native free vars during initial point, saves/restores x around ensure_positive. Excludes free vars from ratio-test (gpu_max_step_to_boundary), modifies search-direction formation (gpu_compute_search_direction) to skip complementarity terms for free vars and set dz=0, adjusts complementarity denominators and RHS handling when updating mu/compute_cc_rhs. Adds NVTX around augmented solve and check_for_suboptimal_solution.
Presolve: free-variable metadata & behavior
cpp/src/dual_simplex/presolve.hpp, cpp/src/dual_simplex/presolve.cpp
Adds std::vector<i_t> free_variable_indices to presolve_info_t. For problems with Q.n > 0 (QPs), presolve no longer splits free variables into v/w; instead records their indices and leaves them unchanged. For non-QP cases, existing split behavior remains.
Barrier: dense-column detection conditional
cpp/src/barrier/barrier.cu
Dense-column detection (find_dense_columns) is now conditioned on Q presence/diagonality: it is only run when `!has_Q` holds (or Q is diagonal), so the scan is skipped for general QPs.
Sparse Cholesky / cuDSS timing
cpp/src/barrier/sparse_cholesky.cuh
Adds a timestamp immediately before cudssCreateMg and logs elapsed initialization time after cudssSetStream; call sequence/params unchanged.
Removed pre-barrier diagnostic scan
cpp/src/dual_simplex/solve.cpp
Removes conditional pre-barrier scan and logging over presolve_info.free_variable_pairs for QPs that computed quadratic row spans and logged when both variables lacked quadratic terms.
Iterative refinement: ratio-based stopping
cpp/src/barrier/iterative_refinement.hpp
GMRES iterative refinement uses improvement_ratio = best_residual / residual. When the ratio exceeds stop_ratio, it updates the saved best and best_residual and continues. If the ratio falls in (1, stop_ratio), it still saves the current iterate as best, logs (when enabled), and breaks early; ratios at or below 1 are treated as stagnation/increase as before, restoring the prior best.
Other small instrumentation/formatting
...
Minor NVTX additions and whitespace/formatting edits accompanying the above changes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check — ❓ Inconclusive: The description is entirely boilerplate, containing only contributor-guidelines checklist items without any meaningful detail about the actual changes, their rationale, or implementation. Resolution: replace the boilerplate with a substantive description explaining what free-variable handling was implemented, why it reduces QP overheads, and how the augmented-system approach works.

✅ Passed checks (3 passed)

  • Title check — ✅ Passed: The title "Reduce QP overheads" clearly and concisely describes the main objective of the changeset, which focuses on optimizing quadratic-program performance through native free-variable handling and dense-column detection improvements.
  • Linked Issues check — ✅ Passed: skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed: skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


// Solve using barrier
lp_solution_t<i_t, f_t> barrier_solution(barrier_lp.num_rows, barrier_lp.num_cols);

// Clear variable pairs for QP
Contributor


May I ask why we remove the check here?


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
cpp/src/barrier/iterative_refinement.hpp (1)

181-182: Make the early-stop ratio configurable.

Line 181 hard-codes a solver-wide accuracy/performance tradeoff into a shared iterative-refinement path. A fixed 5.0 may be fine for the profiling case that motivated this PR, but it is likely too aggressive for some matrices and too loose for others. Please plumb this through existing solver settings or a caller-supplied parameter instead of baking it into the header.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/barrier/iterative_refinement.hpp` around lines 181 - 182, Replace the
hard-coded local constant f_t stop_ratio = 5.0 in iterative_refinement.hpp with
a configurable parameter sourced from the solver settings or a caller-provided
argument: add a new float-typed setting (or function parameter) named stop_ratio
(or iterative_refinement_stop_ratio) to the solver config or to the iterative
refinement function/method, use that variable in place of the literal when
computing bnorm and the early-stop decision, keep the default value at 5.0 if
not supplied, and update any callers or constructor that invoke the iterative
refinement path to thread the new setting through.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a7b6ab20-ce6f-49c8-9326-46dd180e9846

📥 Commits

Reviewing files that changed from the base of the PR and between cd41420 and 020312b.

📒 Files selected for processing (1)
  • cpp/src/barrier/iterative_refinement.hpp

Comment on lines +365 to 380
    f_t improvement_ratio = best_residual / residual;
    // Track best solution
    if (residual < best_residual) {
      if (improvement_ratio > stop_ratio) {
        best_residual = residual;
        raft::copy(x_sav.data(), x.data(), x.size(), x.stream());
      } else if (improvement_ratio < stop_ratio && improvement_ratio > 1.0) {
        best_residual = residual;
        raft::copy(x_sav.data(), x.data(), x.size(), x.stream());
        // Residual decreased, but not enough, continue
        if (show_info) {
          CUOPT_LOG_INFO("GMRES IR: improvement ratio %e is less than %e, breaking early",
                         improvement_ratio,
                         stop_ratio);
        }
        break;
      } else {


⚠️ Potential issue | 🟡 Minor

Treat improvement_ratio == stop_ratio as improvement, not stagnation.

With the current > / < split, an exact threshold hit falls into the else branch, restores x_sav, and logs it as "residual increased or stagnated" even though the residual improved. That can discard a better iterate at the boundary. Use >= here at minimum, or an epsilon-based comparison if you want this threshold to be numerically stable.

🔧 Minimal fix
-    if (improvement_ratio > stop_ratio) {
+    if (improvement_ratio >= stop_ratio) {
       best_residual = residual;
       raft::copy(x_sav.data(), x.data(), x.size(), x.stream());
-    } else if (improvement_ratio < stop_ratio && improvement_ratio > 1.0) {
+    } else if (improvement_ratio > 1.0) {
       best_residual = residual;
       raft::copy(x_sav.data(), x.data(), x.size(), x.stream());

As per coding guidelines, check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Original:

    f_t improvement_ratio = best_residual / residual;
    // Track best solution
    if (residual < best_residual) {
      if (improvement_ratio > stop_ratio) {
        best_residual = residual;
        raft::copy(x_sav.data(), x.data(), x.size(), x.stream());
      } else if (improvement_ratio < stop_ratio && improvement_ratio > 1.0) {
        best_residual = residual;
        raft::copy(x_sav.data(), x.data(), x.size(), x.stream());
        // Residual decreased, but not enough, continue
        if (show_info) {
          CUOPT_LOG_INFO("GMRES IR: improvement ratio %e is less than %e, breaking early",
                         improvement_ratio,
                         stop_ratio);
        }
        break;
      } else {

Suggested:

    f_t improvement_ratio = best_residual / residual;
    // Track best solution
    if (improvement_ratio >= stop_ratio) {
      best_residual = residual;
      raft::copy(x_sav.data(), x.data(), x.size(), x.stream());
    } else if (improvement_ratio > 1.0) {
      best_residual = residual;
      raft::copy(x_sav.data(), x.data(), x.size(), x.stream());
      // Residual decreased, but not enough, continue
      if (show_info) {
        CUOPT_LOG_INFO("GMRES IR: improvement ratio %e is less than %e, breaking early",
                       improvement_ratio,
                       stop_ratio);
      }
      break;
    } else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/barrier/iterative_refinement.hpp` around lines 365 - 380, The branch
treating improvement_ratio == stop_ratio as non-improvement is wrong; update the
comparison logic in the iterative refinement block (symbols: improvement_ratio,
stop_ratio, best_residual, x_sav, x, show_info, CUOPT_LOG_INFO) so that hitting
the threshold counts as improvement (e.g., use >= instead of >) or use an
epsilon-based comparison for floating-point stability (|improvement_ratio -
stop_ratio| <= eps) to decide improvement, and ensure the residual update and
raft::copy for best_residual/x_sav->x remain inside the "improved" branch while
keeping the logging/break behavior only for true stagnation/worsening cases.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/barrier/barrier.cu`:
- Around line 2310-2340: The free-variable branch currently sets diag to 0
whenever q_jj > 0 which allows tiny positive Q(j,j) to produce near-zero diag;
change the lambda used in the cub::DeviceTransform::Transform (the one capturing
free_var_reg in the data.Q.n > 0 && data.Q_diagonal case) to floor the returned
value at free_var_reg (e.g., return max(q_jj, free_var_reg) or return (q_jj >
free_var_reg) ? q_jj : free_var_reg) so diag for free variables is never below
free_var_reg; also ensure the other free-variable-only lambda (the data.Q.n == 0
path) similarly returns at least free_var_reg for is_free entries.
- Around line 3596-3606: The initial mu computation (the one that divides by n +
num_upper_bounds before the iteration loop) doesn't account for native free
variables set via presolve_info.free_variable_indices and data.n_free_vars;
update that initial mu calculation to use the same adjusted divisor as
compute_mu()/compute_target_mu() by subtracting data.n_free_vars (i.e., use n +
num_upper_bounds - data.n_free_vars) when presolve_info.free_variable_indices is
non-empty, and ensure data.n_free_vars and d_is_free_ are initialized before
this initial mu is computed so the first iteration uses the same barrier measure
as later compute_mu()/compute_target_mu() calls.
- Around line 3081-3083: The calculation of mu_aff and subsequent mu division
must guard against mu_denom == 0 (mu_denom = x.size() + n_upper_bounds -
n_free_vars) to avoid division-by-zero when all variables are free; add an
explicit fast-path: compute mu_denom as now, then if mu_denom <= eps (use a
small f_t epsilon like 0 or machine epsilon), set mu_aff = static_cast<f_t>(0)
and mu = static_cast<f_t>(0), set any mu ratio (e.g., mu_aff_over_mu) to a safe
default (1 or 0 depending on downstream expectations) and skip the
mu-aff/mu-based updates, otherwise perform the existing divisions. Apply the
identical guard and handling for the other occurrence around the mu computation
block (the second spot noted in the comment).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 94fd7d1f-a999-4dab-a9dc-d5d097ed0af6

📥 Commits

Reviewing files that changed from the base of the PR and between 104b82c and 0002029.

📒 Files selected for processing (3)
  • cpp/src/barrier/barrier.cu
  • cpp/src/dual_simplex/presolve.cpp
  • cpp/src/dual_simplex/presolve.hpp

Comment thread cpp/src/barrier/barrier.cu
Comment on lines +3081 to +3083
f_t mu_denom = static_cast<f_t>(data.x.size()) + static_cast<f_t>(data.n_upper_bounds);
mu_denom -= static_cast<f_t>(data.n_free_vars);
mu_aff = complementarity_aff_sum / mu_denom;


⚠️ Potential issue | 🔴 Critical

Handle the zero-complementarity case before computing mu.

With native free-variable QPs, x.size() + n_upper_bounds - n_free_vars can be zero when every variable is free and there are no finite upper bounds. In that case both mu_aff and mu divide by zero, and the subsequent mu_aff / mu ratio is undefined as well. This needs a dedicated fast path rather than just subtracting n_free_vars.

💡 Suggested guard
-  f_t mu_denom            = static_cast<f_t>(data.x.size()) + static_cast<f_t>(data.n_upper_bounds);
-  mu_denom -= static_cast<f_t>(data.n_free_vars);
-  mu_aff = complementarity_aff_sum / mu_denom;
+  f_t mu_denom = static_cast<f_t>(data.x.size()) + static_cast<f_t>(data.n_upper_bounds) -
+                 static_cast<f_t>(data.n_free_vars);
+  if (mu_denom == f_t(0)) {
+    mu_aff = f_t(0);
+    sigma  = f_t(0);
+    new_mu = f_t(0);
+    return;
+  }
+  mu_aff = complementarity_aff_sum / mu_denom;
-  f_t mu_denom = static_cast<f_t>(data.x.size()) + static_cast<f_t>(data.n_upper_bounds);
-  mu_denom -= static_cast<f_t>(data.n_free_vars);  // free vars don't contribute to mu
+  f_t mu_denom = static_cast<f_t>(data.x.size()) + static_cast<f_t>(data.n_upper_bounds) -
+                 static_cast<f_t>(data.n_free_vars);
+  if (mu_denom == f_t(0)) {
+    mu = f_t(0);
+    return;
+  }

As per coding guidelines, "Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks."

Also applies to: 3307-3316

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/barrier/barrier.cu` around lines 3081 - 3083, The calculation of
mu_aff and subsequent mu division must guard against mu_denom == 0 (mu_denom =
x.size() + n_upper_bounds - n_free_vars) to avoid division-by-zero when all
variables are free; add an explicit fast-path: compute mu_denom as now, then if
mu_denom <= eps (use a small f_t epsilon like 0 or machine epsilon), set mu_aff
= static_cast<f_t>(0) and mu = static_cast<f_t>(0), set any mu ratio (e.g.,
mu_aff_over_mu) to a safe default (1 or 0 depending on downstream expectations)
and skip the mu-aff/mu-based updates, otherwise perform the existing divisions.
Apply the identical guard and handling for the other occurrence around the mu
computation block (the second spot noted in the comment).

Comment thread cpp/src/barrier/barrier.cu

Labels

  • improvement — Improves an existing functionality
  • non-breaking — Introduces a non-breaking change


4 participants