
Reduce QP overheads#1140

Open
rg20 wants to merge 15 commits into NVIDIA:main from rg20:qp_profiling

Conversation

@rg20
Contributor

@rg20 rg20 commented Apr 23, 2026

Description

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

@rg20 rg20 requested a review from a team as a code owner April 23, 2026 21:21
@rg20 rg20 requested review from kaatish and nguidotti April 23, 2026 21:21
@copy-pr-bot

copy-pr-bot Bot commented Apr 23, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@rg20 rg20 requested review from chris-maes and removed request for kaatish and nguidotti April 23, 2026 21:21
@rg20 rg20 added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Apr 23, 2026
@rg20 rg20 added this to the 26.06 milestone Apr 23, 2026
@coderabbitai

coderabbitai Bot commented Apr 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Adds native free-variable support for QP barrier solves, adjusts related barrier math and control flow, records free-variable indices in presolve metadata, conditions dense-column detection on Q presence/diagonality, times cuDSS initialization, removes one pre-barrier diagnostic, and introduces a ratio-based GMRES stop/update rule.

Changes

Cohort / File(s) Summary
Barrier: native free-variable support
cpp/src/barrier/barrier.cu
Adds n_free_vars and GPU flag vector d_is_free_ in iteration_data_t, initializes from presolve_info.free_variable_indices. Forces z=0 for native free vars during initial point, saves/restores x around ensure_positive. Excludes free vars from ratio-test (gpu_max_step_to_boundary), modifies search-direction formation (gpu_compute_search_direction) to skip complementarity terms for free vars and set dz=0, adjusts complementarity denominators and RHS handling when updating mu/compute_cc_rhs. Adds NVTX around augmented solve and check_for_suboptimal_solution.
Presolve: free-variable metadata & behavior
cpp/src/dual_simplex/presolve.hpp, cpp/src/dual_simplex/presolve.cpp
Adds std::vector<i_t> free_variable_indices to presolve_info_t. For problems with Q.n > 0 (QPs), presolve no longer splits free variables into v/w; instead records their indices and leaves them unchanged. For non-QP cases, existing split behavior remains.
Barrier: dense-column detection conditional
cpp/src/barrier/barrier.cu
Dense-column detection (find_dense_columns) is now conditioned on Q presence/diagonality: it is only run when `!has_Q` holds (or Q is diagonal), so the scan is skipped for general QPs.
Sparse Cholesky / cuDSS timing
cpp/src/barrier/sparse_cholesky.cuh
Adds a timestamp immediately before cudssCreateMg and logs elapsed initialization time after cudssSetStream; call sequence/params unchanged.
Removed pre-barrier diagnostic scan
cpp/src/dual_simplex/solve.cpp
Removes conditional pre-barrier scan and logging over presolve_info.free_variable_pairs for QPs that computed quadratic row spans and logged when both variables lacked quadratic terms.
Iterative refinement: ratio-based stopping
cpp/src/barrier/iterative_refinement.hpp
GMRES iterative refinement uses improvement_ratio = best_residual / residual. When the ratio exceeds stop_ratio, it updates the saved best and best_residual and continues. If the ratio falls in (1, stop_ratio), it still saves the current iterate as best, logs (when enabled), and breaks early; ratios at or below 1 are treated as stagnation/increase as before, restoring the prior best.
Other small instrumentation/formatting
...
Minor NVTX additions and whitespace/formatting edits accompanying the above changes.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Description check — ❓ Inconclusive: The description is entirely boilerplate, containing only contributor-guidelines checklist items without any meaningful detail about the actual changes, their rationale, or implementation. Resolution: replace the boilerplate with a substantive description explaining what free-variable handling was implemented, why it reduces QP overheads, and how the augmented-system approach works.

✅ Passed checks (3 passed)

  • Title check — ✅ Passed: The title "Reduce QP overheads" clearly and concisely describes the main objective of the changeset, which focuses on optimizing quadratic-program performance through native free-variable handling and dense-column detection improvements.
  • Linked Issues check — ✅ Passed: skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check — ✅ Passed: skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


// Solve using barrier
lp_solution_t<i_t, f_t> barrier_solution(barrier_lp.num_rows, barrier_lp.num_cols);

// Clear variable pairs for QP
Contributor


May I ask why we remove the check here?


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
cpp/src/barrier/iterative_refinement.hpp (1)

181-182: Make the early-stop ratio configurable.

Line 181 hard-codes a solver-wide accuracy/performance tradeoff into a shared iterative-refinement path. A fixed 5.0 may be fine for the profiling case that motivated this PR, but it is likely too aggressive for some matrices and too loose for others. Please plumb this through existing solver settings or a caller-supplied parameter instead of baking it into the header.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/barrier/iterative_refinement.hpp` around lines 181 - 182, Replace the
hard-coded local constant f_t stop_ratio = 5.0 in iterative_refinement.hpp with
a configurable parameter sourced from the solver settings or a caller-provided
argument: add a new float-typed setting (or function parameter) named stop_ratio
(or iterative_refinement_stop_ratio) to the solver config or to the iterative
refinement function/method, use that variable in place of the literal when
computing bnorm and the early-stop decision, keep the default value at 5.0 if
not supplied, and update any callers or constructor that invoke the iterative
refinement path to thread the new setting through.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a7b6ab20-ce6f-49c8-9326-46dd180e9846

📥 Commits

Reviewing files that changed from the base of the PR and between cd41420 and 020312b.

📒 Files selected for processing (1)
  • cpp/src/barrier/iterative_refinement.hpp

Comment on lines +365 to 380
    f_t improvement_ratio = best_residual / residual;
    // Track best solution
    if (residual < best_residual) {
      if (improvement_ratio > stop_ratio) {
        best_residual = residual;
        raft::copy(x_sav.data(), x.data(), x.size(), x.stream());
      } else if (improvement_ratio < stop_ratio && improvement_ratio > 1.0) {
        best_residual = residual;
        raft::copy(x_sav.data(), x.data(), x.size(), x.stream());
        // Residual decreased, but not enough, continue
        if (show_info) {
          CUOPT_LOG_INFO("GMRES IR: improvement ratio %e is less than %e, breaking early",
                         improvement_ratio,
                         stop_ratio);
        }
        break;
      } else {


⚠️ Potential issue | 🟡 Minor

Treat improvement_ratio == stop_ratio as improvement, not stagnation.

With the current > / < split, an exact threshold hit falls into the else branch, restores x_sav, and logs it as "residual increased or stagnated" even though the residual improved. That can discard a better iterate at the boundary. Use >= here at minimum, or an epsilon-based comparison if you want this threshold to be numerically stable.

🔧 Minimal fix
-    if (improvement_ratio > stop_ratio) {
+    if (improvement_ratio >= stop_ratio) {
       best_residual = residual;
       raft::copy(x_sav.data(), x.data(), x.size(), x.stream());
-    } else if (improvement_ratio < stop_ratio && improvement_ratio > 1.0) {
+    } else if (improvement_ratio > 1.0) {
       best_residual = residual;
       raft::copy(x_sav.data(), x.data(), x.size(), x.stream());

As per coding guidelines, check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Original:

    f_t improvement_ratio = best_residual / residual;
    // Track best solution
    if (residual < best_residual) {
      if (improvement_ratio > stop_ratio) {
        best_residual = residual;
        raft::copy(x_sav.data(), x.data(), x.size(), x.stream());
      } else if (improvement_ratio < stop_ratio && improvement_ratio > 1.0) {
        best_residual = residual;
        raft::copy(x_sav.data(), x.data(), x.size(), x.stream());
        // Residual decreased, but not enough, continue
        if (show_info) {
          CUOPT_LOG_INFO("GMRES IR: improvement ratio %e is less than %e, breaking early",
                         improvement_ratio,
                         stop_ratio);
        }
        break;
      } else {

Suggested:

    f_t improvement_ratio = best_residual / residual;
    // Track best solution
    if (improvement_ratio >= stop_ratio) {
      best_residual = residual;
      raft::copy(x_sav.data(), x.data(), x.size(), x.stream());
    } else if (improvement_ratio > 1.0) {
      best_residual = residual;
      raft::copy(x_sav.data(), x.data(), x.size(), x.stream());
      // Residual decreased, but not enough, continue
      if (show_info) {
        CUOPT_LOG_INFO("GMRES IR: improvement ratio %e is less than %e, breaking early",
                       improvement_ratio,
                       stop_ratio);
      }
      break;
    } else {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/barrier/iterative_refinement.hpp` around lines 365 - 380, The branch
treating improvement_ratio == stop_ratio as non-improvement is wrong; update the
comparison logic in the iterative refinement block (symbols: improvement_ratio,
stop_ratio, best_residual, x_sav, x, show_info, CUOPT_LOG_INFO) so that hitting
the threshold counts as improvement (e.g., use >= instead of >) or use an
epsilon-based comparison for floating-point stability (|improvement_ratio -
stop_ratio| <= eps) to decide improvement, and ensure the residual update and
raft::copy for best_residual/x_sav->x remain inside the "improved" branch while
keeping the logging/break behavior only for true stagnation/worsening cases.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cpp/src/barrier/barrier.cu`:
- Around line 2310-2340: The free-variable branch currently sets diag to 0
whenever q_jj > 0 which allows tiny positive Q(j,j) to produce near-zero diag;
change the lambda used in the cub::DeviceTransform::Transform (the one capturing
free_var_reg in the data.Q.n > 0 && data.Q_diagonal case) to floor the returned
value at free_var_reg (e.g., return max(q_jj, free_var_reg) or return (q_jj >
free_var_reg) ? q_jj : free_var_reg) so diag for free variables is never below
free_var_reg; also ensure the other free-variable-only lambda (the data.Q.n == 0
path) similarly returns at least free_var_reg for is_free entries.
- Around line 3596-3606: The initial mu computation (the one that divides by n +
num_upper_bounds before the iteration loop) doesn't account for native free
variables set via presolve_info.free_variable_indices and data.n_free_vars;
update that initial mu calculation to use the same adjusted divisor as
compute_mu()/compute_target_mu() by subtracting data.n_free_vars (i.e., use n +
num_upper_bounds - data.n_free_vars) when presolve_info.free_variable_indices is
non-empty, and ensure data.n_free_vars and d_is_free_ are initialized before
this initial mu is computed so the first iteration uses the same barrier measure
as later compute_mu()/compute_target_mu() calls.
- Around line 3081-3083: The calculation of mu_aff and subsequent mu division
must guard against mu_denom == 0 (mu_denom = x.size() + n_upper_bounds -
n_free_vars) to avoid division-by-zero when all variables are free; add an
explicit fast-path: compute mu_denom as now, then if mu_denom <= eps (use a
small f_t epsilon like 0 or machine epsilon), set mu_aff = static_cast<f_t>(0)
and mu = static_cast<f_t>(0), set any mu ratio (e.g., mu_aff_over_mu) to a safe
default (1 or 0 depending on downstream expectations) and skip the
mu-aff/mu-based updates, otherwise perform the existing divisions. Apply the
identical guard and handling for the other occurrence around the mu computation
block (the second spot noted in the comment).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 94fd7d1f-a999-4dab-a9dc-d5d097ed0af6

📥 Commits

Reviewing files that changed from the base of the PR and between 104b82c and 0002029.

📒 Files selected for processing (3)
  • cpp/src/barrier/barrier.cu
  • cpp/src/dual_simplex/presolve.cpp
  • cpp/src/dual_simplex/presolve.hpp

Comment thread cpp/src/barrier/barrier.cu
Comment on lines +3081 to +3083
f_t mu_denom = static_cast<f_t>(data.x.size()) + static_cast<f_t>(data.n_upper_bounds);
mu_denom -= static_cast<f_t>(data.n_free_vars);
mu_aff = complementarity_aff_sum / mu_denom;


⚠️ Potential issue | 🔴 Critical

Handle the zero-complementarity case before computing mu.

With native free-variable QPs, x.size() + n_upper_bounds - n_free_vars can be zero when every variable is free and there are no finite upper bounds. In that case both mu_aff and mu divide by zero, and the subsequent mu_aff / mu ratio is undefined as well. This needs a dedicated fast path rather than just subtracting n_free_vars.

💡 Suggested guard
-  f_t mu_denom            = static_cast<f_t>(data.x.size()) + static_cast<f_t>(data.n_upper_bounds);
-  mu_denom -= static_cast<f_t>(data.n_free_vars);
-  mu_aff = complementarity_aff_sum / mu_denom;
+  f_t mu_denom = static_cast<f_t>(data.x.size()) + static_cast<f_t>(data.n_upper_bounds) -
+                 static_cast<f_t>(data.n_free_vars);
+  if (mu_denom == f_t(0)) {
+    mu_aff = f_t(0);
+    sigma  = f_t(0);
+    new_mu = f_t(0);
+    return;
+  }
+  mu_aff = complementarity_aff_sum / mu_denom;
-  f_t mu_denom = static_cast<f_t>(data.x.size()) + static_cast<f_t>(data.n_upper_bounds);
-  mu_denom -= static_cast<f_t>(data.n_free_vars);  // free vars don't contribute to mu
+  f_t mu_denom = static_cast<f_t>(data.x.size()) + static_cast<f_t>(data.n_upper_bounds) -
+                 static_cast<f_t>(data.n_free_vars);
+  if (mu_denom == f_t(0)) {
+    mu = f_t(0);
+    return;
+  }

As per coding guidelines, "Check numerical stability: prevent overflow/underflow, precision loss, division by zero/near-zero, and use epsilon comparisons for floating-point equality checks."

Also applies to: 3307-3316

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/barrier/barrier.cu` around lines 3081 - 3083, The calculation of
mu_aff and subsequent mu division must guard against mu_denom == 0 (mu_denom =
x.size() + n_upper_bounds - n_free_vars) to avoid division-by-zero when all
variables are free; add an explicit fast-path: compute mu_denom as now, then if
mu_denom <= eps (use a small f_t epsilon like 0 or machine epsilon), set mu_aff
= static_cast<f_t>(0) and mu = static_cast<f_t>(0), set any mu ratio (e.g.,
mu_aff_over_mu) to a safe default (1 or 0 depending on downstream expectations)
and skip the mu-aff/mu-based updates, otherwise perform the existing divisions.
Apply the identical guard and handling for the other occurrence around the mu
computation block (the second spot noted in the comment).

Comment thread cpp/src/barrier/barrier.cu

Labels

  • improvement — Improves an existing functionality
  • non-breaking — Introduces a non-breaking change


4 participants