Skip to content

Add support for sparsity exploitation for SOCs of large dimensionality#1468

Open
yuwenchen95 wants to merge 1 commit into
NVIDIA:mainfrom
yuwenchen95:sparse-socp
Open

Add support for sparsity exploitation for SOCs of large dimensionality#1468
yuwenchen95 wants to merge 1 commit into
NVIDIA:mainfrom
yuwenchen95:sparse-socp

Conversation

@yuwenchen95

Copy link
Copy Markdown
Contributor

Description

Adds sparsity exploitation for second-order cone programs (SOCP) that contain
large-dimensional cones, so the barrier solver no longer materializes a dense
Hessian block for every cone.

  • Hybrid dense/sparse cone handling. Cones with dimension smaller than a certain threshold
    keep the existing explicit dense Hessian block; larger cones now use a sparse
    rank-2 (Nesterov-Todd) representation in the augmented KKT system, with an
    expansion block whose RHS rows are zero-filled.
  • New SOCP device kernels for sparse NT-scaling, sparse Hessian diagonal
    (Hs), and the sparse augmented matvec, plus assembly of the sparse expansion
    into the augmented system.
  • Segmented-sum reduction over cone vectors that distinguishes small/medium/large
    cones, with an ILP path (multiple accumulators per thread) for large "medium" cones.

Issue

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

@yuwenchen95 yuwenchen95 requested review from a team as code owners June 26, 2026 12:03
@yuwenchen95 yuwenchen95 requested a review from a team as a code owner June 26, 2026 12:03
@copy-pr-bot

copy-pr-bot Bot commented Jun 26, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: yuwenchen95 <yuwchen@nvidia.com>
@yuwenchen95 yuwenchen95 added feature request New feature or request non-breaking Introduces a non-breaking change and removed Feature labels Jun 26, 2026
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a configurable SOC threshold, extends barrier and cone-data structures for sparse second-order cones, updates augmented-system construction and matvec paths, replaces medium-cone reduction, and adds SOCP and sparse augmented KKT tests.

Changes

Sparse SOC expansion

Layer / File(s) Summary
Threshold plumbing
cpp/include/cuopt/linear_programming/constants.h, cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp, cpp/src/dual_simplex/simplex_solver_settings.hpp, cpp/src/math_optimization/solver_settings.cu, cpp/src/pdlp/solve.cu, cpp/src/barrier/barrier.cu
CUOPT_BARRIER_SOC_THRESHOLD is added to public settings, registered as an integer parameter, and assigned into barrier solver and cone construction.
Cone metadata and sparse scaling
cpp/src/barrier/second_order_cone_kernels.cuh
cone_data_t now classifies SOC blocks by threshold, stores sparse-cone metadata and expansion buffers, and refreshes sparse rank-2 scaling factors.
Medium-cone reduction and Hessian API
cpp/src/barrier/second_order_cone_reduction.cuh, cpp/src/barrier/second_order_cone_kernels.cuh
Medium SOC reductions switch to a block-per-cone kernel, and the Hessian apply path can skip sparse cones when dense-only matvecs are requested.
Sparse augmented-system helpers
cpp/src/barrier/second_order_cone_kernels.cuh
Sparse SOC Hessian data feeds helpers that extract Hs, scatter diagonal entries, write expansion couplings, and launch sparse augmented matvecs.
Augmented assembly and scatter
cpp/src/barrier/barrier.cu
The barrier augmented system now computes expansion-aware sizes, builds separate dense and sparse SOC CSR data, uploads new device buffers, and updates the sparse and dense scatter path.
Augmented solve path
cpp/src/barrier/barrier.cu, cpp/src/barrier/iterative_refinement.hpp
The solve path sizes RHS and solution buffers from expansion-aware counts, refreshes sparse scaling, handles the expansion segment in matvec and solve code, and centralizes saved-iterate restore.
Kernel tests
cpp/tests/socp/CMakeLists.txt, cpp/tests/socp/second_order_cone_kernels.cu, cpp/tests/socp/sparse_augmented_kkt_test.cu
SOC kernel tests cover dense-only accumulation, sparse cone classification and scaling updates, and sparse augmented KKT helpers.
Barrier solve tests
cpp/tests/socp/solve_barrier_socp.cu
Barrier solve tests cover a single sparse SOC, mixed dense/sparse blocks, and a 500-variable sparse expansion case.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

improvement

Suggested reviewers

  • rgsl888prabhu
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding sparsity exploitation for large SOCs.
Description check ✅ Passed The description is directly related to the SOCP sparsity and solver changes in the pull request.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (5)
cpp/tests/socp/second_order_cone_kernels.cu (2)

753-775: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Add a dim == soc_threshold cone to lock the branch boundary.

The constructor contract is dim > soc_threshold for sparse cones, so an off-by-one regression would flip expansion counts and dense-vs-sparse routing without failing this test. Including a 5D cone beside the 6D case would pin that boundary down. As per path instructions, the review focus here is correctness of SOC dense-vs-sparse branching and index mapping.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/tests/socp/second_order_cone_kernels.cu` around lines 753 - 775, Update
the sparse_cone_classification test in second_order_cone_kernels to include a
cone with dimension equal to soc_threshold so the boundary case is covered
alongside the existing 3, 6, 2, and 32769 cones. Adjust the expected
dense/sparse counts, expansion_var_count, sparse entries, and sparse cone
ids/dims assertions in cone_data_t to reflect that dim == soc_threshold stays
dense while dim > soc_threshold remains sparse. This should lock the branch
behavior in cone_data_t and prevent off-by-one regressions in the
dense-vs-sparse routing and index mapping.

Source: Path instructions


310-368: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Assert cone feasibility after the returned step.

This only proves alpha lands in (0, 1]. A broken medium-cone path can still satisfy that while moving the iterate outside the cone. Reusing the per-cone feasibility check from cone_step_length_keeps_iterate_in_cone here would validate the new 1000D mixed-topology path instead of mostly smoke-testing it. As per path instructions, tests under cpp/tests/** should prioritize numerical correctness validation, not just “runs”, and include edge cases that hit reduction correctness.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/tests/socp/second_order_cone_kernels.cu` around lines 310 - 368, The new
mixed-topology step-length test only checks that compute_cone_step_length
returns alpha values in range, but it does not verify the updated iterate stays
feasible. Extend cone_step_length_many_small_one_sparse_medium_cone by reusing
the per-cone feasibility validation used in
cone_step_length_keeps_iterate_in_cone after applying the returned step to
dx/dz, so the cone_data_t path for the 1000D cone is checked for actual cone
membership rather than just bounds on step length.

Source: Path instructions

cpp/tests/socp/solve_barrier_socp.cu (1)

869-875: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

These cases don’t currently prove the threshold plumbing works.

Q^6 with threshold 4 and Q^500 with threshold 5 are also sparse under the default threshold, so a regression that ignores settings.barrier_soc_threshold would still pass all three tests. Please add one boundary case solved under two thresholds (or a Q^5/Q^6 pair) so the parameter is exercised end-to-end. As per path instructions, the review focus for this PR is correctness of SOC dense-vs-sparse branching, and the stack context says this PR adds the barrier_soc_threshold solver parameter.

Also applies to: 948-955, 1018-1023

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/tests/socp/solve_barrier_socp.cu` around lines 869 - 875, The SOC
threshold tests currently do not exercise `settings.barrier_soc_threshold`
end-to-end because the chosen `Q^6` and `Q^500` cases remain sparse under the
default behavior, so a bug that ignores the setting could still pass. Update the
`solve_barrier_socp` test coverage by adding a boundary case in
`test_barrier_socp` (or the relevant nearby test helpers) that is solved twice
with different `barrier_soc_threshold` values, or use a `Q^5`/`Q^6` pair, so the
dense-vs-sparse branching is actually validated through the
`simplex_solver_settings_t` plumbing.

Source: Path instructions

cpp/tests/socp/sparse_augmented_kkt_test.cu (1)

88-177: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Exercise the packed-offset path with at least two sparse cones.

Both integration tests only cover a single sparse cone, so bugs in sparse_entry_offsets advancement or second-cone expansion-slot mapping would still pass. Adding a two-sparse-cone case here would verify the cross-cone index math the production path relies on. As per path instructions, tests in cpp/tests/** should prioritize correctness of SOC index mapping and edge cases that can hit device-memory safety or reduction bugs.

Also applies to: 179-256

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/tests/socp/sparse_augmented_kkt_test.cu` around lines 88 - 177, The
sparse augmented KKT test only covers one sparse cone, so it does not exercise
packed-offset advancement or second-cone expansion-slot mapping. Extend
scatter_and_update_sparse_expansion to build at least two sparse cones via
cone_data_t, then validate that sparse_entry_offsets, sparse_cone_ids, and the
expansion updates in update_sparse_expansion_in_augmented and
scatter_sparse_hessian_diag_into_augmented are correct for both cones. Keep the
existing single-cone assertions, but add per-cone expectations that confirm the
second cone writes to the right augmented_x slots and does not reuse the first
cone’s offsets.

Source: Path instructions

cpp/src/barrier/barrier.cu (1)

842-842: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Misleading comment in the sparse-cone branch.

Line 842's comment reads "Handle off-diagonal entries of dense SOC before diagonal entry", but this is the sparse-cone row path (cone_sparse_idx[k] >= 0). The loops here process the remaining intra/post-cone Q off-diagonals for the sparse cone. Suggest dropping the "dense" wording to avoid confusion for future maintainers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/barrier/barrier.cu` at line 842, The comment in the sparse-cone
branch is misleading because the code path under cone_sparse_idx[k] processes
the sparse cone row, not the dense SOC path. Update the comment near the
off-diagonal handling in barrier.cu to remove the “dense SOC” wording and
describe that these loops handle the remaining intra/post-cone Q off-diagonals
for the sparse cone, keeping the comment aligned with the surrounding logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/src/barrier/second_order_cone_kernels.cuh`:
- Around line 237-247: The constructor initializer list in the
second_order_cone_kernels class is out of sync with the member declaration
order, which can trigger -Werror build failures. Update the constructor so
soc_threshold, n_sparse_cones, and n_sparse_cone_entries are initialized in the
same order as their declarations, or reorder the member declarations to match;
use the constructor and the affected member fields in second_order_cone_kernels
to locate the fix.

In `@cpp/src/barrier/second_order_cone_reduction.cuh`:
- Around line 277-284: The cone reduction in second_order_cone_reduction.cuh is
adding init on every thread before block_reduce_t::Sum, which makes the final
result include init multiplied by block_dim instead of once. Update the
reduction flow in the block path so that only the accumulated acc[k] values are
reduced first, then apply init a single time after the block reduction in the
operator() / reduction helper that computes the cone sum. Keep the same logic
consistent with the warp path and verify the convenience overload still works
for f_t{0}.

---

Nitpick comments:
In `@cpp/src/barrier/barrier.cu`:
- Line 842: The comment in the sparse-cone branch is misleading because the code
path under cone_sparse_idx[k] processes the sparse cone row, not the dense SOC
path. Update the comment near the off-diagonal handling in barrier.cu to remove
the “dense SOC” wording and describe that these loops handle the remaining
intra/post-cone Q off-diagonals for the sparse cone, keeping the comment aligned
with the surrounding logic.

In `@cpp/tests/socp/second_order_cone_kernels.cu`:
- Around line 753-775: Update the sparse_cone_classification test in
second_order_cone_kernels to include a cone with dimension equal to
soc_threshold so the boundary case is covered alongside the existing 3, 6, 2,
and 32769 cones. Adjust the expected dense/sparse counts, expansion_var_count,
sparse entries, and sparse cone ids/dims assertions in cone_data_t to reflect
that dim == soc_threshold stays dense while dim > soc_threshold remains sparse.
This should lock the branch behavior in cone_data_t and prevent off-by-one
regressions in the dense-vs-sparse routing and index mapping.
- Around line 310-368: The new mixed-topology step-length test only checks that
compute_cone_step_length returns alpha values in range, but it does not verify
the updated iterate stays feasible. Extend
cone_step_length_many_small_one_sparse_medium_cone by reusing the per-cone
feasibility validation used in cone_step_length_keeps_iterate_in_cone after
applying the returned step to dx/dz, so the cone_data_t path for the 1000D cone
is checked for actual cone membership rather than just bounds on step length.

In `@cpp/tests/socp/solve_barrier_socp.cu`:
- Around line 869-875: The SOC threshold tests currently do not exercise
`settings.barrier_soc_threshold` end-to-end because the chosen `Q^6` and `Q^500`
cases remain sparse under the default behavior, so a bug that ignores the
setting could still pass. Update the `solve_barrier_socp` test coverage by
adding a boundary case in `test_barrier_socp` (or the relevant nearby test
helpers) that is solved twice with different `barrier_soc_threshold` values, or
use a `Q^5`/`Q^6` pair, so the dense-vs-sparse branching is actually validated
through the `simplex_solver_settings_t` plumbing.

In `@cpp/tests/socp/sparse_augmented_kkt_test.cu`:
- Around line 88-177: The sparse augmented KKT test only covers one sparse cone,
so it does not exercise packed-offset advancement or second-cone expansion-slot
mapping. Extend scatter_and_update_sparse_expansion to build at least two sparse
cones via cone_data_t, then validate that sparse_entry_offsets, sparse_cone_ids,
and the expansion updates in update_sparse_expansion_in_augmented and
scatter_sparse_hessian_diag_into_augmented are correct for both cones. Keep the
existing single-cone assertions, but add per-cone expectations that confirm the
second cone writes to the right augmented_x slots and does not reuse the first
cone’s offsets.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e135c5c4-8f4c-4b4d-853d-46b8d7866a5b

📥 Commits

Reviewing files that changed from the base of the PR and between 806146e and 01f3469.

📒 Files selected for processing (13)
  • cpp/include/cuopt/linear_programming/constants.h
  • cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
  • cpp/src/barrier/barrier.cu
  • cpp/src/barrier/iterative_refinement.hpp
  • cpp/src/barrier/second_order_cone_kernels.cuh
  • cpp/src/barrier/second_order_cone_reduction.cuh
  • cpp/src/dual_simplex/simplex_solver_settings.hpp
  • cpp/src/math_optimization/solver_settings.cu
  • cpp/src/pdlp/solve.cu
  • cpp/tests/socp/CMakeLists.txt
  • cpp/tests/socp/second_order_cone_kernels.cu
  • cpp/tests/socp/solve_barrier_socp.cu
  • cpp/tests/socp/sparse_augmented_kkt_test.cu
💤 Files with no reviewable changes (1)
  • cpp/src/barrier/iterative_refinement.hpp

Comment on lines +237 to +247
sparse_cone_ids(0, stream),
sparse_cone_dims(0, stream),
d(0, stream),
sparse_v(0, stream),
sparse_u(0, stream),
sparse_entry_offsets(0, stream),
cone_is_sparse(n_cones, stream),
scratch(n_cones, n_cone_entries, stream),
soc_threshold(soc_threshold_in),
n_sparse_cones(0),
n_sparse_cone_entries(0)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Look for -Werror / -Wreorder handling in the C++/CUDA build config
rg -nP -- '-W(error|reorder|all)\b|CMAKE_(CXX|CUDA)_FLAGS|add_compile_options' \
  --glob 'CMakeLists.txt' --glob '*.cmake' cpp 2>/dev/null

Repository: NVIDIA/cuopt

Length of output: 1235


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the build flags more broadly and search for reorder-specific handling.
echo "=== cpp/CMakeLists.txt ==="
sed -n '60,210p' cpp/CMakeLists.txt

echo
echo "=== search for Wreorder ==="
rg -n --hidden -S 'Wreorder|reorder' cpp .github . -g '!*build*' -g '!*.pyc'

Repository: NVIDIA/cuopt

Length of output: 13010


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the build flags more broadly and search for reorder-specific handling.
echo "=== cpp/CMakeLists.txt ==="
sed -n '60,210p' cpp/CMakeLists.txt

echo
echo "=== search for Wreorder ==="
rg -n --hidden -S 'Wreorder|reorder' cpp .github . -g '!*build*' -g '!*.pyc'

Repository: NVIDIA/cuopt

Length of output: 13010


Reorder the constructor initializer list to match the member declaration order. soc_threshold, n_sparse_cones, and n_sparse_cone_entries are declared before sparse_cone_ids/scratch, but initialized after them here. With -Werror enabled in the CMake flags, this can fail the build; move those initializers up or reorder the members.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/barrier/second_order_cone_kernels.cuh` around lines 237 - 247, The
constructor initializer list in the second_order_cone_kernels class is out of
sync with the member declaration order, which can trigger -Werror build
failures. Update the constructor so soc_threshold, n_sparse_cones, and
n_sparse_cone_entries are initialized in the same order as their declarations,
or reorder the member declarations to match; use the constructor and the
affected member fields in second_order_cone_kernels to locate the fix.

Comment on lines +277 to +284
value_t sum = init;
#pragma unroll
for (int k = 0; k < items_per_thread; ++k) {
sum = sum + acc[k];
}

sum = block_reduce_t(temp_storage).Sum(sum);
if (threadIdx.x == 0) { output[cone] = sum; }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

init is accumulated block_dim times instead of once.

Every thread seeds sum = init, and block_reduce_t::Sum then sums across all block_dim threads, so the result becomes Σ entries + block_dim * init (256×) rather than Σ entries + init. The convenience overload at Line 157 currently passes f_t{0}, so this is latent today, but the public operator() at Line 114 accepts an arbitrary init, so any non-zero caller would get a wrong cone sum. (The warp kernel has the same shape but only multiplies by 32, so the two paths are also inconsistent.)

Apply init once, after the block reduction.

🐛 Proposed fix
-  value_t sum = init;
+  value_t sum = value_t{0};
 `#pragma` unroll
   for (int k = 0; k < items_per_thread; ++k) {
     sum = sum + acc[k];
   }
 
   sum = block_reduce_t(temp_storage).Sum(sum);
-  if (threadIdx.x == 0) { output[cone] = sum; }
+  if (threadIdx.x == 0) { output[cone] = init + sum; }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
value_t sum = init;
#pragma unroll
for (int k = 0; k < items_per_thread; ++k) {
sum = sum + acc[k];
}
sum = block_reduce_t(temp_storage).Sum(sum);
if (threadIdx.x == 0) { output[cone] = sum; }
value_t sum = value_t{0};
`#pragma` unroll
for (int k = 0; k < items_per_thread; ++k) {
sum = sum + acc[k];
}
sum = block_reduce_t(temp_storage).Sum(sum);
if (threadIdx.x == 0) { output[cone] = init + sum; }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/barrier/second_order_cone_reduction.cuh` around lines 277 - 284, The
cone reduction in second_order_cone_reduction.cuh is adding init on every thread
before block_reduce_t::Sum, which makes the final result include init multiplied
by block_dim instead of once. Update the reduction flow in the block path so
that only the accumulated acc[k] values are reduced first, then apply init a
single time after the block reduction in the operator() / reduction helper that
computes the cone sum. Keep the same logic consistent with the warp path and
verify the convenience overload still works for f_t{0}.

@akifcorduk akifcorduk requested review from mlubin and rg20 and removed request for Kh4ster and akifcorduk June 26, 2026 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

barrier feature request New feature or request non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant