Add support for sparsity exploitation for SOCs of large dimensionality#1468
Add support for sparsity exploitation for SOCs of large dimensionality#1468yuwenchen95 wants to merge 1 commit into
Conversation
Signed-off-by: yuwenchen95 <yuwchen@nvidia.com>
📝 WalkthroughWalkthroughAdds a configurable SOC threshold, extends barrier and cone-data structures for sparse second-order cones, updates augmented-system construction and matvec paths, replaces medium-cone reduction, and adds SOCP and sparse augmented KKT tests. ChangesSparse SOC expansion
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (5)
cpp/tests/socp/second_order_cone_kernels.cu (2)
753-775: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winAdd a
dim == soc_thresholdcone to lock the branch boundary.The constructor contract is
dim > soc_thresholdfor sparse cones, so an off-by-one regression would flip expansion counts and dense-vs-sparse routing without failing this test. Including a 5D cone beside the 6D case would pin that boundary down. As per path instructions, the review focus here is correctness of SOC dense-vs-sparse branching and index mapping.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/tests/socp/second_order_cone_kernels.cu` around lines 753 - 775, Update the sparse_cone_classification test in second_order_cone_kernels to include a cone with dimension equal to soc_threshold so the boundary case is covered alongside the existing 3, 6, 2, and 32769 cones. Adjust the expected dense/sparse counts, expansion_var_count, sparse entries, and sparse cone ids/dims assertions in cone_data_t to reflect that dim == soc_threshold stays dense while dim > soc_threshold remains sparse. This should lock the branch behavior in cone_data_t and prevent off-by-one regressions in the dense-vs-sparse routing and index mapping.Source: Path instructions
310-368: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winAssert cone feasibility after the returned step.
This only proves
alphalands in(0, 1]. A broken medium-cone path can still satisfy that while moving the iterate outside the cone. Reusing the per-cone feasibility check fromcone_step_length_keeps_iterate_in_conehere would validate the new 1000D mixed-topology path instead of mostly smoke-testing it. As per path instructions, tests undercpp/tests/**should prioritize numerical correctness validation, not just “runs”, and include edge cases that hit reduction correctness.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/tests/socp/second_order_cone_kernels.cu` around lines 310 - 368, The new mixed-topology step-length test only checks that compute_cone_step_length returns alpha values in range, but it does not verify the updated iterate stays feasible. Extend cone_step_length_many_small_one_sparse_medium_cone by reusing the per-cone feasibility validation used in cone_step_length_keeps_iterate_in_cone after applying the returned step to dx/dz, so the cone_data_t path for the 1000D cone is checked for actual cone membership rather than just bounds on step length.Source: Path instructions
cpp/tests/socp/solve_barrier_socp.cu (1)
869-875: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winThese cases don’t currently prove the threshold plumbing works.
Q^6with threshold4andQ^500with threshold5are also sparse under the default threshold, so a regression that ignoressettings.barrier_soc_thresholdwould still pass all three tests. Please add one boundary case solved under two thresholds (or aQ^5/Q^6pair) so the parameter is exercised end-to-end. As per path instructions, the review focus for this PR is correctness of SOC dense-vs-sparse branching, and the stack context says this PR adds thebarrier_soc_thresholdsolver parameter.Also applies to: 948-955, 1018-1023
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/tests/socp/solve_barrier_socp.cu` around lines 869 - 875, The SOC threshold tests currently do not exercise `settings.barrier_soc_threshold` end-to-end because the chosen `Q^6` and `Q^500` cases remain sparse under the default behavior, so a bug that ignores the setting could still pass. Update the `solve_barrier_socp` test coverage by adding a boundary case in `test_barrier_socp` (or the relevant nearby test helpers) that is solved twice with different `barrier_soc_threshold` values, or use a `Q^5`/`Q^6` pair, so the dense-vs-sparse branching is actually validated through the `simplex_solver_settings_t` plumbing.Source: Path instructions
cpp/tests/socp/sparse_augmented_kkt_test.cu (1)
88-177: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winExercise the packed-offset path with at least two sparse cones.
Both integration tests only cover a single sparse cone, so bugs in
sparse_entry_offsetsadvancement or second-cone expansion-slot mapping would still pass. Adding a two-sparse-cone case here would verify the cross-cone index math the production path relies on. As per path instructions, tests incpp/tests/**should prioritize correctness of SOC index mapping and edge cases that can hit device-memory safety or reduction bugs.Also applies to: 179-256
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/tests/socp/sparse_augmented_kkt_test.cu` around lines 88 - 177, The sparse augmented KKT test only covers one sparse cone, so it does not exercise packed-offset advancement or second-cone expansion-slot mapping. Extend scatter_and_update_sparse_expansion to build at least two sparse cones via cone_data_t, then validate that sparse_entry_offsets, sparse_cone_ids, and the expansion updates in update_sparse_expansion_in_augmented and scatter_sparse_hessian_diag_into_augmented are correct for both cones. Keep the existing single-cone assertions, but add per-cone expectations that confirm the second cone writes to the right augmented_x slots and does not reuse the first cone’s offsets.Source: Path instructions
cpp/src/barrier/barrier.cu (1)
842-842: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueMisleading comment in the sparse-cone branch.
Line 842's comment reads "Handle off-diagonal entries of dense SOC before diagonal entry", but this is the sparse-cone row path (
cone_sparse_idx[k] >= 0). The loops here process the remaining intra/post-coneQoff-diagonals for the sparse cone. Suggest dropping the "dense" wording to avoid confusion for future maintainers.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/src/barrier/barrier.cu` at line 842, The comment in the sparse-cone branch is misleading because the code path under cone_sparse_idx[k] processes the sparse cone row, not the dense SOC path. Update the comment near the off-diagonal handling in barrier.cu to remove the “dense SOC” wording and describe that these loops handle the remaining intra/post-cone Q off-diagonals for the sparse cone, keeping the comment aligned with the surrounding logic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/src/barrier/second_order_cone_kernels.cuh`:
- Around line 237-247: The constructor initializer list in the
second_order_cone_kernels class is out of sync with the member declaration
order, which can trigger -Werror build failures. Update the constructor so
soc_threshold, n_sparse_cones, and n_sparse_cone_entries are initialized in the
same order as their declarations, or reorder the member declarations to match;
use the constructor and the affected member fields in second_order_cone_kernels
to locate the fix.
In `@cpp/src/barrier/second_order_cone_reduction.cuh`:
- Around line 277-284: The cone reduction in second_order_cone_reduction.cuh is
adding init on every thread before block_reduce_t::Sum, which makes the final
result include init multiplied by block_dim instead of once. Update the
reduction flow in the block path so that only the accumulated acc[k] values are
reduced first, then apply init a single time after the block reduction in the
operator() / reduction helper that computes the cone sum. Keep the same logic
consistent with the warp path and verify the convenience overload still works
for f_t{0}.
---
Nitpick comments:
In `@cpp/src/barrier/barrier.cu`:
- Line 842: The comment in the sparse-cone branch is misleading because the code
path under cone_sparse_idx[k] processes the sparse cone row, not the dense SOC
path. Update the comment near the off-diagonal handling in barrier.cu to remove
the “dense SOC” wording and describe that these loops handle the remaining
intra/post-cone Q off-diagonals for the sparse cone, keeping the comment aligned
with the surrounding logic.
In `@cpp/tests/socp/second_order_cone_kernels.cu`:
- Around line 753-775: Update the sparse_cone_classification test in
second_order_cone_kernels to include a cone with dimension equal to
soc_threshold so the boundary case is covered alongside the existing 3, 6, 2,
and 32769 cones. Adjust the expected dense/sparse counts, expansion_var_count,
sparse entries, and sparse cone ids/dims assertions in cone_data_t to reflect
that dim == soc_threshold stays dense while dim > soc_threshold remains sparse.
This should lock the branch behavior in cone_data_t and prevent off-by-one
regressions in the dense-vs-sparse routing and index mapping.
- Around line 310-368: The new mixed-topology step-length test only checks that
compute_cone_step_length returns alpha values in range, but it does not verify
the updated iterate stays feasible. Extend
cone_step_length_many_small_one_sparse_medium_cone by reusing the per-cone
feasibility validation used in cone_step_length_keeps_iterate_in_cone after
applying the returned step to dx/dz, so the cone_data_t path for the 1000D cone
is checked for actual cone membership rather than just bounds on step length.
In `@cpp/tests/socp/solve_barrier_socp.cu`:
- Around line 869-875: The SOC threshold tests currently do not exercise
`settings.barrier_soc_threshold` end-to-end because the chosen `Q^6` and `Q^500`
cases remain sparse under the default behavior, so a bug that ignores the
setting could still pass. Update the `solve_barrier_socp` test coverage by
adding a boundary case in `test_barrier_socp` (or the relevant nearby test
helpers) that is solved twice with different `barrier_soc_threshold` values, or
use a `Q^5`/`Q^6` pair, so the dense-vs-sparse branching is actually validated
through the `simplex_solver_settings_t` plumbing.
In `@cpp/tests/socp/sparse_augmented_kkt_test.cu`:
- Around line 88-177: The sparse augmented KKT test only covers one sparse cone,
so it does not exercise packed-offset advancement or second-cone expansion-slot
mapping. Extend scatter_and_update_sparse_expansion to build at least two sparse
cones via cone_data_t, then validate that sparse_entry_offsets, sparse_cone_ids,
and the expansion updates in update_sparse_expansion_in_augmented and
scatter_sparse_hessian_diag_into_augmented are correct for both cones. Keep the
existing single-cone assertions, but add per-cone expectations that confirm the
second cone writes to the right augmented_x slots and does not reuse the first
cone’s offsets.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e135c5c4-8f4c-4b4d-853d-46b8d7866a5b
📒 Files selected for processing (13)
cpp/include/cuopt/linear_programming/constants.hcpp/include/cuopt/linear_programming/pdlp/solver_settings.hppcpp/src/barrier/barrier.cucpp/src/barrier/iterative_refinement.hppcpp/src/barrier/second_order_cone_kernels.cuhcpp/src/barrier/second_order_cone_reduction.cuhcpp/src/dual_simplex/simplex_solver_settings.hppcpp/src/math_optimization/solver_settings.cucpp/src/pdlp/solve.cucpp/tests/socp/CMakeLists.txtcpp/tests/socp/second_order_cone_kernels.cucpp/tests/socp/solve_barrier_socp.cucpp/tests/socp/sparse_augmented_kkt_test.cu
💤 Files with no reviewable changes (1)
- cpp/src/barrier/iterative_refinement.hpp
| sparse_cone_ids(0, stream), | ||
| sparse_cone_dims(0, stream), | ||
| d(0, stream), | ||
| sparse_v(0, stream), | ||
| sparse_u(0, stream), | ||
| sparse_entry_offsets(0, stream), | ||
| cone_is_sparse(n_cones, stream), | ||
| scratch(n_cones, n_cone_entries, stream), | ||
| soc_threshold(soc_threshold_in), | ||
| n_sparse_cones(0), | ||
| n_sparse_cone_entries(0) |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Look for -Werror / -Wreorder handling in the C++/CUDA build config
rg -nP -- '-W(error|reorder|all)\b|CMAKE_(CXX|CUDA)_FLAGS|add_compile_options' \
--glob 'CMakeLists.txt' --glob '*.cmake' cpp 2>/dev/nullRepository: NVIDIA/cuopt
Length of output: 1235
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the build flags more broadly and search for reorder-specific handling.
echo "=== cpp/CMakeLists.txt ==="
sed -n '60,210p' cpp/CMakeLists.txt
echo
echo "=== search for Wreorder ==="
rg -n --hidden -S 'Wreorder|reorder' cpp .github . -g '!*build*' -g '!*.pyc'Repository: NVIDIA/cuopt
Length of output: 13010
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the build flags more broadly and search for reorder-specific handling.
echo "=== cpp/CMakeLists.txt ==="
sed -n '60,210p' cpp/CMakeLists.txt
echo
echo "=== search for Wreorder ==="
rg -n --hidden -S 'Wreorder|reorder' cpp .github . -g '!*build*' -g '!*.pyc'Repository: NVIDIA/cuopt
Length of output: 13010
Reorder the constructor initializer list to match the member declaration order. soc_threshold, n_sparse_cones, and n_sparse_cone_entries are declared before sparse_cone_ids/scratch, but initialized after them here. With -Werror enabled in the CMake flags, this can fail the build; move those initializers up or reorder the members.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/src/barrier/second_order_cone_kernels.cuh` around lines 237 - 247, The
constructor initializer list in the second_order_cone_kernels class is out of
sync with the member declaration order, which can trigger -Werror build
failures. Update the constructor so soc_threshold, n_sparse_cones, and
n_sparse_cone_entries are initialized in the same order as their declarations,
or reorder the member declarations to match; use the constructor and the
affected member fields in second_order_cone_kernels to locate the fix.
| value_t sum = init; | ||
| #pragma unroll | ||
| for (int k = 0; k < items_per_thread; ++k) { | ||
| sum = sum + acc[k]; | ||
| } | ||
|
|
||
| sum = block_reduce_t(temp_storage).Sum(sum); | ||
| if (threadIdx.x == 0) { output[cone] = sum; } |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
init is accumulated block_dim times instead of once.
Every thread seeds sum = init, and block_reduce_t::Sum then sums across all block_dim threads, so the result becomes Σ entries + block_dim * init (256×) rather than Σ entries + init. The convenience overload at Line 157 currently passes f_t{0}, so this is latent today, but the public operator() at Line 114 accepts an arbitrary init, so any non-zero caller would get a wrong cone sum. (The warp kernel has the same shape but only multiplies by 32, so the two paths are also inconsistent.)
Apply init once, after the block reduction.
🐛 Proposed fix
- value_t sum = init;
+ value_t sum = value_t{0};
`#pragma` unroll
for (int k = 0; k < items_per_thread; ++k) {
sum = sum + acc[k];
}
sum = block_reduce_t(temp_storage).Sum(sum);
- if (threadIdx.x == 0) { output[cone] = sum; }
+ if (threadIdx.x == 0) { output[cone] = init + sum; }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| value_t sum = init; | |
| #pragma unroll | |
| for (int k = 0; k < items_per_thread; ++k) { | |
| sum = sum + acc[k]; | |
| } | |
| sum = block_reduce_t(temp_storage).Sum(sum); | |
| if (threadIdx.x == 0) { output[cone] = sum; } | |
| value_t sum = value_t{0}; | |
| `#pragma` unroll | |
| for (int k = 0; k < items_per_thread; ++k) { | |
| sum = sum + acc[k]; | |
| } | |
| sum = block_reduce_t(temp_storage).Sum(sum); | |
| if (threadIdx.x == 0) { output[cone] = init + sum; } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@cpp/src/barrier/second_order_cone_reduction.cuh` around lines 277 - 284, The
cone reduction in second_order_cone_reduction.cuh is adding init on every thread
before block_reduce_t::Sum, which makes the final result include init multiplied
by block_dim instead of once. Update the reduction flow in the block path so
that only the accumulated acc[k] values are reduced first, then apply init a
single time after the block reduction in the operator() / reduction helper that
computes the cone sum. Keep the same logic consistent with the warp path and
verify the convenience overload still works for f_t{0}.
Description
Adds sparsity exploitation for second-order cone programs (SOCP) that contain
large-dimensional cones, so the barrier solver no longer materializes a dense
Hessian block for every cone.
keep the existing explicit dense Hessian block; larger cones now use a sparse
rank-2 (Nesterov-Todd) representation in the augmented KKT system, with an
expansion block whose RHS rows are zero-filled.
(
Hs), and the sparse augmented matvec, plus assembly of the sparse expansioninto the augmented system.
cones, with an ILP path (multiple accumulators per thread) for large "medium" cones.
Issue
Checklist