Reduce memory footprint of cuPDLPx #1153
Conversation
reduced memory footprint of duplicated vectors in op_problem_scaled; passes all tests
…rom cupdlpx. passes all tests
…ution_ for cupdlpx
… true (default to false) passes all tests
passes all tests
/ok test 5e9d12b
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough

Adds cuPDLPx restart-awareness: redirects cuSPARSE CSR descriptors to original problem buffers, propagates an is_cupdlpx flag through solver and saddle-point initialization, and conditionally allocates/frees several device buffers and averaged-state structures for restart scenarios.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/pdlp/saddle_point.cu (1)
20-53: ⚠️ Potential issue | 🔴 Critical

Add guard to prevent creating unsafe view of zero-sized primal gradient buffer in cuPDLPx mode.

In cpp/src/pdlp/cusparse_view.cu, the constructor unconditionally creates a view with op_problem_scaled.n_variables elements from the primal gradient buffer (lines 410-411). However, when is_cupdlpx is true, the saddle point state allocates primal_gradient_ with size 0. This creates an unsafe view that could access out-of-bounds memory.

The cudaMemsetAsync call on the zero-sized buffer itself is safe (a no-op), but the view creation is the actual hazard. Guard lines 410-411 to skip view creation when primal_gradient_ is empty, or add an assertion confirming that the buffer is non-empty before creating the view.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/saddle_point.cu` around lines 20 - 53, The constructor saddle_point_state_t currently creates primal_gradient_ with size 0 when is_cupdlpx is true, but cusparse_view.cu later unconditionally builds a view of length op_problem_scaled.n_variables from that buffer; add a guard around the view creation in cusparse_view.cu (or assert buffer non-empty) so you only create the view when saddle_point_state_t::primal_gradient_.size() > 0 (i.e., when is_cupdlpx is false), preventing an unsafe zero-sized-buffer view; reference the symbols primal_gradient_, saddle_point_state_t, and the view creation that uses op_problem_scaled.n_variables to locate the change.
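For illustration, a host-side sketch of the suggested guard, with std::vector standing in for the device buffer and a hypothetical view type (none of these names are the real cuOpt/cuSPARSE API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the device buffer and the dense-vector view
// built in cusparse_view.cu; names are illustrative only.
struct dense_vec_view {
  const double* data = nullptr;
  std::size_t size = 0;
  bool valid() const { return data != nullptr && size > 0; }
};

// Only build a view when the backing buffer actually holds n_variables
// elements; a zero-sized buffer (cuPDLPx mode) yields an empty view
// instead of one that would read out of bounds.
dense_vec_view make_primal_gradient_view(const std::vector<double>& primal_gradient,
                                         std::size_t n_variables)
{
  if (primal_gradient.size() < n_variables || n_variables == 0) {
    return {};  // empty view: callers must check valid() before use
  }
  return {primal_gradient.data(), n_variables};
}
```

The same shape works with an assertion instead of an early return, if an empty buffer should be treated as a hard programming error.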
🧹 Nitpick comments (1)
cpp/src/mip_heuristics/problem/problem.cuh (1)
Line 154: Keep pdlp_lighten internal if possible.

This reads like solver-internal cleanup rather than a stable problem_t capability. Exposing it publicly makes it easier for callers to put the object into a partially-invalid state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/mip_heuristics/problem/problem.cuh` at line 154, The pdlp_lighten declaration in the public header exposes an internal solver cleanup operation (pdlp_lighten) on problem_t; remove this from the .cuh public header and instead make it internal by moving its declaration/definition into the corresponding implementation file (or place it in an anonymous namespace / mark static) so only internal code operating on problem_t can call it; update any internal callers to include the private implementation file and keep the symbol out of the public API.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/pdlp/pdlp.cu`:
- Around line 2268-2278: The verbose printing accesses
problem_ptr->combined_bounds after problem_ptr->pdlp_lighten() frees it, causing
a use-after-free; move the PDLP_VERBOSE_MODE verbose block (the call to
print_problem_info<f_t> that uses problem_ptr->combined_bounds) to before
calling op_problem_scaled_.pdlp_lighten() and problem_ptr->pdlp_lighten(), or
alternatively postpone calling problem_ptr->pdlp_lighten() until after the
verbose print so that combined_bounds remains valid; update the code around the
calls to pdhg_solver_.get_cusparse_view().redirect_rows_and_cols(*problem_ptr),
op_problem_scaled_.pdlp_lighten(), and problem_ptr->pdlp_lighten() to ensure
print_problem_info<f_t> runs while combined_bounds is still allocated.
- Around line 124-127: Add fail-fast invariants to the warm-start validation so
copying into zero-sized average buffers is prevented: in the warm-start
validation block (after the existing validation around initial_*), add
cuopt_expects checks that when unscaled_primal_avg_solution_.size() == 0 then
initial_primal_average_.empty() (and similarly when
unscaled_dual_avg_solution_.size() == 0 then initial_dual_average_.empty()), and
fail otherwise; reference the buffers unscaled_primal_avg_solution_ and
unscaled_dual_avg_solution_ and the inputs initial_primal_average_ and
initial_dual_average_ so the logic short-circuits before any memcpy/copy occurs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6de370d2-d055-40a2-91b7-1c54fcbb34cc
📒 Files selected for processing (12)
- cpp/src/mip_heuristics/problem/problem.cu
- cpp/src/mip_heuristics/problem/problem.cuh
- cpp/src/pdlp/cusparse_view.cu
- cpp/src/pdlp/cusparse_view.hpp
- cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
- cpp/src/pdlp/pdhg.cu
- cpp/src/pdlp/pdlp.cu
- cpp/src/pdlp/pdlp.cuh
- cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
- cpp/src/pdlp/saddle_point.cu
- cpp/src/pdlp/saddle_point.hpp
- cpp/src/pdlp/termination_strategy/infeasibility_information.cu
… for warm_start_data
…f pre_constraint_residual is false
🧹 Nitpick comments (3)
cpp/src/pdlp/pdlp.cu (3)
210-211: Inconsistent sizing logic between buffers and termination strategy.

Lines 132-139 allocate full-size average buffers when warm-start is populated in cuPDLPx mode, but the termination strategy here unconditionally uses size 0 for cuPDLPx. This mismatch relies on an implicit invariant that cuPDLPx mode always has never_restart_to_average=true.

Consider adding explicit validation in the constructor to enforce this invariant and prevent subtle bugs:

```cpp
if (is_cupdlpx_) {
  cuopt_expects(settings_.hyper_params.never_restart_to_average,
                error_type_t::ValidationError,
                "cuPDLPx mode requires never_restart_to_average=true");
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdlp.cu` around lines 210 - 211, The allocation vs termination mismatch for cuPDLPx is unsafe: ensure the constructor validates that when is_cupdlpx_ is true the invariant never_restart_to_average is enforced; in the class/constructor that initializes is_cupdlpx_ add a guard checking settings_.hyper_params.never_restart_to_average and raise a ValidationError (use cuopt_expects or existing validation error pattern) with a clear message like "cuPDLPx mode requires never_restart_to_average=true" so buffers allocated for warm-start match the termination logic using the zero sizes.
2473-2476: Redundant cleanup calls: already executed unconditionally earlier.

Lines 2289-2290 already call lighten_problem_for_pdlp on both op_problem_scaled_ and *problem_ptr unconditionally. This conditional block performs the same resize-to-0 operations, which are no-ops since the vectors are already empty.

♻️ Suggested removal

```diff
- if (!settings_.per_constraint_residual) {
-   lighten_problem_for_pdlp(op_problem_scaled_, stream_view_);
-   lighten_problem_for_pdlp(*problem_ptr, stream_view_);
- }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdlp.cu` around lines 2473 - 2476, The conditional block redundantly calls lighten_problem_for_pdlp on op_problem_scaled_ and *problem_ptr when settings_.per_constraint_residual is false, but these same calls were already made unconditionally earlier; remove the conditional block containing lighten_problem_for_pdlp(op_problem_scaled_, stream_view_) and lighten_problem_for_pdlp(*problem_ptr, stream_view_) to avoid duplicate no-op cleanup and keep only the earlier unconditional calls.
2280-2290: Memory optimization looks correct, but modifies the original problem.

The redirect-before-free sequence is correct. However, lighten_problem_for_pdlp(*problem_ptr, ...) at line 2290 modifies the caller's problem object by freeing variable_types, related_variables_offsets, and integer_fixed_variable_map.

If callers expect the original problem to remain unmodified after solving (e.g., for re-solving or inspection), this could be surprising. Consider documenting this behavior or making it opt-in.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdlp.cu` around lines 2280 - 2290, The current call lightening the device vectors (lighten_problem_for_pdlp) is mutating the caller's Problem via *problem_ptr after redirect_rows_and_cols; to avoid surprising callers, change to operate on a local copy or make the destructive behavior opt-in: either (A) replace lighten_problem_for_pdlp(*problem_ptr, stream_view_) with a call on a shallow/deep copy (e.g., create Problem temp = *problem_ptr; lighten_problem_for_pdlp(temp, stream_view_);) so the original stays intact, or (B) add a flag parameter to lighten_problem_for_pdlp (e.g., bool preserve_original) and invoke it with preserve_original=true so it copies internally before freeing variable_types/related_variables_offsets/integer_fixed_variable_map; update call sites (pd hg_solver_.get_cusparse_view().redirect_rows_and_cols, op_problem_scaled_ use remains) and document the new behavior in the function comment.
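As a host-side illustration of the opt-in alternative (option B in the prompt above), here is a hypothetical lighten function whose destructive behavior is gated behind an explicit flag; problem_stub, the flag name, and the member names are invented for this sketch:

```cpp
#include <cassert>
#include <vector>

// Stand-in for the problem object; the real type lives on the device.
struct problem_stub {
  std::vector<int> variable_types;
  std::vector<int> related_variables_offsets;
  std::vector<int> integer_fixed_variable_map;
};

// Frees solver-only vectors, but only when the caller explicitly opts in,
// so the original problem stays intact by default.
void lighten_problem_for_pdlp(problem_stub& p, bool allow_mutation = false)
{
  if (!allow_mutation) { return; }  // caller did not opt in: leave buffers alone
  // swap-with-empty actually releases capacity, unlike clear()
  std::vector<int>().swap(p.variable_types);
  std::vector<int>().swap(p.related_variables_offsets);
  std::vector<int>().swap(p.integer_fixed_variable_map);
}
```

With this shape, existing call sites keep their current buffers unless they pass the flag, which makes the destructive cleanup visible in the code that requests it.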
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 67ac28f7-ae73-4132-b0ff-2fa984c912c9
📒 Files selected for processing (2)
- cpp/src/pdlp/cusparse_view.cu
- cpp/src/pdlp/pdlp.cu
🚧 Files skipped from review as they are similar to previous changes (1)
- cpp/src/pdlp/cusparse_view.cu
mlubin
left a comment
Nice! Do you have any concrete numbers in the reduction of peak memory usage?
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/pdlp/pdlp.cu (1)
205-214: ⚠️ Potential issue | 🔴 Critical

Zero-sized average_termination_strategy_ is still consumed later.

check_termination() only skips evaluate_termination_criteria() when never_restart_to_average is set. It still unconditionally reads get_termination_status(0) at line 1028, checks all_optimal_status() at lines 1096 and 1151, and uses get_convergence_information()/fill_return_problem_solution() throughout the average branches below. Constructing the average strategy with {0, 0} therefore removes the storage without removing the uses.

Suggested fix

```diff
   average_termination_strategy_{handle_ptr_,
                                 op_problem,
                                 op_problem_scaled_,
                                 average_op_problem_evaluation_cusparse_view_,
                                 pdhg_solver_.get_cusparse_view(),
-                                settings_.hyper_params.never_restart_to_average ? 0 : primal_size_h_,
-                                settings_.hyper_params.never_restart_to_average ? 0 : dual_size_h_,
+                                primal_size_h_,
+                                dual_size_h_,
                                 initial_scaling_strategy_,
                                 settings_,
                                 climber_strategies_},
```

As per coding guidelines, "Flag algorithm correctness errors including logic errors in optimization algorithms" and "Flag invalid memory access patterns including out-of-bounds, use-after-free, and host/device confusion."

🤖 Prompt for AI Agents
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdlp.cu` around lines 205 - 214, The average termination strategy is being constructed with zero sizes when settings_.hyper_params.never_restart_to_average is true, but later code unconditionally accesses it (check_termination(), get_termination_status(0), all_optimal_status(), get_convergence_information(), fill_return_problem_solution), causing logic/memory errors; fix by not constructing average_termination_strategy_ when never_restart_to_average is true and protect all usages of the average strategy by checking settings_.hyper_params.never_restart_to_average (or that primal_size_h_/dual_size_h_ > 0) before calling check_termination(), get_termination_status(), all_optimal_status(), get_convergence_information(), or fill_return_problem_solution, or alternatively make those accessors tolerate an empty/absent average strategy—apply the same conditional guard around every "average" branch that currently assumes the strategy exists.
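One way to realize the "make accessors tolerate an absent average strategy" alternative from the prompt above is to hold the strategy in std::optional and guard every access. A minimal stand-alone sketch with stub types (not the real cuOpt classes):

```cpp
#include <cassert>
#include <optional>

// Stub for the termination strategy; only the piece needed here.
struct termination_strategy_stub {
  int status = 0;
  int get_termination_status() const { return status; }
};

struct pdlp_stub {
  // Absent entirely (not zero-sized) when never_restart_to_average is set.
  std::optional<termination_strategy_stub> average_termination_strategy;

  // NoTermination (0 here) is reported when the strategy is absent, so the
  // average branches are skipped instead of reading freed/empty storage.
  int average_status() const
  {
    return average_termination_strategy
             ? average_termination_strategy->get_termination_status()
             : 0;
  }
};
```

The same guard would wrap each of the other accessors the comment lists (all_optimal_status, get_convergence_information, fill_return_problem_solution).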
🧹 Nitpick comments (1)
cpp/src/pdlp/pdlp.cu (1)
2288-2290: Make the PDLP cleanup condition consistent.

These helper calls already free the PDLP-only vectors unconditionally, so the later if (!settings_.per_constraint_residual) block at lines 2473-2476 no longer changes behavior. If per_constraint_residual=true is supposed to preserve those buffers, the cleanup is happening too early; otherwise the later branch is just dead code.

Suggested refactor

```diff
-  // Remove unused device vectors in PDLP
-  lighten_problem_for_pdlp(op_problem_scaled_, stream_view_);
-  lighten_problem_for_pdlp(*problem_ptr, stream_view_);
+  if (!settings_.per_constraint_residual) {
+    // Remove unused device vectors in PDLP
+    lighten_problem_for_pdlp(op_problem_scaled_, stream_view_);
+    lighten_problem_for_pdlp(*problem_ptr, stream_view_);
+  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/pdlp/pdlp.cu` around lines 2288 - 2290, The PDLP-only buffers are being freed unconditionally by the calls to lighten_problem_for_pdlp(op_problem_scaled_, stream_view_) and lighten_problem_for_pdlp(*problem_ptr, stream_view_), making the later branch guarded by settings_.per_constraint_residual redundant or incorrect; update the cleanup so it is consistent: either move those two lighten_problem_for_pdlp calls inside the same if (!settings_.per_constraint_residual) condition so buffers are preserved when per_constraint_residual==true, or remove the later dead branch entirely if unconditional freeing is intended—adjust whichever code path you choose and ensure references to op_problem_scaled_, problem_ptr, lighten_problem_for_pdlp, and settings_.per_constraint_residual are updated accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu`:
- Around line 94-101: The containers avg_duality_gap_ and current_duality_gap_
are being constructed with size 0 when hyper_params.never_restart_to_average is
true, but update_distance() and the trust-region restart logic still write/read
them; fix by always allocating these containers with the full primal_size and
dual_size (i.e., remove the conditional zero-sizing) so avg_duality_gap_ and
current_duality_gap_ remain valid, or alternatively guard all writes/reads to
these containers behind the hyper_params.never_restart_to_average check; update
the constructor calls that currently pass "hyper_params.never_restart_to_average
? 0 : primal_size" / "dual_size" so they allocate the real sizes, and ensure
update_distance() and the trust-region restart computation paths use those
allocated buffers safely.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2c589a23-685a-46f9-9498-ad9162723f72
📒 Files selected for processing (2)
- cpp/src/pdlp/pdlp.cu
- cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
/ok test 195ca23
```cpp
// Redirects the vectors of rows and columns from op_problem_scaled_ to the original problem so
// that they can be freed
void redirect_rows_and_cols(const problem_t<i_t, f_t>& original_problem);
```
Why not just move?
This is very specific to cuSPARSE pointers, right? I think the naming should reflect that.
We still need original_problem to own these pointers. Here the scaled problem is only borrowing them so we can free the duplicated data, which is why I don't think we should use move here.
I will rename it to reflect its link with the cuSPARSE pointers.
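A minimal host-side sketch of this borrow-then-free pattern, with std::vector standing in for the device vectors and a stub for the cuSPARSE CSR descriptor (all names here are illustrative, not the real API):

```cpp
#include <cassert>
#include <vector>

// Stand-in for the cuSPARSE CSR descriptor: it holds raw pointers and does
// not own the arrays they point to.
struct csr_descriptor_stub {
  const int* row_offsets = nullptr;
  const int* col_indices = nullptr;
};

struct scaled_problem_stub {
  std::vector<int> row_offsets;   // duplicated copies, to be freed
  std::vector<int> col_indices;
  csr_descriptor_stub descr;

  // Repoint the descriptor at the original problem's arrays (which the
  // caller keeps owning), then release the scaled duplicates.
  void redirect_descriptor_rows_and_cols(const std::vector<int>& orig_rows,
                                         const std::vector<int>& orig_cols)
  {
    descr.row_offsets = orig_rows.data();  // borrow: ownership stays outside
    descr.col_indices = orig_cols.data();
    std::vector<int>().swap(row_offsets);  // now safe to free the duplicates
    std::vector<int>().swap(col_indices);
  }
};
```

Because the descriptor only borrows, a move would be wrong here: the original problem must keep owning the row/col arrays for the rest of the solve.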
…uctor, to instead use hyper parameters
/ok to test 851be7c
/ok to test 6981c1f
/ok to test 8f9a937
/ok to test 29bd4ca
/ok to test 72afe0a
This PR removes most of the unused or one-time-used device vectors in cuPDLPx to reduce the memory footprint. It allows running bigger problems without running out of memory.
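The overall allocation pattern behind this reduction can be sketched on the host, with std::vector standing in for the device vectors (sizes, member names, and the struct are illustrative only):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Buffers that a given mode never touches are created with size 0 instead of
// full problem size, so peak memory scales with what the solver actually uses.
struct solver_buffers_stub {
  std::vector<double> primal_gradient;   // unused in cuPDLPx mode
  std::vector<double> avg_duality_gap;   // unused when averages are disabled

  solver_buffers_stub(std::size_t primal_size, std::size_t dual_size, bool is_cupdlpx)
    : primal_gradient(is_cupdlpx ? 0 : primal_size),
      avg_duality_gap(is_cupdlpx ? 0 : dual_size)
  {
  }

  // Total elements held; a proxy for the device footprint in this sketch.
  std::size_t footprint() const { return primal_gradient.size() + avg_duality_gap.size(); }
};
```

As the review threads above note, the zero-sized variant is only safe when every consumer of these buffers is also gated on the same flag.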