Local Aware IBM#1378
Conversation
… are only locally aware
|
Claude Code Review Head SHA: 9879df3 Files changed:
Findings: Critical: Stack overflow in
|
|
I generated and pushed the missing golden file for While reviewing the diff I found several other issues worth addressing: Critical (wrong results)
High (MPI / GPU correctness)
Post-process SILO writes called from all ranks ( Medium (robustness / performance)
Low (code quality)
The two highest-priority fixes are the |
|
Addressing AI comments: All of the critical issues are incorrect for one reason or another. I have gone through and verified that the relevant chunks of code are in place to address the concerns it has. As for the neighbor boundaries, they are never called in a GPU region, so this comment is mute. I have removed the GPU routine call to help with compilation in response. The file per process F claim is just false, and I tested that the code runs find without it. But it is true that we do not need other ranks to be doing any of this calculation regardless. I extended the rank 0 check in response. nreqs is in fact reset between loop iterations. Snapshot arrays are in fact reset between loop iterations It is confused about the need to GPU allocate arrays for patch_ib. Because the array is used with a declare, but never actually allocated on the GPU, we just have the GPU pointer. When we perform the first copy to the GPU to allocate the array size, we have already done the reshape. This means that the GPU array is properly sized. This is important because the size of the patch_ib array is significant in the new code and could causes issues with limiting problem size because the GPU must be able to hold the global patch_ib array, even though it would then immediately be deallocated and reallocated to a smaller array. If the number of ib patches grows to the extent that it costs many for GB of memory, this could cause memory limitations. The current implementation should be the correct way to go. the moving IBM flag should not be recomputed. You want it to be the same on all ranks. There is argument that it should be moved to the case level for case optimizaiton later, but that is a future PR. Force arrays getting moved is going to be handled in a future PR that already has a github issue I created yesterday. Prohibit for number of local IBs resolved. In summary, two of these I made minor adjustments for. One I reject for reasons that I believe are valid and better for long-term code support. All others are incorrect. |
|
General Note: Mac errors are on sections of the code that were not touched by this PR (bubbles), and it is a random dyl linker error. I get the same error on reldebug on a chemsitry case. I get the same error locally, but on a different set of tests. This seems like a MacOS instability issue, and not related to the particular PR. Going to attempt to rerun. If ti fails again, but on different tests, then it looks like the MacOS test suite checks may be broken. Edit: Rerunning the job changed the jobs which failed. This appears to be a spurious failure for MacOS specifically. It appears to be specific to the branch, as other branch are not experiencing it. Will investigate. |
9fe7e7b to
6faf896
Compare
…on and cleanliness
Description
The current code has an issue scaling much past 10k particles due to limitations in the MPIAllReduceSum in the IB force computation. This PR attempts to alleviate this by limiting the number of IBs any given rank can be aware of to its neighbors. This turns the AllReduce compute to a MPI neighbor computation, removing the communication bottlneck. To support this, a massive overhaul of IB ownership between ranks was required.
Type of change
Testing
TBD
Checklist
AI code reviews
Reviews are not triggered automatically. To request a review, comment on the PR:
@coderabbitai review— incremental review (new changes only)@coderabbitai full review— full review from scratch/review— Qodo review/improve— Qodo code suggestions@claude full review— Claude full review (also triggers on PR open/reopen/ready)claude-full-review— Claude full review via label