Fix use after free in neighbors, spring_attachments and cell attack #397
vincent-noel wants to merge 8 commits into MathCancer:development from
Conversation
Co-authored-by: Copilot <[email protected]>
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
```cpp
for (int j=0; j < all_cells->size(); j++)
{
	Cell* pC = (*all_cells)[j];
	if (( pC != this) && pC->phenotype.cell_interactions.pAttackTarget == this) {
		pC->phenotype.cell_interactions.pAttackTarget = NULL;
	}
}
```
The loop iterates over all_cells without synchronization, and the #pragma omp parallel for creates concurrent writes to pAttackTarget. This can cause race conditions if cells are being deleted or modified concurrently. Consider removing the parallel directive or adding proper synchronization.
```cpp
void Cell::remove_self_from_attackers( void )
{
	#pragma omp parallel for
	for (int j=0; j < all_cells->size(); j++)
	{
		Cell* pC = (*all_cells)[j];
		if (( pC != this) && pC->phenotype.cell_interactions.pAttackTarget == this) {
			pC->phenotype.cell_interactions.pAttackTarget = NULL;
		}
	}
}
```
This O(n) loop iterates through all cells to clean up attack targets, which can be expensive for large simulations with many cells. Consider using a reverse index (e.g., a set/list of attackers stored on each potential target cell) to make this operation O(k) where k is the number of attacking cells, which is typically much smaller than n.
```cpp
for ( int i = 0; i < all_cells->size(); i++ )
{
	Cell* pC = (*all_cells)[i];
	if (pC != this)
	{
		auto SearchResult = std::find(
			pC->state.neighbors.begin(), pC->state.neighbors.end(), this );
		if ( SearchResult != pC->state.neighbors.end() )
		{
			std::cout << "Cell " << pC->ID << " still has cell " << this->ID << " as a neighbor!" << std::endl;
			pC->state.neighbors.erase( SearchResult );
		}
	}
}
```
This loop iterates over all_cells without thread safety protections. If cell deletion can occur concurrently (e.g., from multiple threads), this could lead to iterator invalidation or accessing deallocated memory. Consider adding appropriate synchronization or documenting that delete_cell must be called from a single-threaded context.
```cpp
for ( int i = 0; i < all_cells->size(); i++ )
{
	Cell* pC = (*all_cells)[i];
	if (pC != this)
	{
		auto SearchResult = std::find(
			pC->state.spring_attachments.begin(), pC->state.spring_attachments.end(), this );
		if ( SearchResult != pC->state.spring_attachments.end() )
		{
			std::cout << "Cell " << pC->ID << " still has cell " << this->ID << " as a spring attachment!" << std::endl;
			pC->state.spring_attachments.erase( SearchResult );
		}
	}
}
```
This loop iterates over all_cells without thread safety protections. If cell deletion can occur concurrently (e.g., from multiple threads), this could lead to iterator invalidation or accessing deallocated memory. Consider adding appropriate synchronization or documenting that delete_cell must be called from a single-threaded context.
I remember being quite opposed to this PR, but I may have misunderstood what it was doing? I had thought we were looping (or even double looping?) over all cells every mechanics iteration, or something like that. I see now that this instead happens at every deletion, which is far more rare. So, while this is still a bandaid (we should just have all these lists be accurate all the time), this is not nearly the performance hit I recall imagining. If we don't have the collective time to track down the source of this bug, this patch should be prioritized for merging.
I diagrammed out how this could happen. It's that thread 1 seems to get blocked so long as T2 does its attachments, which seems likely to make this vanishingly rare. I think this also reveals the solution to this part of the bug giving rise to this PR: wrap the joint calls in a critical section:

```cpp
void attach_cells_as_spring( Cell* pCell_1, Cell* pCell_2 )
{
	#pragma omp critical
	{
		pCell_1->attach_cell_as_spring(pCell_2);
		pCell_2->attach_cell_as_spring(pCell_1);
	}
	return;
}
```

This way, once Thread 1 decides to detach, no other thread can interleave its own attach/detach of the same pair.

Haven't looked into neighbors to see if a similar dynamic is possible there.
I've just witnessed this¹ happen in very large numbers, both on real data passed on to me from a sim and in just using the template project with 5000 cells at start. In the latter case, I found four examples in the neighbor graph at time t=60.0 min. For example, cell 36 has cell 208 in its list of neighbors, but 208 does not have 36. These are their lines:

```
...
36: 1366,643,208,3173,1021
...
208: 3846,3173,3108,4701,1021
...
```

(from output00000001_cell_neighbor_graph.txt)

Footnotes
¹ This does little to explain why at t=60 there remain inconsistencies, since this should be resolved as every mechanics voxel ends up with a cell and so sets its max interaction distance accordingly. More work required there.
I had a crash that I had wanted to fix for a while, and finally found the issues:
Here are the fixes I'm proposing for now:
Clearly, this can be improved, but it fixes the present issues.