gh-140739: Fix crashes from corrupted remote memory #143190
Conversation
The remote debugging module reads memory from another Python process which can be modified or freed at any time due to race conditions. When garbage data is read, various code paths could cause SIGSEGV crashes in the profiler process itself rather than gracefully rejecting the sample.

Add bounds checking and validation for data read from remote memory: linetable parsing now checks buffer bounds, PyLong reading validates digit count, stack chunk sizes are bounded, set iteration limits table size, task pointer arithmetic checks for underflow, the TLBC index is validated against array bounds, and thread list iteration detects cycles. All cases now reject the sample with an exception instead of crashing or looping forever.
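For illustration, here is a minimal C sketch of the last of those checks, guarding a walk over a remotely-read thread list. This is not the code from this PR: the helper `read_remote_memory()`, the `MAX_THREADS` bound, and the cap-the-walk approach (one simple way to survive a cycle; the actual patch may detect cycles differently) are all assumptions made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 4096  /* assumed sanity bound on list length */

/* Placeholder stand-in for the real remote-read routine: copy `size`
 * bytes from the target process at `addr` into `buf`.  Returns 0 on
 * success, -1 on failure (e.g. the page was freed under us). */
static int
read_remote_memory(uintptr_t addr, void *buf, size_t size)
{
    (void)addr; (void)buf; (void)size;
    return -1;
}

/* Walk a singly linked list of thread states read from the target
 * process.  Corrupted memory can turn the list into a cycle, so cap
 * the number of nodes visited and reject the sample instead of
 * looping forever. */
static int
walk_thread_list(uintptr_t head, size_t next_offset)
{
    uintptr_t current = head;
    size_t visited = 0;

    while (current != 0) {
        if (++visited > MAX_THREADS) {
            return -1;  /* cycle or runaway list: reject the sample */
        }
        uintptr_t next = 0;
        if (read_remote_memory(current + next_offset, &next, sizeof(next)) < 0) {
            return -1;  /* remote memory changed or was freed */
        }
        current = next;
    }
    return 0;
}
```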
Fidget-Spinner left a comment:
Looks good. Just a question: is remote debugging expected to work on FT? Because a lot of these operations don't look thread safe to me.
Misc/NEWS.d/next/Library/2025-12-26-14-51-50.gh-issue-140739.BAbZTo.rst
Yeah, it is, and we do have tests for that. The same profiler object cannot be entered from different threads (it's locked), so there is no need to protect internal state.
…AbZTo.rst Co-authored-by: Ken Jin <[email protected]>
if (actual_size != current_size) {
    // Validate size: reject garbage (too small or unreasonably large)
    // Size must be at least enough for the header and reasonably bounded
    if (actual_size <= offsetof(_PyStackChunk, data) || actual_size > MAX_STACK_CHUNK_SIZE) {
I think this is looser than I'd like for _PyStackChunk, but whatever.
Do you have any suggestions?
I wish we had a tighter bound, but I'm not sure what would be appropriate for a real-world stack frame. So that's why I said whatever :)
> I wish we had a tighter bound, but I'm not sure what would be appropriate for a real-world stack frame.

But this is for the entire chunk, no? Chunks will grow from 16 KiB to whatever; we just need to ensure we don't copy too much because we read a garbage size. Chunks here are just an optimization, so if we fail to read the chunks we fall back to reading frame-by-frame (which sucks, but it works).
Oh, I didn't know you do the second part (fall back to reading one by one). I think that's fine then.
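To make the fallback discussed above concrete, here is a rough C sketch of the pattern, not the actual unwinder code: `copy_whole_chunk()`, `copy_frames_one_by_one()`, and the `MAX_STACK_CHUNK_SIZE` value are hypothetical names standing in for the real routines and bound.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_STACK_CHUNK_SIZE (1024 * 1024)  /* assumed upper bound */

/* Placeholder stand-ins for the real copy routines. */
static int copy_whole_chunk(uintptr_t addr, size_t size) { (void)addr; (void)size; return 0; }
static int copy_frames_one_by_one(uintptr_t first_frame) { (void)first_frame; return 0; }

/* Prefer copying the whole chunk in one remote read; if the size we
 * read from the target process fails validation, fall back to the
 * slower frame-by-frame walk instead of trusting a garbage size. */
static int
copy_stack(uintptr_t chunk_addr, uintptr_t first_frame,
           size_t actual_size, size_t header_size)
{
    if (actual_size > header_size && actual_size <= MAX_STACK_CHUNK_SIZE) {
        return copy_whole_chunk(chunk_addr, actual_size);  /* fast path */
    }
    return copy_frames_one_by_one(first_frame);  /* safe fallback */
}
```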