How do you use Sentry?
Sentry SaaS (sentry.io)
Version
2.58.0 (also reproduced on 2.17.0)
Steps to Reproduce
- Run a FastAPI app under `uvicorn` on Python 3.11, containerised (Linux, x86_64), with many concurrent HTTP requests.
- Each request performs synchronous outbound I/O (e.g. `google-cloud-storage` `blob.download_as_bytes`, arbitrary `requests.Session.post`/`get`) from an `anyio` worker thread (the standard FastAPI `run_in_threadpool` pattern).
- `sentry_sdk.init(dsn=..., traces_sampler=..., profiles_sample_rate=1.0)` — any non-zero `profiles_sample_rate` is sufficient to trigger.
- Wait a few seconds under steady load.
Expected Result
No process-level crash from the profiler.
Actual Result
Worker process dies with SIGSEGV. Fault address is a small integer (observed: 0, 1, 2, 7, 8, 68, 72, 80, 111, 228) — classic use-after-free / null-deref-with-offset. Multiple different threads crash across different incidents; every crash has the profiler thread actively sampling frames at the moment of death.
With `PYTHONFAULTHANDLER=1` enabled, faulthandler consistently shows the pattern:
Sibling thread — profiler (running at the moment of crash, every time):
```
File ".../sentry_sdk/profiler/transaction_profiler.py", line 711, in run
File ".../sentry_sdk/profiler/transaction_profiler.py", line 601, in _sample_stack
File ".../sentry_sdk/profiler/transaction_profiler.py", line 602, in <listcomp>
File ".../sentry_sdk/profiler/utils.py", line 167, in extract_stack
File ".../sentry_sdk/profiler/utils.py", line 167, in <genexpr>
File ".../sentry_sdk/profiler/utils.py", line 114, in frame_id
```
Current thread — mid-HTTP I/O via a Sentry stdlib patch (one representative stack; the exact app code above the stdlib patch varies, but the Sentry patch frame is always present):
```
File "/usr/local/lib/python3.11/ssl.py", line 1166, in read
File "/usr/local/lib/python3.11/ssl.py", line 1314, in recv_into
File "/usr/local/lib/python3.11/socket.py", line 718, in readinto
File "/usr/local/lib/python3.11/http/client.py", line 291, in _read_status
File "/usr/local/lib/python3.11/http/client.py", line 330, in begin
File "/usr/local/lib/python3.11/http/client.py", line 1415, in getresponse
File ".../sentry_sdk/integrations/stdlib.py", line 146, in getresponse <-- Sentry patch
File ".../urllib3/connection.py", line 571, in getresponse
File ".../urllib3/connectionpool.py", line 534, in _make_request
File ".../urllib3/connectionpool.py", line 787, in urlopen
File ".../requests/adapters.py", line 644, in send
File ".../opentelemetry_instrumentation_requests/__init__.py", line 432, in instrumented_send
File ".../requests/sessions.py", line 703, in send
File ".../requests/sessions.py", line 589, in request
File ".../google/auth/transport/requests.py", line 543, in request
File ".../google/cloud/storage/_media/requests/download.py", line 253, in retriable_request
File ".../google/api_core/retry/retry_unary.py", line 147, in retry_target
File ".../google/cloud/storage/blob.py", line 1094, in _do_download
File ".../google/cloud/storage/blob.py", line 1530, in download_as_bytes
File ".../google/cloud/storage/blob.py", line 1651, in download_as_string
File ".../<app>/handler.py", in <app_handler>
File ".../starlette/concurrency.py", line 42, in run_in_threadpool
File ".../anyio/to_thread.py", line 63, in run_sync
File ".../anyio/_backends/_asyncio.py", line 1002, in run
File ".../sentry_sdk/integrations/fastapi.py", line 90, in _sentry_call
File ".../sentry_sdk/integrations/threading.py", line 133, in _run_old_run_func
File ".../sentry_sdk/integrations/threading.py", line 140, in run
File "/usr/local/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
```
Previously observed variant: same profiler sibling stack, but the crashing frame was in `tracing.Span.__init__` → `uuid.uuid4()`, triggered via the `sentry_sdk/integrations/stdlib.py:91` `putrequest` patch instead of `getresponse`. Both stdlib patch points reproduce.
Analysis
The profiler thread in `transaction_profiler.py` samples the `PyFrameObject`s of all other threads at the configured frequency, via `extract_stack` → `frame_id` (`utils.py:114`). `frame_id` reads fields from a `PyFrameObject` that another thread may be actively mutating/freeing. There is no GIL-level synchronisation across the sample boundary — the sampler is scheduled cooperatively with mutator threads, but the C-level attribute reads inside `frame_id` can see a partially-freed object if the mutator releases/reclaims a frame or code object mid-sample.
The signature (always a tiny `fault_addr`, always in `frame_id`/`extract_stack` while a mutator is mid-`Span` construction around HTTP I/O) is consistent with that race. The Sentry `stdlib.putrequest`/`getresponse` patches are a frequent entry point because every outbound HTTP call constructs a new `Span` → allocates a `uuid4()` → lots of short-lived frame/code churn right in the profiler's sampling window.
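For illustration only, the sampling shape in question can be sketched in pure stdlib Python (this is not Sentry's actual code — names like `sample_all_threads` and `churn` are hypothetical): a sampler thread snapshots every other thread's live frame via `sys._current_frames()` and walks `f_back` while the owning threads keep executing. A pure-Python walk like this is serialized by the GIL between bytecodes; the crashes above indicate the equivalent reads inside `frame_id` can nevertheless hit a frame that is mid-teardown.

```python
import sys
import threading
import time

def sample_all_threads():
    """Walk every thread's current stack, the way a sampling profiler does.

    sys._current_frames() hands back live frame objects owned by other
    threads; nothing pins a frame for the duration of the walk, which is
    exactly the window described in the analysis above.
    """
    stacks = {}
    for tid, frame in sys._current_frames().items():
        stack = []
        while frame is not None:
            # frame_id-style read: touch code-object fields of a frame
            # another thread may be unwinding at this very moment.
            stack.append((frame.f_code.co_filename, frame.f_code.co_name))
            frame = frame.f_back
        stacks[tid] = stack
    return stacks

def churn(stop):
    # Mutator: constant creation/destruction of short-lived frames,
    # standing in for the per-request Span/uuid4 churn in the report.
    while not stop.is_set():
        sum(i for i in range(100))

stop = threading.Event()
workers = [threading.Thread(target=churn, args=(stop,)) for _ in range(4)]
for w in workers:
    w.start()

for _ in range(50):  # sample at ~1 kHz for a moment
    snapshot = sample_all_threads()
    time.sleep(0.001)

stop.set()
for w in workers:
    w.join()

print(f"sampled {len(snapshot)} threads")
```

The point of the sketch is the lack of any pinning between sampler and mutators, not the crash itself.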
Workarounds
- `profiles_sample_rate=0.0` — stops the profiler thread entirely. Confirmed to eliminate crashes.
- `profiles_sample_rate=0.1` (reduced from `1.0`) — reduces the crash rate proportionally but does not eliminate it. A single sampled transaction hitting the race is enough to kill the container.
- Upgrading `sentry-sdk` from 2.17.0 to 2.58.0 does not fix it. The legacy thread-sampling profiler is still used whenever `profiles_sample_rate > 0`; 2.24.1+ added the continuous profiler as an opt-in, but it did not replace the transaction profiler.
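For reference, the mitigation we are running with is simply the zero rate (a minimal init sketch; the DSN and sampler values are placeholders, not our real configuration):

```python
import sentry_sdk

sentry_sdk.init(
    dsn="https://examplePublicKey@o0.ingest.sentry.io/0",  # placeholder DSN
    traces_sampler=lambda ctx: 0.1,   # tracing can stay enabled
    profiles_sample_rate=0.0,         # profiler thread is never started
)
```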
Ask
- Can the Sentry team confirm whether the thread-sampling transaction profiler is considered safe for production use on Python 3.11+ under concurrent HTTP load?
- If not, would the docs acknowledge this (it's currently presented as a general-purpose option)?
- Could `frame_id` / `extract_stack` be hardened against concurrent mutation, or is migration to the continuous profiler the official path forward?
Environment
- OS: Linux (Debian slim base, x86_64), managed container platform
- Python: 3.11 (CPython, stock)
- Runtime: FastAPI 0.114, Starlette 0.37, uvicorn 0.34, anyio 4.13
- Third-party in-stack at crash: `requests` 2.32, `urllib3` 2.6, `google-cloud-storage` 3.10, `google-auth` 2.49, `opentelemetry-instrumentation-requests` 0.62b0
- `PYTHONFAULTHANDLER=1` set to capture the trace; without it, the crash is logged only as "Container terminated on signal 11"