Skip to content

fix(litellm): store bookkeeping span off-band, not in forwarded metadata#6598

Open
jgreer013 wants to merge 1 commit into
getsentry:masterfrom
jgreer013:jgreer013/issue-6596
Open

fix(litellm): store bookkeeping span off-band, not in forwarded metadata#6598
jgreer013 wants to merge 1 commit into
getsentry:masterfrom
jgreer013:jgreer013/issue-6596

Conversation

@jgreer013

Copy link
Copy Markdown

Summary

Fixes #6596 (Linear PY-2540).

With LiteLLMIntegration enabled, any call that passes caller metadata could crash during request serialization:

TypeError: Object of type Span is not JSON serializable

Root cause

_input_callback stored the live span via _get_metadata_dict(kwargs)["_sentry_span"] = span. litellm threads the caller's metadata dict through to litellm_params["metadata"] — it's the same dict object the caller passed — and some providers (e.g. Anthropic's /v1/messages passthrough) forward that dict into the outbound request body. So the live Span landed at request_body["metadata"]["_sentry_span"] and json.dumps(request_body) failed before the request was sent.

Separately, under send_default_pii=True + include_prompts=True, that span's gen_ai.request.messages holds the verbatim prompt, so any sink that serialized the injected metadata could leak prompt content to the provider.

Fix

Store the bookkeeping span off-band in a module-level registry keyed by litellm_call_id (a per-request UUID that is stable across the input/success/failure callbacks), falling back to the identity of the shared callback kwargs dict for direct callback invocations that omit it. The span no longer lives in any litellm-visible dict, so it can't be forwarded, serialized, or deep-copied by litellm — fixing both the crash and the prompt-leak vector. The registry entry is removed by the terminal success/failure callback (streaming success peeks and pops only on the final call).

Testing

  • New regression test test_caller_metadata_stays_json_serializable asserts the forwarded metadata stays free of _sentry_span and JSON-serializable at the serialization moment, while the span is still recorded off-band. It fails on master and passes with this change.
  • Full litellm suite: 160 passed (litellm v1.81.16), including streaming/async/failure paths.
  • Regression test also green on the oldest (v1.77.7) and latest litellm matrix entries.
  • mypy clean; ruff check + format clean.

Related (downstream, defense-in-depth)

litellm forwards/serializes this injected metadata into the provider body. Companion issue: BerriAI/litellm#30662. litellm has stripped such span objects in other logging paths (BerriAI/litellm#15728, BerriAI/litellm#12354).

@jgreer013 jgreer013 force-pushed the jgreer013/issue-6596 branch from 6fb4fa5 to 5a5f39b Compare June 18, 2026 04:29
@jgreer013 jgreer013 marked this pull request as ready for review June 18, 2026 04:36
@jgreer013 jgreer013 requested a review from a team as a code owner June 18, 2026 04:36

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 5a5f39b. Configure here.

Comment thread sentry_sdk/integrations/litellm.py Outdated
@jgreer013 jgreer013 force-pushed the jgreer013/issue-6596 branch from 5a5f39b to 986f979 Compare June 18, 2026 16:28
Comment thread sentry_sdk/integrations/litellm.py
Comment thread sentry_sdk/integrations/litellm.py Outdated
# it. The terminal success/failure callback removes the entry; the registry is
# capped (oldest evicted first) so calls abandoned before a terminal callback
# fires -- e.g. a stream the caller stops iterating -- cannot grow it unbounded.
_MAX_TRACKED_SPANS = 1000

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!
I'll think about approaches because the _spans_by_call will grow if spans aren't finished correctly and everything is dropped if _MAX_TRACKED_SPANS is reached.

Generally, if there's a public shared reference between input and output callbacks, we should use that. If not, your approach could be the best we can do.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there isn't one in this case, but perhaps there's an opportunity to add one with a bigger rework?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, turns out there is one — litellm threads the same model_call_details dict through all three callbacks, so I moved the span onto that (off the forwarded metadata path) instead of the module-level registry. No cap or eviction now, and the span's lifetime is tied to the request, like the clickhouse/dramatiq integrations do it. I double-checked the Anthropic passthrough doesn't forward it — the request body only gets the recognized params, not model_call_details. Lmk if you had a different reference in mind!

With LiteLLMIntegration enabled, any call passing caller `metadata` crashed
during request serialization. `_input_callback` stored the live Span in the
caller's `metadata` dict, and some providers (e.g. Anthropic's /v1/messages
passthrough) forward that dict into the outbound request body, so
`json.dumps(request_body)` raised `TypeError: Object of type Span is not JSON
serializable` before the request was sent. The span (holding the verbatim
prompt under send_default_pii) could also leak to the provider.

Stash the span on a top-level key of the per-request kwargs dict
(litellm's `model_call_details`) that litellm threads through the
input/success/failure callbacks, instead of in the forwarded `metadata`
sub-dict. This ties the span's lifetime to the request with no module-level
tracking, mirroring how the clickhouse/dramatiq integrations stash a span on
their per-request object. The Anthropic request body is built only from
recognized request params, not from `model_call_details`, so the span is
never serialized onto the wire (verified end-to-end against the passthrough).

Fixes getsentry#6596

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jgreer013 jgreer013 force-pushed the jgreer013/issue-6596 branch from 986f979 to dbb2139 Compare June 18, 2026 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LiteLLMIntegration stores a non-JSON-serializable Span in request metadata, breaking outbound LLM requests

2 participants