
fix(replicate): record errors on spans#4023

Open
ffenjil wants to merge 3 commits into traceloop:main from ffenjil:fix/replicate-record-errors-on-spans

@ffenjil ffenjil commented Apr 17, 2026

Wrap Replicate instrumentation calls in try/except so that exceptions are recorded on spans. On error, the instrumentation sets the error.type attribute, records an exception event, and sets the span status to ERROR before re-raising.

Adds two tests verifying error handling for both run and stream paths.

Closes #412
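The run-path change follows a common tracing pattern: catch the exception, annotate the span, then re-raise so callers still see the original error. Below is a minimal stdlib-only sketch of that pattern, not the PR's code. It assumes a hand-rolled `RecordingSpan` in place of the OpenTelemetry `Span`, a hypothetical `traced_run` helper in place of the instrumentation's `_wrap`, and the string `"ERROR"` in place of `StatusCode.ERROR`. Ending the span in `finally` is also an assumption about where the real code closes it.

```python
# Hypothetical stand-in for an OpenTelemetry span; it only records what
# the instrumentation would set so the behavior can be inspected.
class RecordingSpan:
    def __init__(self):
        self.attributes = {}
        self.exceptions = []
        self.status = ("UNSET", None)
        self.ended = False

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def record_exception(self, exc):
        self.exceptions.append(exc)

    def set_status(self, code, description=None):
        self.status = (code, description)

    def end(self):
        self.ended = True


def traced_run(span, func, *args, **kwargs):
    """Hypothetical wrapper: call func and record any exception on span."""
    try:
        return func(*args, **kwargs)
    except Exception as e:
        # The three steps described in the PR: error.type, exception event, ERROR status.
        span.set_attribute("error.type", e.__class__.__name__)
        span.record_exception(e)
        span.set_status("ERROR", str(e))
        raise  # re-raise so the caller sees the original exception
    finally:
        span.end()


def failing_run(model):
    raise ValueError(f"model {model!r} not found")


span = RecordingSpan()
try:
    traced_run(span, failing_run, "meta/llama")
except ValueError:
    pass

print(span.attributes)  # {'error.type': 'ValueError'}
print(span.status)      # ('ERROR', "model 'meta/llama' not found")
```

The bare `raise` matters. Telemetry is added without changing the exception the caller receives.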

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

Summary by CodeRabbit

Release Notes

  • New Features

    • Failed requests are now recorded on spans with the error.type semantic attribute, an exception event, and an ERROR span status, for both standard and streaming calls.
    • The span status description carries the exception message to aid debugging.
  • Tests

    • Added tests verifying error handling and telemetry recording for streaming and standard operations.


CLAassistant commented Apr 17, 2026

CLA assistant check
All committers have signed the CLA.


coderabbitai Bot commented Apr 17, 2026

Walkthrough

The changes add error telemetry to the Replicate instrumentation. Exceptions are recorded on spans, span status is set to ERROR, and spans are ended even when errors occur during execution or streaming.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Error Telemetry Implementation (packages/opentelemetry-instrumentation-replicate/opentelemetry/instrumentation/replicate/__init__.py) | Added try/except/finally blocks to the _wrap and _build_from_streaming_response functions. These capture exceptions, set the ERROR_TYPE semantic attribute, record the exception, set StatusCode.ERROR, and end the span before the error is re-raised. |
| Error Handling Tests (packages/opentelemetry-instrumentation-replicate/tests/test_llama.py) | Added imports for mock patching and StatusCode, plus two new tests (test_replicate_run_exception, test_replicate_stream_exception). They verify that error spans carry the correct status code, exception event, error type attribute, and error message. |
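The two new tests force the instrumented call to fail and then inspect the finished span. Below is a stdlib-only sketch of that test shape. `RecordingSpan`, `traced_run`, and `FakeReplicateClient` are hypothetical stand-ins, not the real `replicate` client or the OpenTelemetry SDK, and `mock.patch.object` is used to make the client's `run` raise.

```python
import unittest
from unittest import mock


class RecordingSpan:
    """Hypothetical span stand-in that records status, attributes and exceptions."""
    def __init__(self):
        self.attributes, self.exceptions = {}, []
        self.status, self.ended = ("UNSET", None), False

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def record_exception(self, exc):
        self.exceptions.append(exc)

    def set_status(self, code, description=None):
        self.status = (code, description)

    def end(self):
        self.ended = True


def traced_run(span, func, *args, **kwargs):
    """Hypothetical wrapper mirroring the PR's error-recording steps."""
    try:
        return func(*args, **kwargs)
    except Exception as e:
        span.set_attribute("error.type", e.__class__.__name__)
        span.record_exception(e)
        span.set_status("ERROR", str(e))
        raise
    finally:
        span.end()


class FakeReplicateClient:
    """Hypothetical client; only the run() name echoes the Replicate API."""
    def run(self, model):
        return "ok"


class ReplicateErrorTest(unittest.TestCase):
    def test_run_exception_is_recorded(self):
        client, span = FakeReplicateClient(), RecordingSpan()
        # Patch this instance's run() so every call raises.
        with mock.patch.object(client, "run", side_effect=RuntimeError("prediction failed")):
            with self.assertRaises(RuntimeError):
                traced_run(span, client.run, "meta/llama-2")
            client.run.assert_called_once_with("meta/llama-2")

        self.assertEqual(span.status, ("ERROR", "prediction failed"))
        self.assertEqual(span.attributes["error.type"], "RuntimeError")
        self.assertEqual(len(span.exceptions), 1)
        self.assertTrue(span.ended)
```

Saved as a module, this sketch runs under `python -m unittest`.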

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes


🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(replicate): record errors on spans' directly describes the main change, which adds error recording to spans in the Replicate instrumentation. |
| Linked Issues check | ✅ Passed | The PR addresses issue #412 by recording errors on spans when HTTP errors occur, using try/except wrapping that sets the error status and attributes. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the Replicate instrumentation package and implement error recording on spans as specified in #412. No extraneous modifications were detected. |



@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-replicate/opentelemetry/instrumentation/replicate/__init__.py (1)

63-79: Looks correct; consider extracting the error-recording block.

The try/except/finally correctly handles normal completion, iterator errors, and GeneratorExit/garbage collection (the span is always ended). The same error-recording sequence is duplicated in _wrap (lines 154-157). Consider extracting a small helper so the two sites stay in sync as the instrumentation evolves.

♻️ Suggested helper
+def _record_exception_on_span(span, e):
+    span.set_attribute(ERROR_TYPE, e.__class__.__name__)
+    span.record_exception(e)
+    span.set_status(Status(StatusCode.ERROR, str(e)))
+
+
 def _build_from_streaming_response(span, event_logger, response):
     complete_response = ""
     try:
         for item in response:
             item_to_yield = item
             complete_response += str(item)

             yield item_to_yield

         _handle_response(span, event_logger, complete_response)
     except Exception as e:
-        span.set_attribute(ERROR_TYPE, e.__class__.__name__)
-        span.record_exception(e)
-        span.set_status(Status(StatusCode.ERROR, str(e)))
+        _record_exception_on_span(span, e)
         raise
     finally:
         span.end()
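The reviewer lists three cases: normal completion, an error raised by the iterator, and early termination via GeneratorExit. A stdlib-only sketch of the same generator shape can exercise all three. `RecordingSpan` is a hypothetical stand-in for an OpenTelemetry span. `traced_stream` mirrors `_build_from_streaming_response` without the event logger. Storing the joined text under a `completion` attribute is a hypothetical substitute for `_handle_response`. `GeneratorExit` derives from `BaseException`, so `except Exception` does not catch it. Only the `finally` block runs, which is why an early close ends the span without marking it as an error.

```python
class RecordingSpan:
    """Hypothetical span stand-in that records what the generator sets."""
    def __init__(self):
        self.attributes, self.exceptions = {}, []
        self.status, self.ended = ("UNSET", None), False

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def record_exception(self, exc):
        self.exceptions.append(exc)

    def set_status(self, code, description=None):
        self.status = (code, description)

    def end(self):
        self.ended = True


def _record_exception_on_span(span, e):
    # Same three steps as the suggested helper; "ERROR" stands in for StatusCode.ERROR.
    span.set_attribute("error.type", e.__class__.__name__)
    span.record_exception(e)
    span.set_status("ERROR", str(e))


def traced_stream(span, response):
    complete_response = ""
    try:
        for item in response:
            complete_response += str(item)
            yield item
        # Hypothetical substitute for _handle_response(...).
        span.set_attribute("completion", complete_response)
    except Exception as e:
        _record_exception_on_span(span, e)
        raise
    finally:
        span.end()  # runs on exhaustion, on error, and on GeneratorExit


def flaky_tokens():
    yield "Hello"
    yield ", "
    raise ConnectionError("stream dropped")


# 1. Normal completion.
done = RecordingSpan()
print(list(traced_stream(done, ["Hi", "!"])))  # ['Hi', '!']

# 2. The underlying iterator fails mid-stream.
failed, received = RecordingSpan(), []
try:
    for token in traced_stream(failed, flaky_tokens()):
        received.append(token)
except ConnectionError:
    pass
print(received, failed.status)  # ['Hello', ', '] ('ERROR', 'stream dropped')

# 3. The consumer stops early; close() raises GeneratorExit inside the generator.
early = RecordingSpan()
gen = traced_stream(early, iter(["a", "b", "c"]))
next(gen)
gen.close()
print(early.ended, early.status)  # True ('UNSET', None)
```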
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-replicate/opentelemetry/instrumentation/replicate/__init__.py` around lines 63 - 79, extract the duplicated four-line error recording logic into a helper (e.g., _record_span_error) and call it from both _build_from_streaming_response and _wrap: implement _record_span_error(span, exc) to call span.set_attribute(ERROR_TYPE, exc.__class__.__name__), span.record_exception(exc), and span.set_status(Status(StatusCode.ERROR, str(exc))); then replace the duplicated lines in _build_from_streaming_response and the identical block in _wrap with a call to _record_span_error(span, e).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b9fb01d4-3a2c-4c20-b3a1-5d613a57dcd7

📥 Commits

Reviewing files that changed from the base of the PR and between dcd5cb0 and 08b1ac2.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-replicate/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-replicate/opentelemetry/instrumentation/replicate/__init__.py
  • packages/opentelemetry-instrumentation-replicate/tests/test_llama.py


Development

Successfully merging this pull request may close these issues.

🐛 Bug Report: Errors are not logged

2 participants