Skip to content

RUM-16502: Dedup fatal ANR reported by Profiling pipeline#3479

Merged
ambushwork merged 1 commit into
feature/continuous-profilingfrom
yl/profiling/anr-deduplication
May 29, 2026
Merged

RUM-16502: Dedup fatal ANR reported by Profiling pipeline#3479
ambushwork merged 1 commit into
feature/continuous-profilingfrom
yl/profiling/anr-deduplication

Conversation

@ambushwork
Copy link
Copy Markdown
Member

@ambushwork ambushwork commented May 27, 2026

What does this PR do?

Dedups fatal ANRs so the late-crash reporter does not re-emit an ANR that was already reported in-session — covering both the new Profiling pipeline path and the existing ANRDetectorRunnable path.

The marker write now lives in RumViewScope.onAddError: when an AddError event with an ANRException throwable is successfully written, lastFatalAnrSent is persisted (using event.eventTime.timestamp, which is device time and matches ApplicationExitInfo.timestamp). On the next launch, DatadogLateCrashReporter.handleAnrCrash skips re-emitting when the marker is within a 10s tolerance of ApplicationExitInfo.timestamp (the two come from different moments — in-process detection vs. process death — so exact equality won't hold).

Hooking at the write path (rather than at each call site) means:

  • Background ANRs that are silently dropped by RumViewManagerScope.handleBackgroundEvent no longer write the marker → no false-negative on background-fatal cases.
  • Both the Profiling path and the legacy ANRDetectorRunnable path are covered by the same hook → the pre-existing duplicate where an in-session foreground ANR turned fatal is also fixed.

Motivation

RUM-16502 — without this, an ANR detected and uploaded by the profiling pipeline is duplicated as a fatal ANR error in RUM on the next launch. The same gap also existed in the legacy path (foreground ANR reported in-session, user closes the ANR dialog, fatal ANR re-reported next launch) and is closed by the same fix.

Additional Notes

Possible regression to the existing late-crash dedup: the previous check compared ApplicationExitInfo.timestamp to itself (marker written by handleAnrCrash, read on next launch — exact equality always held). With the 10s tolerance, two distinct fatal ANRs whose ApplicationExitInfo.timestamp values fall within 10s of each other would now be treated as duplicates.

In practice this requires: ANR → process death → user relaunches the app → second ANR → process death, all within 10s of the first death. The relaunch alone almost always takes longer than 10s, so this collision is only realistic in a tight crash loop, where collapsing the two reports is arguably acceptable. The same-source case (one exit-info entry re-read across multiple launches) still dedups identically since `|0| <= 10_000`.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Make sure you discussed the feature or bugfix with the maintaining team in an Issue
  • Make sure each commit and the PR mention the Issue number (cf the CONTRIBUTING doc)

@datadog-prod-us1-6

This comment has been minimized.

@ambushwork ambushwork force-pushed the yl/profiling/anr-deduplication branch 2 times, most recently from d28368b to 12b8bd5 Compare May 27, 2026 14:58
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 27, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 72.36%. Comparing base (389f702) to head (18aa23b).

Files with missing lines Patch % Lines
.../android/rum/internal/domain/scope/RumViewScope.kt 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@                       Coverage Diff                        @@
##           feature/continuous-profiling    #3479      +/-   ##
================================================================
- Coverage                         72.36%   72.36%   -0.00%     
================================================================
  Files                               979      979              
  Lines                             36242    36246       +4     
  Branches                           6034     6036       +2     
================================================================
+ Hits                              26225    26227       +2     
- Misses                             8359     8366       +7     
+ Partials                           1658     1653       -5     
Files with missing lines Coverage Δ
...kotlin/com/datadog/android/core/InternalSdkCore.kt 100.00% <ø> (ø)
...g/android/rum/internal/DatadogLateCrashReporter.kt 90.29% <100.00%> (+0.05%) ⬆️
.../android/rum/internal/domain/scope/RumViewScope.kt 94.79% <75.00%> (-0.08%) ⬇️

... and 40 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ambushwork
Copy link
Copy Markdown
Member Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12b8bd5faf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ambushwork ambushwork marked this pull request as ready for review May 28, 2026 10:14
@ambushwork ambushwork requested review from a team as code owners May 28, 2026 10:14
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12b8bd5faf

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ambushwork ambushwork force-pushed the yl/profiling/anr-deduplication branch from 12b8bd5 to d05b2fe Compare May 28, 2026 13:47
Log.i("DatadogSynthetics", "$key=$value")
}

private fun persistLastFatalAnrSent(timestampMs: Long) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this method is already called on the I/O (writeScope) thread, no need to push to the persistence executor explicitly.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the function, call it directly in onSuccess callback

onSuccess { it.eventSent(rumContext.viewId.orEmpty(), StorageEvent.Error()) }
onSuccess {
it.eventSent(rumContext.viewId.orEmpty(), StorageEvent.Error())
if (event.throwable is ANRException) {
Copy link
Copy Markdown
Member

@0xnm 0xnm May 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is one concern: if customer has trackNonFatalAnrs, such ANR will also pass here. I'm not sure if deduplication is useful for ANR coming from the ANRDetectorRunnable, because there we have many false positives (and probably we could also deduplicate between ANRDetectorRunnable and ANR trigger), but there is no information about ANR origin, so I guess we can live with it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is a trade-off, onSuccess is the most reliable place that we can mark the ANR is written, but here it already lost the information about the ANR origin. Checking if it is from Profiling trigger here can make the code quite messy

@ambushwork ambushwork requested review from 0xnm and hamorillo May 28, 2026 16:37
@ambushwork ambushwork force-pushed the yl/profiling/anr-deduplication branch from d05b2fe to 18aa23b Compare May 28, 2026 16:44
@hamorillo
Copy link
Copy Markdown
Contributor

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. What shall we delve into next?

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ambushwork ambushwork merged commit 18b2f13 into feature/continuous-profiling May 29, 2026
27 checks passed
@ambushwork ambushwork deleted the yl/profiling/anr-deduplication branch May 29, 2026 14:34
Copy link
Copy Markdown
Contributor

@hamorillo hamorillo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚀

* Writes timestamp of the last fatal ANR sent.
* Writes the timestamp of the last ANR RUM reported. Despite the "fatal" in the name,
* this same timestamp is also written for non-fatal ANRs (see
* `RumViewScope.persistLastNonFatalAnrSent`) so that the fatal-ANR pipeline, which
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
persistLastNonFatalAnrSent no longer exits

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants