
fix(sdk): fix deadlock in TrustedHttpContextProvider and improve block_on for current-thread runtimes#3490

Closed
lklimek wants to merge 9 commits into v3.1-dev from fix/trusted-context-provider-deadlock

Conversation

@lklimek
Contributor

@lklimek lklimek commented Apr 14, 2026

Issue being fixed or feature implemented

Fixes #3432 — deadlock in TrustedHttpContextProvider::get_quorum_public_key() when called from within a tokio runtime.

What was done?

Root cause

get_quorum_public_key is a sync ContextProvider method called during proof verification, which happens inside an async context. The original implementation called futures::executor::block_on to drive the async quorum fetch. This deadlocks because:

  • futures::executor::block_on parks the calling thread waiting for the future
  • The future uses reqwest HTTP, which needs tokio's I/O reactor
  • The reactor lives on the same thread that is now parked → infinite hang

A subsequent fix attempted tokio::task::block_in_place, which panics on current-thread runtimes (no other worker threads to steal the task).

Fix

TrustedHttpContextProvider (rs-sdk-trusted-context-provider): replaced the broken async bridge with std::thread::scope. A scoped OS thread is spawned with its own independent current_thread tokio runtime. The future runs on that thread's reactor, completely isolated from the outer runtime. The scope guarantees the thread joins before returning, so non-'static borrows of self are safe.

dash_sdk::sync::block_on (rs-sdk): upgraded to handle all runtime flavors:

  • No active runtime: creates a temporary current_thread runtime and drives the future directly
  • CurrentThread runtime: spawns a dedicated OS thread with its own runtime (block_in_place panics here)
  • Multi-thread and future flavors (_): existing block_in_place + spawn (unchanged, still optimal)

RuntimeFlavor is #[non_exhaustive] so the wildcard arm is required; the CurrentThread arm explicitly handles the problematic case.

Regression tests

  • test_get_quorum_public_key_no_deadlock_inside_tokio_runtime: spins up a mock HTTP server and calls get_quorum_public_key from inside a current_thread runtime. Uses a 5-second channel timeout to detect hangs. Verified: old code times out after exactly 5 s; fix completes in ~30 ms.
  • test_block_on_fails_on_current_thread_runtime (renamed to reflect fixed behaviour): runs the same async→sync→async nesting on a current_thread runtime and asserts Ok("Success").

How Has This Been Tested?

  • cargo test -p rs-sdk-trusted-context-provider — 9 tests pass including the deadlock regression test
  • cargo test -p dash-sdk --lib sync — 45 tests pass including the current-thread regression test
  • Regression test manually verified against pre-fix code (times out after 5 s)

Breaking Changes

None.

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have added "!" to the title and described breaking changes in the corresponding section if my code contains any
  • I have made corresponding changes to the documentation if needed

🤖 Co-authored by Claudius the Magnificent AI Agent

Summary by CodeRabbit

  • Bug Fixes

    • Fixed deadlock and panic issues when calling synchronous APIs from async contexts.
    • Simplified prefetch initialization by removing preliminary network connectivity checks.
  • Chores

    • Updated dependencies and runtime handling to improve stability and threading behavior.
  • Tests

    • Added regression tests to prevent deadlocks and verify async/sync interactions.

lklimek and others added 7 commits April 14, 2026 11:52
…_in_place

`get_quorum_public_key()` called `futures::executor::block_on()` for the
cache-miss refetch path, which deadlocks when invoked from inside a tokio
runtime (e.g. the FFI wrapper's `runtime.block_on()`).

Replace with `tokio::task::block_in_place` + `Handle::current().block_on()`
when a tokio context is active, falling back to a temporary
`tokio::runtime::Runtime` when no context is present.

Also remove diagnostic HTTP requests to google.com and the raw quorums
endpoint from `dash_sdk_create_trusted` — they were debugging artifacts
that should not be in production code.

Closes #3432

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ide tokio

Reproduce the scenario from issue #3432: a cache-miss refetch of a quorum
public key called from within a multi-thread tokio runtime context.

A minimal in-process HTTP server serves the /quorums response so no external
network access is needed.  The test would have deadlocked with the old
`futures::executor::block_on` approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…m_public_key

Replace futures::executor::block_on with a scoped OS thread running its own
current-thread tokio runtime. This fixes two failure modes:

1. futures::executor::block_on deadlocks inside a current-thread tokio runtime:
   the single thread is both executor and I/O driver; blocking it for block_on
   starves the reactor and the reqwest future never completes.

2. tokio::task::block_in_place panics on current-thread runtimes.

std::thread::scope spawns a new OS thread with no tokio context; the thread
creates its own current-thread runtime, runs find_quorum to completion, and
the scope ensures the thread is joined before get_quorum_public_key returns.
Borrowing self across the scope boundary is safe because the scope exits only
after the thread joins.

The overhead (thread + runtime creation per cache miss) is acceptable because
quorum cache misses are rare.

Also updates the regression test: it now uses a current-thread runtime
(reproducing the exact deadlock scenario) with a 5s timeout. The old
futures::executor::block_on code was verified to trigger the timeout.

Closes #3432

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
block_on previously called tokio::task::block_in_place unconditionally,
which panics on current-thread runtimes. Now it detects the runtime
flavor and branches accordingly:

- No runtime: creates a temporary current-thread runtime
- CurrentThread: spawns a dedicated OS thread with its own runtime
- MultiThread (and future flavors): uses block_in_place as before

Fixes #3432

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ldcard arm

RuntimeFlavor is #[non_exhaustive], so _ already covers MultiThread and any
future multi-threaded variants. Deduplicate the identical arms.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…:block_on

Copies the runtime-aware block_on utility into the trusted context provider
so it is available for future use without a shared-crate dependency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace Err(_) with Err(e) and include the error in the trace message
so the reason no runtime was found is not silently discarded.

Also reverts the block_on copy added to trusted-context-provider.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Contributor

coderabbitai bot commented Apr 14, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fcf8e7d0-7058-4a93-9c93-4e55ba0ac001

📥 Commits

Reviewing files that changed from the base of the PR and between 52bac76 and 0d7e82c.

📒 Files selected for processing (1)
  • packages/rs-sdk-trusted-context-provider/Cargo.toml
📝 Walkthrough

Walkthrough

Replaces blocking futures executor usage inside Tokio runtimes with thread-isolated current-thread Tokio runtimes, removes diagnostic reqwest calls from the FFI prefetch task, swaps futures for tokio where needed, and updates block_on behavior to handle no-runtime, current-thread, and multi-thread contexts safely.

Changes

Cohort / File(s) Summary
Dependency Cleanup
packages/rs-sdk-ffi/Cargo.toml, packages/rs-sdk-trusted-context-provider/Cargo.toml
Removed reqwest from FFI crate and replaced futures with tokio = { version = "1.40", features = ["rt","rt-multi-thread"] } in the trusted-context-provider crate.
FFI SDK Initialization
packages/rs-sdk-ffi/src/sdk.rs
Removed preliminary diagnostic HTTP GET checks and associated logging from the async prefetch task; now invokes update_quorum_caches().await directly.
Context Provider Runtime Handling
packages/rs-sdk-trusted-context-provider/src/provider.rs
Replaced futures::executor::block_on(self.find_quorum(...)) with spawning an OS thread that builds a current-thread Tokio runtime and block_ons the future; added error mapping for runtime creation and thread join panics; added regression test to prevent deadlock.
Sync Utility Runtime Branching
packages/rs-sdk/src/sync.rs
Reworked block_on to detect runtime state and handle three cases: no active runtime (create temporary current-thread runtime), current-thread runtime (spawn OS thread with its own current-thread runtime), and other runtimes (use block_in_place + task bridging); added docs and regression test.

Sequence Diagram(s)

sequenceDiagram
  participant App
  participant FFI_SDK as FFI SDK
  participant PrefetchTask as Prefetch Task
  participant Provider as TrustedContextProvider
  participant Thread as Quorum Thread
  participant TokioRT as Tokio Current-Thread Runtime
  participant QuorumHTTP as Quorum HTTP Service

  App->>FFI_SDK: create trusted SDK (returns handle)
  FFI_SDK->>PrefetchTask: spawn async prefetch (detached)
  PrefetchTask->>Provider: update_quorum_caches().await
  Note over App,Provider: Later, App requests proof verification
  App->>Provider: get_quorum_public_key(hash)
  alt cache hit
    Provider-->>App: return key
  else cache miss
    Provider->>Thread: spawn OS thread to run async fetch
    Thread->>TokioRT: build current-thread runtime
    TokioRT->>QuorumHTTP: perform HTTP requests (reqwest)
    QuorumHTTP-->>TokioRT: respond with quorum data
    TokioRT-->>Thread: return result
    Thread-->>Provider: join and deliver key/result
    Provider-->>App: return key or error
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I tunneled through runtimes tight and deep,
Spawned a thread where async dreams could sleep,
No more deadlocks in the midnight code,
Prefetch hops along the safer road,
The SDK leaps onward—light and fleet! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main changes: fixing a deadlock in TrustedHttpContextProvider and improving block_on for current-thread runtimes, which aligns with the primary objective to eliminate deadlock caused by futures::executor::block_on inside a tokio runtime.
Linked Issues check ✅ Passed The PR fully addresses issue #3432 objectives: eliminates the deadlock by replacing futures::executor::block_on with std::thread::scope, extends block_on to handle no-runtime and current-thread runtime cases, adds regression tests using mock HTTP servers, and removes diagnostic external network requests.
Out of Scope Changes check ✅ Passed All code changes are directly scoped to fixing issue #3432: TrustedContextProvider deadlock fix, block_on improvements, removal of reqwest/diagnostic HTTP requests, and related regression tests.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.



@github-actions github-actions bot added this to the v3.1.0 milestone Apr 14, 2026
@lklimek lklimek marked this pull request as ready for review April 14, 2026 13:19
@thepastaclaw
Collaborator

thepastaclaw commented Apr 14, 2026

⏳ Review in progress (commit 0d7e82c)

@github-actions
Contributor

github-actions bot commented Apr 14, 2026

✅ DashSDKFFI.xcframework built for this PR.

SwiftPM (host the zip at a stable URL, then use):

.binaryTarget(
  name: "DashSDKFFI",
  url: "https://your.cdn.example/DashSDKFFI.xcframework.zip",
  checksum: "cbe56c4ea36c4a584ba597bba07790f1a661bf641d21fea6f2cf0e14514024fe"
)

Xcode manual integration:

  • Download 'DashSDKFFI.xcframework' artifact from the run link above.
  • Drag it into your app target (Frameworks, Libraries & Embedded Content) and set Embed & Sign.
  • If using the Swift wrapper package, point its binaryTarget to the xcframework location or add the package and place the xcframework at the expected path.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3


Inline comments:
In `@packages/rs-sdk-trusted-context-provider/Cargo.toml`:
- Line 25: The Tokio dev-dependency in Cargo.toml only enables
["macros","rt-multi-thread"] but the crate's tests use tokio networking and
async IO APIs (e.g., tokio::net::TcpListener and tokio::io::{AsyncBufReadExt,
AsyncWriteExt, BufReader}), so update the tokio entry in the [dev-dependencies]
section to include the "net" and "io-util" features (i.e., add "net" and
"io-util" alongside "macros" and "rt-multi-thread") so those APIs are explicitly
enabled for tests.

In `@packages/rs-sdk-trusted-context-provider/src/provider.rs`:
- Around line 915-968: The test spawns two detached threads (the server thread
using server_rt and the client thread that builds rt and calls
provider.get_quorum_public_key) and never joins them or signals the server loop
to stop, leaving a live Tokio runtime; modify the test to keep the JoinHandles,
add a shutdown path for the server loop (e.g., a oneshot or atomic flag the
server loop checks after serving the request or break after a single accept),
signal shutdown once the client has finished (use rx to wait for the result),
then join both thread handles to ensure the server_rt and rt runtimes are
dropped and their threads exit cleanly; reference the spawned closures around
listener.accept(), the server_rt.block_on(...) thread, and the thread that
creates rt and calls provider.get_quorum_public_key so you can locate where to
add the channel/flag, signal, and handle.join().

In `@packages/rs-sdk/src/sync.rs`:
- Around line 89-104: In RuntimeFlavor::CurrentThread branch the worker thread
currently only sends the successful fut output via tx and drops errors/panics,
so callers get only AsyncError::RecvError; change the spawn to return a
JoinHandle and send a Result<F::Output, AsyncError> over tx (wrap build runtime
errors and any panic/worker failures into AsyncError), ensure you call join on
the handle and propagate the joined thread error if it panicked or the runtime
build failed, and have rx.recv() return that Result so the caller receives the
real failure instead of a generic recv error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 24e588b1-2b8a-4e32-9920-9e8d1542f4d7

📥 Commits

Reviewing files that changed from the base of the PR and between 127ad5e and 2a77e1c.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • packages/rs-sdk-ffi/Cargo.toml
  • packages/rs-sdk-ffi/src/sdk.rs
  • packages/rs-sdk-trusted-context-provider/Cargo.toml
  • packages/rs-sdk-trusted-context-provider/src/provider.rs
  • packages/rs-sdk/src/sync.rs
💤 Files with no reviewable changes (1)
  • packages/rs-sdk-ffi/Cargo.toml

Comment thread packages/rs-sdk-trusted-context-provider/Cargo.toml Outdated
Comment thread packages/rs-sdk-trusted-context-provider/src/provider.rs Outdated
Comment thread packages/rs-sdk/src/sync.rs Outdated
lklimek and others added 2 commits April 15, 2026 08:56
…est threads

- sync.rs: Send Result<T, AsyncError> over channel in CurrentThread
  branch so runtime build failures and panics propagate as real errors
  instead of opaque RecvError.
- provider.rs: Add oneshot shutdown signal to mock server, capture and
  join both thread handles so no detached threads or live runtimes
  survive the test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lity

Production code only uses tokio::runtime::Builder::new_current_thread()
which needs the "rt" feature alone. The "rt-multi-thread", "net", and
"io-util" features are only used in tests, so move them to
[dev-dependencies] to avoid breaking the wasm-sdk build.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Fixes a deadlock/panic when synchronous SDK APIs bridge into async work (reqwest/tokio) from within an existing tokio runtime, particularly on current-thread runtimes used by FFI consumers.

Changes:

  • Updated dash_sdk::sync::block_on to handle “no runtime”, “current-thread runtime”, and “multi-thread runtime” cases safely.
  • Reworked TrustedHttpContextProvider::get_quorum_public_key to perform async quorum fetches on a dedicated OS thread with its own tokio runtime to avoid deadlocks.
  • Added regression tests for both the trusted context provider deadlock and block_on behavior; removed diagnostic HTTP checks and the related reqwest dependency from the FFI crate.

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
packages/rs-sdk/src/sync.rs Runtime-flavor-aware block_on implementation + regression test for current-thread runtimes.
packages/rs-sdk-trusted-context-provider/src/provider.rs Avoid deadlock by running quorum fetch in a scoped OS thread with an isolated runtime + regression test.
packages/rs-sdk-trusted-context-provider/Cargo.toml Dependency updates: drop futures, add tokio (and dev features needed by tests).
packages/rs-sdk-ffi/src/sdk.rs Removed diagnostic HTTP probes during trusted SDK creation.
packages/rs-sdk-ffi/Cargo.toml Removed reqwest dependency (previously only used for diagnostics).
Cargo.lock Lockfile updated to reflect dependency changes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

/// This test proves the fix works: the same async-sync-async nesting that used to
/// panic now completes successfully on a current-thread runtime.
#[test]
fn test_block_on_fails_on_current_thread_runtime() {

Copilot AI Apr 15, 2026


The test name says it "fails" on a current-thread runtime, but the body and assertions expect success after the fix. Rename the test to reflect the intended behavior (e.g., succeeds/does_not_panic) so failures are easier to interpret.

Suggested change
fn test_block_on_fails_on_current_thread_runtime() {
fn test_block_on_succeeds_on_current_thread_runtime() {


let result = rt.block_on(outer(rx));

rt.block_on(worker_join).ok();

Copilot AI Apr 15, 2026


This ignores the JoinHandle result from the worker task. If the task panics or is cancelled, the test will still pass/fail later in a misleading way. Prefer awaiting it with unwrap()/asserting is_ok() so task failures are surfaced deterministically.

Suggested change
rt.block_on(worker_join).ok();
rt.block_on(worker_join).unwrap();

Comment on lines +103 to +107
let recv_result = rx.recv()?;
join_handle
.join()
.map_err(|_| AsyncError::Generic("block_on worker thread panicked".to_string()))?;
recv_result

Copilot AI Apr 15, 2026


In the CurrentThread branch, if rx.recv() returns early (e.g., worker panics before sending), the function returns via ? and the JoinHandle is dropped without joining. That can hide panics and makes failures harder to diagnose. Consider always joining the thread (even on recv error) and mapping a panic to an explicit AsyncError::Generic("... panicked") (or similar) so callers get a clearer error.

Suggested change
let recv_result = rx.recv()?;
join_handle
.join()
.map_err(|_| AsyncError::Generic("block_on worker thread panicked".to_string()))?;
recv_result
let recv_result = rx.recv();
let join_result = join_handle.join();
if join_result.is_err() {
return Err(AsyncError::Generic(
"block_on worker thread panicked".to_string(),
));
}
recv_result?

@lklimek lklimek marked this pull request as draft April 15, 2026 12:41
@lklimek
Contributor Author

lklimek commented Apr 15, 2026

Superseded by #3497

@lklimek lklimek closed this Apr 15, 2026


Development

Successfully merging this pull request may close these issues.

Deadlock: futures::executor::block_on inside tokio runtime in TrustedHttpContextProvider

3 participants