Skip to content

fix: replace deadlocking futures::executor::block_on with runtime-aware dash-async crate (#3432)#3497

Open
lklimek wants to merge 2 commits intov3.1-devfrom
feat/rs-dash-async
Open

fix: replace deadlocking futures::executor::block_on with runtime-aware dash-async crate (#3432)#3497
lklimek wants to merge 2 commits intov3.1-devfrom
feat/rs-dash-async

Conversation

@lklimek
Copy link
Copy Markdown
Contributor

@lklimek lklimek commented Apr 15, 2026

Issue being fixed or feature implemented

Fixes #3432futures::executor::block_on inside a tokio runtime in TrustedHttpContextProvider causes a deadlock when quorum cache misses trigger async HTTP fetches.

Root cause (from #3432)

ContextProvider::get_quorum_public_key() is synchronous, but on cache miss it calls futures::executor::block_on(self.find_quorum(...)) which makes HTTP requests via reqwest. Nesting futures::executor::block_on inside a tokio runtime deadlocks on single-threaded runtimes or panics on multi-threaded runtimes. This affects the iOS SDK and any FFI consumer using the trusted context provider.

What was done?

New crate: packages/rs-dash-async (dash-async)

Extracted a runtime-aware block_on function that correctly handles all three scenarios:

  • No active runtime → creates a temporary current-thread runtime
  • Current-thread runtime → spawns a dedicated OS thread with its own runtime (avoids block_in_place panic)
  • Multi-thread runtime → uses block_in_place + spawn for efficient bridging

Also includes:

rs-sdk-trusted-context-provider — the fix

dash-sdk::sync — backward compatible

  • Re-exports block_on and AsyncError from dash-async
  • All existing use dash_sdk::sync::block_on imports continue to work unchanged
  • retry function and its tests remain in dash-sdk

rs-context-provider

  • Moved From<AsyncError> for ContextProviderError impl here (orphan rule — AsyncError is no longer local to dash-sdk)

How Has This Been Tested?

  • cargo test -p dash-async — 2 tests pass:
  • cargo test -p dash-sdk --lib -- sync::test — 40 retry tests pass
  • cargo test -p rs-sdk-trusted-context-provider — 8 tests pass
  • cargo check succeeds for all affected crates

Breaking Changes

None. All public APIs are backward compatible via re-exports.

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have added "!" to the title and described breaking changes in the corresponding section if my code contains any
  • I have made corresponding changes to the documentation if needed

🤖 Co-authored by Claudius the Magnificent AI Agent

Summary by CodeRabbit

Release Notes

  • New Features

    • Added async-to-sync bridging utility with support for multiple runtime configurations and improved error handling.
  • Refactor

    • Consolidated async-to-sync functionality into a dedicated module for improved reusability across the SDK.
    • Enhanced error handling and reporting for async operations.

lklimek and others added 2 commits April 15, 2026 14:58
The initial extraction only kept the multi-thread block_in_place path.
Restore the 3-way handling: no runtime (create temporary), current-thread
(spawn OS thread), multi-thread (block_in_place). Also restore the
current-thread runtime regression test and add missing tokio sync feature
for dev-dependencies.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 15, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a9b768db-cc36-4a0e-8f29-814c4df16743

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • ✅ Review completed - (🔄 Check again to review again)
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/rs-dash-async

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added this to the v3.1.0 milestone Apr 15, 2026
@lklimek lklimek changed the title feat: extract rs-dash-async crate from dash-sdk sync module fix: replace deadlocking futures::executor::block_on with runtime-aware dash-async crate (#3432) Apr 15, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
packages/rs-dash-async/src/block_on.rs (2)

87-97: Minor inconsistency in channel type selection.

The current-thread path uses sync_channel::<...>(1) while the multi-thread path uses unbounded channel(). Both work correctly since only one message is sent, but using sync_channel(1) consistently would be slightly more explicit about the expected single-message behavior.

This is a very minor style observation and doesn't affect correctness.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/rs-dash-async/src/block_on.rs` around lines 87 - 97, In block_on's
multi-thread branch (the match arm using handle.spawn(worker(fut, tx))), replace
the unbounded std::sync::mpsc::channel() with a bounded
std::sync::mpsc::sync_channel(1) to mirror the current-thread path and make the
single-message intent explicit; ensure you create tx/rx with the same generic
type as the other branch (e.g., sync_channel::<...>(1)) so the spawned
worker(fut, tx), rx.recv(), and the hdl handling remain type-compatible.

209-216: Test name is misleading - it tests success, not failure.

The test test_block_on_fails_on_current_thread_runtime actually verifies that block_on succeeds on a current-thread runtime (the fix for the bug). Consider renaming to something like test_block_on_succeeds_on_current_thread_runtime to match the assertion on line 259-263.

✏️ Suggested rename
-    fn test_block_on_fails_on_current_thread_runtime() {
+    fn test_block_on_succeeds_on_current_thread_runtime() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/rs-dash-async/src/block_on.rs` around lines 209 - 216, The test
function name test_block_on_fails_on_current_thread_runtime is misleading
because the body asserts success; rename the test function to
test_block_on_succeeds_on_current_thread_runtime and update any references (doc
comment or CI/test filters) accordingly so the name matches the assertion and
comment describing the regression fix in the surrounding block_on test code.
packages/rs-sdk-trusted-context-provider/src/provider.rs (2)

631-646: Consider extracting public key parsing into a helper method.

The public key parsing logic (hex decode, length validation, array conversion) is duplicated three times in get_quorum_public_key: once for the current cache hit (lines 568-584), once for the previous cache hit (lines 594-610), and here for the refetch path.

♻️ Proposed helper extraction
impl TrustedHttpContextProvider {
    /// Parse a BLS public key from hex string
    fn parse_quorum_public_key(key: &str) -> Result<[u8; 48], ContextProviderError> {
        let pubkey_hex = key.trim_start_matches("0x");
        let pubkey_bytes = hex::decode(pubkey_hex).map_err(|e| {
            ContextProviderError::Generic(format!("Invalid hex in public key: {}", e))
        })?;

        if pubkey_bytes.len() != 48 {
            return Err(ContextProviderError::Generic(format!(
                "Invalid public key length: {} bytes, expected 48",
                pubkey_bytes.len()
            )));
        }

        pubkey_bytes.try_into().map_err(|_| {
            ContextProviderError::Generic("Failed to convert public key to array".to_string())
        })
    }
}

Then each call site becomes:

return Self::parse_quorum_public_key(&quorum.key);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/rs-sdk-trusted-context-provider/src/provider.rs` around lines 631 -
646, Extract the duplicated public-key parsing logic into a new helper method on
TrustedHttpContextProvider (e.g., fn parse_quorum_public_key(key: &str) ->
Result<[u8;48], ContextProviderError>) that performs trim_start_matches("0x"),
hex::decode with ContextProviderError::Generic on decode errors, checks
pubkey_bytes.len() == 48 and converts to [u8;48] returning the same
ContextProviderError::Generic on conversion failure; then replace each
duplicated block in get_quorum_public_key with a call to
Self::parse_quorum_public_key(&quorum.key) to DRY up the code and preserve
existing error behavior.

622-629: Consider using the existing From<AsyncError> conversion.

The error from block_on is manually mapped to ContextProviderError::Generic, but there's already a From<dash_async::AsyncError> for ContextProviderError impl in rs-context-provider/src/error.rs that maps to the semantic AsyncError variant. Using the ? operator with the From impl would preserve the error categorization.

♻️ Proposed refactor to use From impl
         let this = self.clone();
         let quorum =
             dash_async::block_on(async move { this.find_quorum(quorum_type, quorum_hash).await })
-                .map_err(|e| ContextProviderError::Generic(format!("block_on failed: {}", e)))?
+                ?  // Uses From<AsyncError> for ContextProviderError
                 .map_err(|e| {
                     debug!("Error finding quorum: {}", e);
                     ContextProviderError::Generic(format!("Failed to find quorum: {}", e))
                 })?;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/rs-sdk-trusted-context-provider/src/provider.rs` around lines 622 -
629, The block_on error is being manually mapped to
ContextProviderError::Generic, but there is a From<dash_async::AsyncError> ->
ContextProviderError impl; replace the outer map_err mapping with the ? operator
so the dash_async::block_on(...) error converts via From automatically.
Specifically, change the block that calls dash_async::block_on(async move {
this.find_quorum(quorum_type, quorum_hash).await }) .map_err(|e|
ContextProviderError::Generic(...))? to simply use ? on the block_on result
(i.e., dash_async::block_on(...)?), then keep the existing inner map_err that
handles the find_quorum application error and debugging log; reference symbols:
block_on, find_quorum, ContextProviderError, and dash_async::AsyncError.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@packages/rs-dash-async/src/block_on.rs`:
- Around line 87-97: In block_on's multi-thread branch (the match arm using
handle.spawn(worker(fut, tx))), replace the unbounded std::sync::mpsc::channel()
with a bounded std::sync::mpsc::sync_channel(1) to mirror the current-thread
path and make the single-message intent explicit; ensure you create tx/rx with
the same generic type as the other branch (e.g., sync_channel::<...>(1)) so the
spawned worker(fut, tx), rx.recv(), and the hdl handling remain type-compatible.
- Around line 209-216: The test function name
test_block_on_fails_on_current_thread_runtime is misleading because the body
asserts success; rename the test function to
test_block_on_succeeds_on_current_thread_runtime and update any references (doc
comment or CI/test filters) accordingly so the name matches the assertion and
comment describing the regression fix in the surrounding block_on test code.

In `@packages/rs-sdk-trusted-context-provider/src/provider.rs`:
- Around line 631-646: Extract the duplicated public-key parsing logic into a
new helper method on TrustedHttpContextProvider (e.g., fn
parse_quorum_public_key(key: &str) -> Result<[u8;48], ContextProviderError>)
that performs trim_start_matches("0x"), hex::decode with
ContextProviderError::Generic on decode errors, checks pubkey_bytes.len() == 48
and converts to [u8;48] returning the same ContextProviderError::Generic on
conversion failure; then replace each duplicated block in get_quorum_public_key
with a call to Self::parse_quorum_public_key(&quorum.key) to DRY up the code and
preserve existing error behavior.
- Around line 622-629: The block_on error is being manually mapped to
ContextProviderError::Generic, but there is a From<dash_async::AsyncError> ->
ContextProviderError impl; replace the outer map_err mapping with the ? operator
so the dash_async::block_on(...) error converts via From automatically.
Specifically, change the block that calls dash_async::block_on(async move {
this.find_quorum(quorum_type, quorum_hash).await }) .map_err(|e|
ContextProviderError::Generic(...))? to simply use ? on the block_on result
(i.e., dash_async::block_on(...)?), then keep the existing inner map_err that
handles the find_quorum application error and debugging log; reference symbols:
block_on, find_quorum, ContextProviderError, and dash_async::AsyncError.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 56379006-6e11-4e56-a33c-7609591618e8

📥 Commits

Reviewing files that changed from the base of the PR and between 85a7117 and eec32ff.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (10)
  • Cargo.toml
  • packages/rs-context-provider/Cargo.toml
  • packages/rs-context-provider/src/error.rs
  • packages/rs-dash-async/Cargo.toml
  • packages/rs-dash-async/src/block_on.rs
  • packages/rs-dash-async/src/lib.rs
  • packages/rs-sdk-trusted-context-provider/Cargo.toml
  • packages/rs-sdk-trusted-context-provider/src/provider.rs
  • packages/rs-sdk/Cargo.toml
  • packages/rs-sdk/src/sync.rs

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes the deadlock caused by using futures::executor::block_on inside a Tokio runtime by introducing a runtime-aware async→sync bridging implementation (dash-async) and migrating the trusted context provider to use it.

Changes:

  • Added new dash-async crate providing a runtime-flavor-aware block_on + AsyncError (including a WASM stub and regression tests for #3432).
  • Updated TrustedHttpContextProvider to use dash_async::block_on (and made it Clone by switching fallback_provider to Arc).
  • Updated dash-sdk::sync to re-export block_on/AsyncError, and moved From<AsyncError> conversion to rs-context-provider.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
packages/rs-sdk/src/sync.rs Re-exports dash-async bridge types; removes local implementation/tests.
packages/rs-sdk/Cargo.toml Adds dash-async dependency.
packages/rs-sdk-trusted-context-provider/src/provider.rs Replaces futures::executor::block_on with dash_async::block_on; makes provider clonable via Arc fallback.
packages/rs-sdk-trusted-context-provider/Cargo.toml Adds dash-async, removes futures.
packages/rs-dash-async/src/lib.rs New crate entry point exporting block_on + AsyncError.
packages/rs-dash-async/src/block_on.rs Runtime-aware block_on implementation + tests (incl. #3432 regression).
packages/rs-dash-async/Cargo.toml New crate manifest and dependencies.
packages/rs-context-provider/src/error.rs Adds From<dash_async::AsyncError> to map to ContextProviderError::AsyncError.
packages/rs-context-provider/Cargo.toml Adds dash-async dependency.
Cargo.toml Adds rs-dash-async to workspace members.
Cargo.lock Records new crate and dependency graph updates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +79 to +83
let recv_result = rx.recv()?;
join_handle
.join()
.map_err(|_| AsyncError::Generic("block_on worker thread panicked".to_string()))?;
recv_result
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the current-thread runtime branch, rx.recv()? can return RecvError if the worker thread panics before sending. Because of the early ?, the join_handle is never joined, and the panic is effectively swallowed (and the thread is left unjoined). Consider ensuring the worker thread is always joined (even when recv() fails) and prefer returning an error that reflects a worker panic when that happens.

Copilot uses AI. Check for mistakes.
/// runtime flavor and spawns a dedicated OS thread with its own runtime when running
/// on a current-thread scheduler.
#[test]
fn test_block_on_fails_on_current_thread_runtime() {
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test name test_block_on_fails_on_current_thread_runtime contradicts the assertions and doc comment (it expects block_on to succeed on a current-thread runtime). Renaming the test to reflect success would make failures easier to interpret.

Suggested change
fn test_block_on_fails_on_current_thread_runtime() {
fn test_block_on_succeeds_on_current_thread_runtime() {

Copilot uses AI. Check for mistakes.
let this = self.clone();
let quorum =
dash_async::block_on(async move { this.find_quorum(quorum_type, quorum_hash).await })
.map_err(|e| ContextProviderError::Generic(format!("block_on failed: {}", e)))?
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dash_async::block_on returns AsyncError, and ContextProviderError now has a dedicated AsyncError variant with From<dash_async::AsyncError> implemented. Instead of wrapping this as ContextProviderError::Generic("block_on failed: ..."), consider converting via ContextProviderError::from(e) so callers can distinguish async-bridging failures from generic provider errors.

Suggested change
.map_err(|e| ContextProviderError::Generic(format!("block_on failed: {}", e)))?
.map_err(ContextProviderError::from)?

Copilot uses AI. Check for mistakes.
@lklimek lklimek marked this pull request as ready for review April 15, 2026 14:41
@thepastaclaw
Copy link
Copy Markdown
Collaborator

thepastaclaw commented Apr 15, 2026

Review Gate

Commit: eec32ffa

  • Debounce: 2969m ago (need 30m)

  • CI checks: build failure: Build JS packages / Build JS

  • CodeRabbit review: comment found

  • Off-peak hours: peak window (5am-11am PT) — currently 07:31 AM PT Friday

  • Run review now (check to override)

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 15, 2026

Codecov Report

❌ Patch coverage is 50.87719% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.82%. Comparing base (127ad5e) to head (eec32ff).
⚠️ Report is 2 commits behind head on v3.1-dev.

Files with missing lines Patch % Lines
packages/rs-dash-async/src/block_on.rs 50.87% 28 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           v3.1-dev    #3497      +/-   ##
============================================
- Coverage     84.83%   84.82%   -0.01%     
============================================
  Files          2476     2477       +1     
  Lines        267733   267790      +57     
============================================
+ Hits         227123   227152      +29     
- Misses        40610    40638      +28     
Components Coverage Δ
dpp 81.99% <ø> (ø)
drive 84.21% <ø> (ø)
drive-abci 87.46% <ø> (ø)
sdk ∅ <ø> (∅)
dapi-client ∅ <ø> (∅)
platform-version ∅ <ø> (∅)
platform-value 92.10% <ø> (ø)
platform-wallet ∅ <ø> (∅)
drive-proof-verifier 55.66% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions
Copy link
Copy Markdown
Contributor

✅ DashSDKFFI.xcframework built for this PR.

SwiftPM (host the zip at a stable URL, then use):

.binaryTarget(
  name: "DashSDKFFI",
  url: "https://your.cdn.example/DashSDKFFI.xcframework.zip",
  checksum: "f5554700a84d2ca87ef13180d086aea4599551c2fab837319304bd4e7bba2ccc"
)

Xcode manual integration:

  • Download 'DashSDKFFI.xcframework' artifact from the run link above.
  • Drag it into your app target (Frameworks, Libraries & Embedded Content) and set Embed & Sign.
  • If using the Swift wrapper package, point its binaryTarget to the xcframework location or add the package and place the xcframework at the expected path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deadlock: futures::executor::block_on inside tokio runtime in TrustedHttpContextProvider

3 participants