fix: replace deadlocking futures::executor::block_on with runtime-aware dash-async crate (#3432)#3497
fix: replace deadlocking futures::executor::block_on with runtime-aware dash-async crate (#3432)#3497
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The initial extraction only kept the multi-thread block_in_place path. Restore the 3-way handling: no runtime (create temporary), current-thread (spawn OS thread), multi-thread (block_in_place). Also restore the current-thread runtime regression test and add missing tokio sync feature for dev-dependencies. Co-Authored-By: Claude Opus 4.6 <[email protected]>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (4)
packages/rs-dash-async/src/block_on.rs (2)
87-97: Minor inconsistency in channel type selection.The current-thread path uses
sync_channel::<...>(1)while the multi-thread path uses unboundedchannel(). Both work correctly since only one message is sent, but usingsync_channel(1)consistently would be slightly more explicit about the expected single-message behavior.This is a very minor style observation and doesn't affect correctness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/rs-dash-async/src/block_on.rs` around lines 87 - 97, In block_on's multi-thread branch (the match arm using handle.spawn(worker(fut, tx))), replace the unbounded std::sync::mpsc::channel() with a bounded std::sync::mpsc::sync_channel(1) to mirror the current-thread path and make the single-message intent explicit; ensure you create tx/rx with the same generic type as the other branch (e.g., sync_channel::<...>(1)) so the spawned worker(fut, tx), rx.recv(), and the hdl handling remain type-compatible.
209-216: Test name is misleading - it tests success, not failure.The test
test_block_on_fails_on_current_thread_runtimeactually verifies thatblock_onsucceeds on a current-thread runtime (the fix for the bug). Consider renaming to something liketest_block_on_succeeds_on_current_thread_runtimeto match the assertion on line 259-263.✏️ Suggested rename
- fn test_block_on_fails_on_current_thread_runtime() { + fn test_block_on_succeeds_on_current_thread_runtime() {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/rs-dash-async/src/block_on.rs` around lines 209 - 216, The test function name test_block_on_fails_on_current_thread_runtime is misleading because the body asserts success; rename the test function to test_block_on_succeeds_on_current_thread_runtime and update any references (doc comment or CI/test filters) accordingly so the name matches the assertion and comment describing the regression fix in the surrounding block_on test code.packages/rs-sdk-trusted-context-provider/src/provider.rs (2)
631-646: Consider extracting public key parsing into a helper method.The public key parsing logic (hex decode, length validation, array conversion) is duplicated three times in
get_quorum_public_key: once for the current cache hit (lines 568-584), once for the previous cache hit (lines 594-610), and here for the refetch path.♻️ Proposed helper extraction
impl TrustedHttpContextProvider { /// Parse a BLS public key from hex string fn parse_quorum_public_key(key: &str) -> Result<[u8; 48], ContextProviderError> { let pubkey_hex = key.trim_start_matches("0x"); let pubkey_bytes = hex::decode(pubkey_hex).map_err(|e| { ContextProviderError::Generic(format!("Invalid hex in public key: {}", e)) })?; if pubkey_bytes.len() != 48 { return Err(ContextProviderError::Generic(format!( "Invalid public key length: {} bytes, expected 48", pubkey_bytes.len() ))); } pubkey_bytes.try_into().map_err(|_| { ContextProviderError::Generic("Failed to convert public key to array".to_string()) }) } }Then each call site becomes:
return Self::parse_quorum_public_key(&quorum.key);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/rs-sdk-trusted-context-provider/src/provider.rs` around lines 631 - 646, Extract the duplicated public-key parsing logic into a new helper method on TrustedHttpContextProvider (e.g., fn parse_quorum_public_key(key: &str) -> Result<[u8;48], ContextProviderError>) that performs trim_start_matches("0x"), hex::decode with ContextProviderError::Generic on decode errors, checks pubkey_bytes.len() == 48 and converts to [u8;48] returning the same ContextProviderError::Generic on conversion failure; then replace each duplicated block in get_quorum_public_key with a call to Self::parse_quorum_public_key(&quorum.key) to DRY up the code and preserve existing error behavior.
622-629: Consider using the existingFrom<AsyncError>conversion.The error from
block_onis manually mapped toContextProviderError::Generic, but there's already aFrom<dash_async::AsyncError> for ContextProviderErrorimpl inrs-context-provider/src/error.rsthat maps to the semanticAsyncErrorvariant. Using the?operator with theFromimpl would preserve the error categorization.♻️ Proposed refactor to use From impl
let this = self.clone(); let quorum = dash_async::block_on(async move { this.find_quorum(quorum_type, quorum_hash).await }) - .map_err(|e| ContextProviderError::Generic(format!("block_on failed: {}", e)))? + ? // Uses From<AsyncError> for ContextProviderError .map_err(|e| { debug!("Error finding quorum: {}", e); ContextProviderError::Generic(format!("Failed to find quorum: {}", e)) })?;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/rs-sdk-trusted-context-provider/src/provider.rs` around lines 622 - 629, The block_on error is being manually mapped to ContextProviderError::Generic, but there is a From<dash_async::AsyncError> -> ContextProviderError impl; replace the outer map_err mapping with the ? operator so the dash_async::block_on(...) error converts via From automatically. Specifically, change the block that calls dash_async::block_on(async move { this.find_quorum(quorum_type, quorum_hash).await }) .map_err(|e| ContextProviderError::Generic(...))? to simply use ? on the block_on result (i.e., dash_async::block_on(...)?), then keep the existing inner map_err that handles the find_quorum application error and debugging log; reference symbols: block_on, find_quorum, ContextProviderError, and dash_async::AsyncError.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@packages/rs-dash-async/src/block_on.rs`:
- Around line 87-97: In block_on's multi-thread branch (the match arm using
handle.spawn(worker(fut, tx))), replace the unbounded std::sync::mpsc::channel()
with a bounded std::sync::mpsc::sync_channel(1) to mirror the current-thread
path and make the single-message intent explicit; ensure you create tx/rx with
the same generic type as the other branch (e.g., sync_channel::<...>(1)) so the
spawned worker(fut, tx), rx.recv(), and the hdl handling remain type-compatible.
- Around line 209-216: The test function name
test_block_on_fails_on_current_thread_runtime is misleading because the body
asserts success; rename the test function to
test_block_on_succeeds_on_current_thread_runtime and update any references (doc
comment or CI/test filters) accordingly so the name matches the assertion and
comment describing the regression fix in the surrounding block_on test code.
In `@packages/rs-sdk-trusted-context-provider/src/provider.rs`:
- Around line 631-646: Extract the duplicated public-key parsing logic into a
new helper method on TrustedHttpContextProvider (e.g., fn
parse_quorum_public_key(key: &str) -> Result<[u8;48], ContextProviderError>)
that performs trim_start_matches("0x"), hex::decode with
ContextProviderError::Generic on decode errors, checks pubkey_bytes.len() == 48
and converts to [u8;48] returning the same ContextProviderError::Generic on
conversion failure; then replace each duplicated block in get_quorum_public_key
with a call to Self::parse_quorum_public_key(&quorum.key) to DRY up the code and
preserve existing error behavior.
- Around line 622-629: The block_on error is being manually mapped to
ContextProviderError::Generic, but there is a From<dash_async::AsyncError> ->
ContextProviderError impl; replace the outer map_err mapping with the ? operator
so the dash_async::block_on(...) error converts via From automatically.
Specifically, change the block that calls dash_async::block_on(async move {
this.find_quorum(quorum_type, quorum_hash).await }) .map_err(|e|
ContextProviderError::Generic(...))? to simply use ? on the block_on result
(i.e., dash_async::block_on(...)?), then keep the existing inner map_err that
handles the find_quorum application error and debugging log; reference symbols:
block_on, find_quorum, ContextProviderError, and dash_async::AsyncError.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 56379006-6e11-4e56-a33c-7609591618e8
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (10)
Cargo.tomlpackages/rs-context-provider/Cargo.tomlpackages/rs-context-provider/src/error.rspackages/rs-dash-async/Cargo.tomlpackages/rs-dash-async/src/block_on.rspackages/rs-dash-async/src/lib.rspackages/rs-sdk-trusted-context-provider/Cargo.tomlpackages/rs-sdk-trusted-context-provider/src/provider.rspackages/rs-sdk/Cargo.tomlpackages/rs-sdk/src/sync.rs
There was a problem hiding this comment.
Pull request overview
Fixes the deadlock caused by using futures::executor::block_on inside a Tokio runtime by introducing a runtime-aware async→sync bridging implementation (dash-async) and migrating the trusted context provider to use it.
Changes:
- Added new
dash-asynccrate providing a runtime-flavor-awareblock_on+AsyncError(including a WASM stub and regression tests for #3432). - Updated
TrustedHttpContextProviderto usedash_async::block_on(and made itCloneby switchingfallback_providertoArc). - Updated
dash-sdk::syncto re-exportblock_on/AsyncError, and movedFrom<AsyncError>conversion tors-context-provider.
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/rs-sdk/src/sync.rs | Re-exports dash-async bridge types; removes local implementation/tests. |
| packages/rs-sdk/Cargo.toml | Adds dash-async dependency. |
| packages/rs-sdk-trusted-context-provider/src/provider.rs | Replaces futures::executor::block_on with dash_async::block_on; makes provider clonable via Arc fallback. |
| packages/rs-sdk-trusted-context-provider/Cargo.toml | Adds dash-async, removes futures. |
| packages/rs-dash-async/src/lib.rs | New crate entry point exporting block_on + AsyncError. |
| packages/rs-dash-async/src/block_on.rs | Runtime-aware block_on implementation + tests (incl. #3432 regression). |
| packages/rs-dash-async/Cargo.toml | New crate manifest and dependencies. |
| packages/rs-context-provider/src/error.rs | Adds From<dash_async::AsyncError> to map to ContextProviderError::AsyncError. |
| packages/rs-context-provider/Cargo.toml | Adds dash-async dependency. |
| Cargo.toml | Adds rs-dash-async to workspace members. |
| Cargo.lock | Records new crate and dependency graph updates. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let recv_result = rx.recv()?; | ||
| join_handle | ||
| .join() | ||
| .map_err(|_| AsyncError::Generic("block_on worker thread panicked".to_string()))?; | ||
| recv_result |
There was a problem hiding this comment.
In the current-thread runtime branch, rx.recv()? can return RecvError if the worker thread panics before sending. Because of the early ?, the join_handle is never joined, and the panic is effectively swallowed (and the thread is left unjoined). Consider ensuring the worker thread is always joined (even when recv() fails) and prefer returning an error that reflects a worker panic when that happens.
| /// runtime flavor and spawns a dedicated OS thread with its own runtime when running | ||
| /// on a current-thread scheduler. | ||
| #[test] | ||
| fn test_block_on_fails_on_current_thread_runtime() { |
There was a problem hiding this comment.
The test name test_block_on_fails_on_current_thread_runtime contradicts the assertions and doc comment (it expects block_on to succeed on a current-thread runtime). Renaming the test to reflect success would make failures easier to interpret.
| fn test_block_on_fails_on_current_thread_runtime() { | |
| fn test_block_on_succeeds_on_current_thread_runtime() { |
| let this = self.clone(); | ||
| let quorum = | ||
| dash_async::block_on(async move { this.find_quorum(quorum_type, quorum_hash).await }) | ||
| .map_err(|e| ContextProviderError::Generic(format!("block_on failed: {}", e)))? |
There was a problem hiding this comment.
dash_async::block_on returns AsyncError, and ContextProviderError now has a dedicated AsyncError variant with From<dash_async::AsyncError> implemented. Instead of wrapping this as ContextProviderError::Generic("block_on failed: ..."), consider converting via ContextProviderError::from(e) so callers can distinguish async-bridging failures from generic provider errors.
| .map_err(|e| ContextProviderError::Generic(format!("block_on failed: {}", e)))? | |
| .map_err(ContextProviderError::from)? |
Review GateCommit:
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## v3.1-dev #3497 +/- ##
============================================
- Coverage 84.83% 84.82% -0.01%
============================================
Files 2476 2477 +1
Lines 267733 267790 +57
============================================
+ Hits 227123 227152 +29
- Misses 40610 40638 +28
🚀 New features to boost your workflow:
|
|
✅ DashSDKFFI.xcframework built for this PR.
SwiftPM (host the zip at a stable URL, then use): .binaryTarget(
name: "DashSDKFFI",
url: "https://your.cdn.example/DashSDKFFI.xcframework.zip",
checksum: "f5554700a84d2ca87ef13180d086aea4599551c2fab837319304bd4e7bba2ccc"
)Xcode manual integration:
|
Issue being fixed or feature implemented
Fixes #3432 —
futures::executor::block_oninside a tokio runtime inTrustedHttpContextProvidercauses a deadlock when quorum cache misses trigger async HTTP fetches.Root cause (from #3432)
ContextProvider::get_quorum_public_key()is synchronous, but on cache miss it callsfutures::executor::block_on(self.find_quorum(...))which makes HTTP requests viareqwest. Nestingfutures::executor::block_oninside a tokio runtime deadlocks on single-threaded runtimes or panics on multi-threaded runtimes. This affects the iOS SDK and any FFI consumer using the trusted context provider.What was done?
New crate:
packages/rs-dash-async(dash-async)Extracted a runtime-aware
block_onfunction that correctly handles all three scenarios:block_in_placepanic)block_in_place+ spawn for efficient bridgingAlso includes:
AsyncErrorerror typeErrwith JSPI reference instead ofunimplemented!()rs-sdk-trusted-context-provider— the fixfutures::executor::block_onwithdash_async::block_on— this is the direct fix for Deadlock: futures::executor::block_on inside tokio runtime in TrustedHttpContextProvider #3432TrustedHttpContextProviderClone(fallback_providerchanged fromBox<dyn ContextProvider>toArc<dyn ContextProvider>— only accessed via&self,ContextProvider: Send + Sync)#[cfg(target_arch = "wasm32")]separate code path —block_onhandles WASM internallydash-sdk::sync— backward compatibleblock_onandAsyncErrorfromdash-asyncuse dash_sdk::sync::block_onimports continue to work unchangedretryfunction and its tests remain indash-sdkrs-context-providerFrom<AsyncError> for ContextProviderErrorimpl here (orphan rule —AsyncErroris no longer local todash-sdk)How Has This Been Tested?
cargo test -p dash-async— 2 tests pass:test_block_on_nested_async_sync— multi-thread runtime with nested async→sync→async callstest_block_on_fails_on_current_thread_runtime— regression test for Deadlock: futures::executor::block_on inside tokio runtime in TrustedHttpContextProvider #3432: provesblock_onworks correctly on a current-thread runtime (the exact scenario that caused the deadlock)cargo test -p dash-sdk --lib -- sync::test— 40 retry tests passcargo test -p rs-sdk-trusted-context-provider— 8 tests passcargo checksucceeds for all affected cratesBreaking Changes
None. All public APIs are backward compatible via re-exports.
Checklist:
🤖 Co-authored by Claudius the Magnificent AI Agent
Summary by CodeRabbit
Release Notes
New Features
Refactor