Recovery sequencer race: block production starts before P2P catchup completes, causing valid double-sign detection #3330

@CaelRowley

Summary

There is a race condition in sequencer recovery where a recovery sequencer can begin producing blocks before P2P catchup has completed. If the original blocks later arrive via DA retrieval or P2P sync, the node correctly detects a double-sign/equivocation because the same signing key produced different headers at the same height.

The issue is not with double-sign detection itself. The new detection logic is correctly exposing an existing flaw in the recovery flow.

Observed Behavior

This was discovered while implementing double-sign detection in: #3310

TestSequencerRecoveryFromP2P began failing because the recovery sequencer can produce conflicting blocks before synchronization completes. The test already acknowledged this possibility:

recovery node produced its own blocks (P2P sync was not completed in time)

With the new double-sign detection logic, the node now correctly detects the equivocation and halts via sendCriticalError.

Before PR #3310, the conflicting headers were silently ignored because pendingHeaders is first-write-wins: the first header stored for a height is kept and any later header at that height is dropped.
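
To illustrate why the conflict went unnoticed, here is a minimal sketch of a first-write-wins cache keyed by height. The names header, pendingCache, and add are hypothetical and are not taken from the codebase; the real pendingHeaders structure may differ.

```go
package main

import "fmt"

// header is a hypothetical stand-in for a signed block header.
type header struct {
	Height uint64
	Hash   string
}

// pendingCache is a hypothetical model of a first-write-wins cache keyed by height.
type pendingCache struct {
	byHeight map[uint64]header
}

func newPendingCache() *pendingCache {
	return &pendingCache{byHeight: make(map[uint64]header)}
}

// add stores h unless a header already exists at that height.
// It reports whether h was stored.
func (c *pendingCache) add(h header) bool {
	if _, exists := c.byHeight[h.Height]; exists {
		return false // the conflicting header is dropped without raising an error
	}
	c.byHeight[h.Height] = h
	return true
}

func main() {
	c := newPendingCache()
	fmt.Println(c.add(header{Height: 5, Hash: "recovery"})) // true
	fmt.Println(c.add(header{Height: 5, Hash: "original"})) // false
	fmt.Println(c.byHeight[5].Hash)                         // recovery
}
```

In this model the second header never reaches any comparison step, so the equivocation stays invisible.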

Root Cause Analysis

Recovery synchronization is not a strict prerequisite for block production, so production can begin before catchup has completed. If catchup times out, the sequencer may still start aggregation with incomplete state and produce blocks at heights that the original sequencer session has already signed.

Failure flow:

  1. The original sequencer produces N blocks using the genesis sequencer key and submits some of them to DA.
  2. The original sequencer stops.
  3. A full node holds all N blocks, received via P2P gossip.
  4. Recovery sequencer starts with:
    • a fresh store
    • the same signing key
    • P2P catchup enabled
  5. Catchup does not complete before timeout.
  6. Recovery sequencer proceeds with partial or empty local state and starts producing blocks.
  7. Original blocks later arrive through DA retrieval/P2P sync.
  8. Syncer.detectDoubleSign detects:
    • same signer
    • same height
    • different header hash
  9. Evidence is recorded and the node halts via sendCriticalError.
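
The three conditions in step 8 can be sketched as a single predicate. This is an illustration only: signedHeader and isDoubleSign are hypothetical names, and the actual Syncer.detectDoubleSign may compare different fields or record evidence differently.

```go
package main

import (
	"bytes"
	"fmt"
)

// signedHeader is a hypothetical, simplified header: just the fields
// needed for the comparison described in the failure flow.
type signedHeader struct {
	Height uint64
	Signer []byte // address or public key of the signing sequencer
	Hash   []byte // header hash
}

// isDoubleSign reports whether two headers are equivocation evidence:
// same signer, same height, different header hash.
func isDoubleSign(local, incoming signedHeader) bool {
	return local.Height == incoming.Height &&
		bytes.Equal(local.Signer, incoming.Signer) &&
		!bytes.Equal(local.Hash, incoming.Hash)
}

func main() {
	key := []byte("genesis-sequencer-key")
	recovery := signedHeader{Height: 3, Signer: key, Hash: []byte{0xaa}}
	original := signedHeader{Height: 3, Signer: key, Hash: []byte{0xbb}}

	fmt.Println(isDoubleSign(recovery, original)) // true
	fmt.Println(isDoubleSign(recovery, recovery)) // false
}
```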

Impact

A recovering sequencer can unintentionally equivocate against its own previously produced chain during startup. This results in:

  • valid double-sign evidence being generated
  • sequencer halt during recovery
  • unsafe recovery behavior after restart/outage

Expected Behavior

A recovery sequencer should never begin block production until chain continuity has been fully established.

If synchronization cannot complete safely, recovery should fail rather than continue with partial state.

Suggested Fixes

  • Gate block production until successful catchup completion.
  • Verify chain continuity before producing blocks at height H.
  • Consider making a recovery sync timeout fatal instead of continuing with partial state.
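
One possible shape for the first and third suggestions is sketched below: catchup runs under a deadline, and block production starts only if catchup returns without error. Everything here (startRecovery, simulate, errCatchupIncomplete, the timeout values) is a hypothetical illustration, not the project's API, and a real fix would also need the chain continuity check from the second suggestion.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errCatchupIncomplete is a hypothetical sentinel for a failed recovery sync.
var errCatchupIncomplete = errors.New("recovery catchup did not complete")

// startRecovery runs catchUp under a deadline and calls produce only if
// catchUp succeeded. A timeout is treated as fatal: produce is never called.
func startRecovery(ctx context.Context, timeout time.Duration,
	catchUp func(context.Context) error, produce func()) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	if err := catchUp(ctx); err != nil {
		return fmt.Errorf("%w: %v", errCatchupIncomplete, err)
	}
	produce()
	return nil
}

// simulate models a catchup that either finishes well inside the deadline
// or takes far longer than the deadline allows.
func simulate(catchUpFinishes bool) (produced bool, err error) {
	syncTime := time.Millisecond
	if !catchUpFinishes {
		syncTime = time.Second
	}
	catchUp := func(ctx context.Context) error {
		select {
		case <-time.After(syncTime):
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	err = startRecovery(context.Background(), 50*time.Millisecond,
		catchUp, func() { produced = true })
	return produced, err
}

func main() {
	fmt.Println(simulate(true))  // catchup completes, production starts
	fmt.Println(simulate(false)) // catchup times out, production is refused
}
```

The design choice in this sketch is that the caller receives an error instead of a sequencer running on partial state, which matches the expected behavior described above.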

Current Workaround

TestSequencerRecoveryFromP2P has been temporarily skipped in PR #3310 pending a proper fix for the recovery race.

The test should be re-enabled once recovery guarantees synchronization safety before block production begins.
