
op-conductor,op-node: SSZ-binary commit-unsafe-payload endpoint #37

Open
wlawt wants to merge 4 commits into develop from conductor-binary-endpoint



@wlawt wlawt commented May 25, 2026

Cherry-picked from #36 so that this PR includes only the changes from that PR and not those of its target branch (i.e., #35).

BrianBland and others added 2 commits May 25, 2026 12:00
Adds a non-JSON-RPC HTTP endpoint to op-conductor for committing unsafe
payloads, and an opt-in op-node client switch to use it.

  POST /commit-unsafe-payload
  Content-Type: application/octet-stream
  Body: SSZ-encoded ExecutionPayloadEnvelope (raw bytes, no length prefix)

Response is 200 OK on success, 4xx/5xx with plain-text body otherwise. The
body cap is 16 MiB by default, comfortably fitting future 10 MB SSZ blocks.
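
A minimal sketch of what such a handler could look like, using only the Go standard library. The `committer` type, `errRaft`, `commitHandler`, and `postStatus` names are hypothetical and not taken from this PR; the specific status codes (413, 400, 500) are assumptions consistent with the "4xx/5xx with plain-text body" description above.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// maxBodyBytes mirrors the 16 MiB default body cap described above.
const maxBodyBytes = 16 << 20

// committer is a hypothetical stand-in for the conductor's commit path.
type committer func(ssz []byte) error

// errRaft is a hypothetical error used to exercise the 5xx branch.
var errRaft = errors.New("raft apply failed")

// commitHandler reads the raw SSZ body (capped at limit bytes) and hands it
// to commit. It answers 200 on success and a plain-text 4xx/5xx otherwise.
func commitHandler(commit committer, limit int64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, limit))
		if err != nil {
			var tooBig *http.MaxBytesError
			if errors.As(err, &tooBig) {
				http.Error(w, "payload too large", http.StatusRequestEntityTooLarge)
				return
			}
			http.Error(w, "failed to read body", http.StatusBadRequest)
			return
		}
		if err := commit(body); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
}

// postStatus sends body to the handler in-process and returns the status code.
func postStatus(h http.Handler, body []byte) int {
	req := httptest.NewRequest(http.MethodPost, "/commit-unsafe-payload", bytes.NewReader(body))
	req.Header.Set("Content-Type", "application/octet-stream")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := commitHandler(func(ssz []byte) error { return nil }, maxBodyBytes)
	fmt.Println(postStatus(h, []byte{0x01, 0x02})) // prints 200
}
```

Capping the body with `http.MaxBytesReader` before reading keeps an oversized request from being buffered in full, which is one way to enforce a limit like the 16 MiB default.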

The existing JSON-RPC commit-unsafe-payload encodes the entire payload as
JSON, so its cost is dominated by encoding/json's per-byte tokenizer. On
the sequencer, the leader-side RPC handler currently spends ~10ms (mainnet
typical) to ~27ms (peak) on JSON decode alone, before raft replication
even starts. Calibration against the op-conductor commit-unsafe-payload
latency notebook (DD #13980177) confirms that the ~109 MB/s JSON decode
throughput matches loopback bench predictions.

Loopback bench (M4 Max, 2 MB SSZ payload):

  JSON-RPC end-to-end:    38.4 ms (109 MB/s)
  Binary SSZ end-to-end:   1.4 ms (1.5 GB/s)  ~27x faster

Production extrapolation (linear scaling):

  Mainnet typical (~500 KB SSZ): 10.1 ms -> ~1.7 ms (~6x)
  Mainnet peak (~1.3 MB SSZ):    27.1 ms -> ~2.5 ms (~11x)
  Future 10 MB SSZ blocks:       blocked (5 MB JSON-RPC body cap) -> ~7 ms

The remaining ~1.4 ms residual on the binary path is the HTTP machinery
floor (verified by a discard-only handler at 1.37 ms).

Off by default. Opt in with --conductor.binary-commit (env
OP_NODE_CONDUCTOR_BINARY_COMMIT=true). The op-node side keeps the
existing JSON-RPC client and adds a parallel BinaryCommitClient that
points at the same conductor URL (the binary endpoint shares the HTTP
server with the JSON-RPC route). When the flag is on, CommitUnsafePayload
uses the binary client; everything else (Leader, OverrideLeader, etc.)
stays on JSON-RPC.

External CL clients (e.g. base's Rust replacement for op-node) can
target the same endpoint with any HTTP client + SSZ encoder.
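
As a rough illustration of how small such a client can be, here is a sketch in Go using only the standard library. `commitBinary`, `handlerTransport`, and `roundTrip` are hypothetical names, not the `BinaryCommitClient` added by this PR, and the payload bytes are placeholders rather than a real SSZ-encoded ExecutionPayloadEnvelope. The stub transport calls a handler in-process so the example runs without a conductor.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// commitBinary POSTs raw SSZ bytes to <baseURL>/commit-unsafe-payload and
// treats any non-200 response as an error carrying the plain-text body.
func commitBinary(ctx context.Context, c *http.Client, baseURL string, ssz []byte) error {
	url := strings.TrimRight(baseURL, "/") + "/commit-unsafe-payload"
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(ssz))
	if err != nil {
		return fmt.Errorf("build request: %w", err)
	}
	req.Header.Set("Content-Type", "application/octet-stream")
	resp, err := c.Do(req)
	if err != nil {
		return fmt.Errorf("send request: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		msg, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
		return fmt.Errorf("conductor returned %d: %s", resp.StatusCode, strings.TrimSpace(string(msg)))
	}
	return nil
}

// handlerTransport is a test-only RoundTripper that serves requests from an
// in-process handler instead of the network.
type handlerTransport struct{ h http.Handler }

func (t handlerTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	rec := httptest.NewRecorder()
	t.h.ServeHTTP(rec, r)
	return rec.Result(), nil
}

// roundTrip sends ssz to a stub conductor that replies with status and
// returns the bytes the stub received plus the client-side error, if any.
func roundTrip(status int, ssz []byte) (received []byte, err error) {
	stub := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		received, _ = io.ReadAll(r.Body)
		if status != http.StatusOK {
			http.Error(w, "raft apply failed", status)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	client := &http.Client{Transport: handlerTransport{h: stub}}
	err = commitBinary(context.Background(), client, "http://conductor.invalid", ssz)
	return received, err
}

func main() {
	got, err := roundTrip(http.StatusOK, []byte{0xde, 0xad})
	fmt.Println(len(got), err)
}
```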

- [ ] Devnet: run a 3-node conductor cluster with op-node using the new flag
- [ ] Compare commit_unsafe_payload latency before/after on devnet
- [ ] Roll the conductor side first (binary endpoint is additive; JSON-RPC
      still served), then op-node opt-in per-environment

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Adds op_conductor_binary_commit_request_duration_seconds{success} to
measure the full HTTP handler round-trip for POST /commit-unsafe-payload.

Without this, the binary endpoint had no timing equivalent to
op_conductor_rpc_server_request_duration_seconds{method=conductor_commitunsafepayload}
on the JSON-RPC path. The new histogram uses the same buckets, so
JSON-RPC and binary latency can be directly overlaid in dashboards
(e.g., the op-conductor commit-unsafe-payload latency notebook).

The success label distinguishes 5xx raft errors from 200s.
The timer starts before body-read and stops after the response status
is written, capturing: body read + raft.Apply + response overhead.
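
The timing scheme described above can be sketched as a small middleware. This is an illustration under assumptions, not the PR's implementation: `observer` is a hypothetical stand-in for a histogram with a `success` label, and treating only a 200 status as success is inferred from the description.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strconv"
	"strings"
	"time"
)

// observer is a hypothetical stand-in for a histogram with a "success" label.
type observer func(success string, seconds float64)

// statusWriter remembers the status code written by the wrapped handler.
type statusWriter struct {
	http.ResponseWriter
	status int
}

func (w *statusWriter) WriteHeader(code int) {
	w.status = code
	w.ResponseWriter.WriteHeader(code)
}

// timed starts the clock before the inner handler reads the body and stops it
// after the handler returns, so the observation covers body read, commit, and
// writing the response status.
func timed(next http.Handler, observe observer) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		sw := &statusWriter{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(sw, r)
		success := strconv.FormatBool(sw.status == http.StatusOK)
		observe(success, time.Since(start).Seconds())
	})
}

// observeOnce runs one request through a stub handler that replies with
// status, and returns the label and duration that were observed.
func observeOnce(status int) (label string, seconds float64) {
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.Copy(io.Discard, r.Body) // stands in for body read + commit
		w.WriteHeader(status)
	})
	h := timed(inner, func(success string, s float64) { label, seconds = success, s })
	req := httptest.NewRequest(http.MethodPost, "/commit-unsafe-payload", strings.NewReader("ssz"))
	h.ServeHTTP(httptest.NewRecorder(), req)
	return label, seconds
}

func main() {
	label, _ := observeOnce(http.StatusInternalServerError)
	fmt.Println(label) // prints false
}
```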

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

cb-heimdall commented May 25, 2026

🟡 Heimdall Review Status

Requirement Status More Info
Reviews 🟡 0/1
Denominator calculation
1 if user is bot 0
1 if user is external 0
2 if repo is sensitive 0
From .codeflow.yml 1
Additional review requirements
Max 0
0
From CODEOWNERS 0
Global minimum 0
Max 1
1
1 if commit is unverified 0
Sum 1


f := rc.r.Apply(ssz, defaultTimeout)
if err := f.Error(); err != nil {
    return errors.Wrap(err, "failed to apply payload envelope")

@niran niran May 25, 2026


From Codex:

This only checks raft-level apply errors. The SSZ validation happens inside unsafeHeadTracker.Apply, but hashicorp/raft returns FSM Apply errors via ApplyFuture.Response(), not ApplyFuture.Error(). As written, malformed SSZ can be appended to the raft log, fail FSM decoding, and still return success from the binary endpoint. That is a behavior change from the JSON-RPC path, which decodes into an ExecutionPayloadEnvelope and successfully MarshalSSZs before raft.Apply.

We should either validate the SSZ before applying:

env := new(eth.ExecutionPayloadEnvelope)
if err := env.UnmarshalSSZ(eth.BlockV4, uint32(len(ssz)), bytes.NewReader(ssz)); err != nil {
    if err := env.UnmarshalSSZ(eth.BlockV3, uint32(len(ssz)), bytes.NewReader(ssz)); err != nil {
        return fmt.Errorf("invalid ssz payload: %w", err)
    }
}

Or check f.Response() for an error after f.Error() returns nil. For this FSM, success returns nil; decode/apply failures return the error object from unsafeHeadTracker.Apply.

f := rc.r.Apply(ssz, defaultTimeout)
if err := f.Error(); err != nil {
    return errors.Wrap(err, "failed to apply payload envelope")
}
if resp := f.Response(); resp != nil {
    if err, ok := resp.(error); ok {
        return errors.Wrap(err, "failed to apply payload envelope to FSM")
    }
    return errors.Errorf("unexpected raft apply response: %T: %v", resp, resp)
}
