Conversation
📝 WalkthroughWalkthroughThis PR introduces a GapFillWorker background service that periodically retries blocks recorded in a Changes
Sequence DiagramsequenceDiagram
participant GW as GapFillWorker
participant DB as Database<br/>(PgPool)
participant RPC as RPC Server
participant IDX as Indexer
participant MET as Metrics
participant BC as Broadcast<br/>Channel
loop run() continuous retry
GW->>DB: SELECT failed_blocks with<br/>backoff eligibility
alt Blocks found
GW->>RPC: Fetch block data<br/>(rate-limited)
alt Success
GW->>IDX: write_batch_and_clear_<br/>failed_block()
IDX->>DB: ensure_partitions_exist()
IDX->>DB: COPY data + DELETE<br/>from failed_blocks
GW->>DB: Count missing blocks
GW->>MET: Update<br/>missing_blocks gauge
GW->>BC: Broadcast recovery<br/>notification
else RPC/Write Failure
GW->>DB: Increment retry_count<br/>Update last_failed_at
end
else No Blocks
GW->>GW: Sleep (idle duration)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
backend/crates/atlas-server/src/indexer/gap_fill_worker.rs (2)
103-111: Consider persistingcopy_clientacross batches for connection reuse.Currently,
connect_copy_clientis called on everyprocess_batch()invocation, establishing a new TCP connection each time. For a worker that runs infrequently (5-minute idle sleep), this overhead is acceptable, but persisting the connection in the struct would reduce latency and resource usage.This is a minor optimization that could be addressed in a follow-up if gap-fill volume increases.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/indexer/gap_fill_worker.rs` around lines 103 - 111, Persist the copy client instead of creating it every process_batch call: add a field (e.g., copy_client: Option<CopyClient> or Arc<CopyClient>) to the worker struct used in gap_fill_worker.rs, initialize it lazily by calling Indexer::connect_copy_client(...) once (e.g., in the worker constructor or on first process_batch run) and reuse that instance in subsequent process_batch invocations, and handle reconnects by detecting connection errors from the existing copy_client and re-calling Indexer::connect_copy_client to replace it; reference the existing connect_copy_client function and the local variable copy_client in the diff to locate where to change.
177-180: Consider using a more efficient query for frequently accessed metric.
SELECT COUNT(*) FROM failed_blocksis O(n), but sincefailed_blocksis expected to be small (only unrecoverable blocks), this is acceptable. If the table grows large, consider usingpg_class.reltuplesas done elsewhere in the codebase.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/crates/atlas-server/src/indexer/gap_fill_worker.rs` around lines 177 - 180, The current code calls get_missing_block_count() after successful gap-fill which likely runs a COUNT(*) on failed_blocks (used by set_indexer_missing_blocks); replace or guard that expensive COUNT query with a cheaper estimate using pg_class.reltuples like elsewhere in the codebase when table size is large: update the get_missing_block_count() implementation (or add a new helper) to prefer querying pg_class.reltuples for relname = 'failed_blocks' and fall back to SELECT COUNT(*) only for small tables or when a precise value is required, and ensure the call sites (including where succeeded > 0 and set_indexer_missing_blocks is invoked) use the updated helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/crates/atlas-server/src/indexer/gap_fill_worker.rs`:
- Around line 103-111: Persist the copy client instead of creating it every
process_batch call: add a field (e.g., copy_client: Option<CopyClient> or
Arc<CopyClient>) to the worker struct used in gap_fill_worker.rs, initialize it
lazily by calling Indexer::connect_copy_client(...) once (e.g., in the worker
constructor or on first process_batch run) and reuse that instance in subsequent
process_batch invocations, and handle reconnects by detecting connection errors
from the existing copy_client and re-calling Indexer::connect_copy_client to
replace it; reference the existing connect_copy_client function and the local
variable copy_client in the diff to locate where to change.
- Around line 177-180: The current code calls get_missing_block_count() after
successful gap-fill which likely runs a COUNT(*) on failed_blocks (used by
set_indexer_missing_blocks); replace or guard that expensive COUNT query with a
cheaper estimate using pg_class.reltuples like elsewhere in the codebase when
table size is large: update the get_missing_block_count() implementation (or add
a new helper) to prefer querying pg_class.reltuples for relname =
'failed_blocks' and fall back to SELECT COUNT(*) only for small tables or when a
precise value is required, and ensure the call sites (including where succeeded
> 0 and set_indexer_missing_blocks is invoked) use the updated helper.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e20ebd0e-64ce-40b7-a000-9cd68f12f743
📒 Files selected for processing (8)
backend/Cargo.tomlbackend/crates/atlas-server/Cargo.tomlbackend/crates/atlas-server/src/indexer/gap_fill_worker.rsbackend/crates/atlas-server/src/indexer/indexer.rsbackend/crates/atlas-server/src/indexer/mod.rsbackend/crates/atlas-server/src/main.rsbackend/crates/atlas-server/tests/integration/gap_fill.rsbackend/crates/atlas-server/tests/integration/main.rs
Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Summary
failed_blockswith backofffailed_blocksrows atomically and refresh the missing-blocks metricSummary by CodeRabbit
Closes #57