Skip to content

fix: eliminate dual prediction state, wire webhooks from single source of truth #2780

Merged
tempusfrangit merged 3 commits into main from fix/webhook-state-unification
Feb 27, 2026

@tempusfrangit
Member

@tempusfrangit tempusfrangit commented Feb 26, 2026

Summary

  • Eliminates the dual state tracking that caused terminal webhooks to carry empty logs and intermediate webhooks never to fire. PredictionSupervisor is deleted entirely; PredictionService is now the single owner of prediction state.
  • Webhooks fire from Prediction mutation methods (set_processing, set_succeeded, append_log, append_output, etc.), where the real-time state lives. No more stale copies.
  • Applies a block_in_place fix for log-forwarder starvation: it tells tokio that the predict thread will block on the GIL, so log_forwarder tasks can be scheduled on other worker threads instead of all logs being batched at the end of the prediction.
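A minimal sketch of "webhooks fire from mutation methods", assuming a simplified Prediction. The event names (Start, Logs, Completed), the Payload shape, and the recording sink standing in for an HTTP sender are all hypothetical, not the actual coglet types. The point it shows: because each payload is built from the live struct at mutation time, the terminal webhook automatically includes every log appended before it.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical webhook event kinds; the real coglet enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WebhookEvent {
    Start,
    Logs,
    Completed,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Starting,
    Processing,
    Succeeded,
}

/// A webhook payload: a copy of the *current* state at send time.
#[derive(Debug, Clone, PartialEq)]
pub struct Payload {
    pub status: Status,
    pub logs: String,
}

/// Stand-in for an HTTP webhook sender: records every delivery.
pub type Sent = Arc<Mutex<Vec<(WebhookEvent, Payload)>>>;

pub struct Prediction {
    status: Status,
    logs: String,
    sent: Sent,
}

impl Prediction {
    pub fn new(sent: Sent) -> Self {
        Self { status: Status::Starting, logs: String::new(), sent }
    }

    /// Build the payload from live state, so it can never be stale.
    fn fire(&self, event: WebhookEvent) {
        let payload = Payload { status: self.status, logs: self.logs.clone() };
        self.sent.lock().unwrap().push((event, payload));
    }

    pub fn set_processing(&mut self) {
        self.status = Status::Processing;
        self.fire(WebhookEvent::Start);
    }

    pub fn append_log(&mut self, line: &str) {
        self.logs.push_str(line);
        self.fire(WebhookEvent::Logs);
    }

    pub fn set_succeeded(&mut self) {
        self.status = Status::Succeeded;
        self.fire(WebhookEvent::Completed);
    }
}

fn main() {
    let sent: Sent = Arc::new(Mutex::new(Vec::new()));
    let mut prediction = Prediction::new(Arc::clone(&sent));
    prediction.set_processing();
    prediction.append_log("loading model\n");
    prediction.set_succeeded();

    let deliveries = sent.lock().unwrap();
    for (event, payload) in deliveries.iter() {
        println!("{:?}: status={:?} logs={:?}", event, payload.status, payload.logs);
    }
}
```

With no separate copy of the state to sync, there is nothing for the terminal webhook to fall behind.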

What changed

| File | Change |
|---|---|
| supervisor.rs | Deleted (432 lines) |
| service.rs | Absorbed DashMap, cancel, PredictionHandle, SyncPredictionGuard |
| prediction.rs | Added webhook firing in mutation methods |
| routes.rs | Removed all update_status() calls, collapsed submit+create into one step |
| worker.rs | block_in_place fix for log delivery |
| lib.rs | Removed supervisor re-exports |
| bridge/protocol.rs | Fix clippy manual_str_repeat warning in test |
| 5 doc/architecture files | Updated to reflect new architecture |

Net: -169 lines (12 files, 523 insertions, 692 deletions). HTTP API surface unchanged.

Before / After

Before: two parallel state systems, Prediction (orchestrator, real-time) and PredictionState (supervisor, a stale copy that never received logs or outputs). Routes had to sync state back manually via six update_status() calls, and terminal webhooks sent from the supervisor had empty logs.

After: Prediction is the single source of truth. PredictionService holds each Arc<Mutex<Prediction>> in a DashMap. Webhooks fire as side effects of state mutations, so no sync step is needed.
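A minimal sketch of that ownership model. To keep it dependency-free it swaps DashMap for std::sync::Mutex<HashMap<..>>, and the Prediction fields and the create/get method names are illustrative assumptions rather than the actual coglet API. What it shows is that the orchestrator and a GET handler hold clones of the same Arc, so a read observes mutations immediately with no update_status() step.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Minimal stand-in for the real Prediction type (fields are hypothetical).
#[derive(Debug)]
pub struct Prediction {
    pub status: &'static str,
    pub logs: Vec<String>,
}

/// Sketch of the single-owner registry. A Mutex<HashMap> replaces DashMap
/// here only so the example compiles without external crates.
#[derive(Default)]
pub struct PredictionService {
    predictions: Mutex<HashMap<String, Arc<Mutex<Prediction>>>>,
}

impl PredictionService {
    /// Create and register in one step; returns the shared handle.
    pub fn create(&self, id: &str) -> Arc<Mutex<Prediction>> {
        let handle = Arc::new(Mutex::new(Prediction { status: "starting", logs: Vec::new() }));
        self.predictions
            .lock()
            .unwrap()
            .insert(id.to_string(), Arc::clone(&handle));
        handle
    }

    /// Look up the same handle the orchestrator is mutating.
    pub fn get(&self, id: &str) -> Option<Arc<Mutex<Prediction>>> {
        self.predictions.lock().unwrap().get(id).cloned()
    }
}

fn main() {
    let service = PredictionService::default();
    let orchestrator_handle = service.create("p1");
    {
        let mut p = orchestrator_handle.lock().unwrap();
        p.status = "processing";
        p.logs.push("loading weights".to_string());
    }
    // A GET handler sees the mutation immediately: no sync step.
    let seen = service.get("p1").unwrap();
    let p = seen.lock().unwrap();
    println!("{} {:?}", p.status, p.logs);
}
```

One reason to prefer DashMap over this stand-in is that it shards its locks, so lookups for different prediction IDs do not all contend on a single global Mutex.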

closes #2778

@tempusfrangit tempusfrangit requested a review from a team as a code owner February 26, 2026 20:19
@tempusfrangit tempusfrangit added this to the 0.17.0 Release milestone Feb 26, 2026
@tempusfrangit
Member Author

Working on ensuring we haven't dropped a webhook type.

@tempusfrangit tempusfrangit force-pushed the fix/webhook-state-unification branch from 909cba4 to 7fc4b45 February 26, 2026 21:18
@tempusfrangit tempusfrangit changed the base branch from main to fix/list-path-upload February 26, 2026 21:25
…e of truth

Collapse PredictionSupervisor into PredictionService to eliminate the
dual state tracking that caused terminal webhooks to have empty logs
and intermediate webhooks to never fire.

Before: PredictionState (supervisor) was a stale copy that never received
logs or outputs. Webhooks fired from supervisor had empty payloads.

After: Prediction (orchestrator) is the single source of truth.
Webhook methods fire directly from Prediction mutation methods
(set_processing, set_succeeded, append_log, append_output, etc.)
where real-time state lives.

Also applies block_in_place fix for log forwarder starvation — tells
tokio the predict thread will block on GIL, allowing log_forwarder
tasks to be scheduled on other threads instead of batching at end.

Net: -176 lines, supervisor.rs deleted entirely.
…s from throttling

- get_prediction_response() now falls through to streaming outputs vec
  (same logic as webhook payload builder) so GET /predictions/:id shows
  intermediate output during prediction
- Output webhook events bypass the 500ms throttle: they are high-value,
  infrequent, and were effectively unthrottled in the old Python runtime
  where file uploads were synchronous
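The throttle bypass can be sketched as a small gate. The 500ms interval and the Output bypass come from the commit message above; the Throttle type, the should_send name, and treating Completed as never throttled are assumptions for illustration. Time is passed in explicitly so the behavior is deterministic.

```rust
use std::time::{Duration, Instant};

/// Illustrative subset of webhook event kinds (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WebhookEvent {
    Logs,
    Output,
    Completed,
}

/// Hypothetical rate limiter for intermediate webhooks.
pub struct Throttle {
    interval: Duration,
    last_sent: Option<Instant>,
}

impl Throttle {
    pub fn new(interval: Duration) -> Self {
        Self { interval, last_sent: None }
    }

    /// Returns true if a webhook for `event` should be sent at `now`.
    pub fn should_send(&mut self, event: WebhookEvent, now: Instant) -> bool {
        // Output events always go out; Completed is assumed to as well.
        let bypass = matches!(event, WebhookEvent::Output | WebhookEvent::Completed);
        let due = match self.last_sent {
            None => true,
            Some(last) => now.duration_since(last) >= self.interval,
        };
        if bypass || due {
            // Any send resets the window: a sent payload carries the full
            // current state, including logs accumulated so far.
            self.last_sent = Some(now);
            true
        } else {
            false
        }
    }
}

fn main() {
    let t0 = Instant::now();
    let mut throttle = Throttle::new(Duration::from_millis(500));
    let schedule = [
        (WebhookEvent::Logs, 0),
        (WebhookEvent::Logs, 100),
        (WebhookEvent::Output, 200),
        (WebhookEvent::Logs, 600),
        (WebhookEvent::Logs, 700),
        (WebhookEvent::Completed, 750),
    ];
    for (event, ms) in schedule {
        let sent = throttle.should_send(event, t0 + Duration::from_millis(ms));
        println!("{ms:>4}ms {event:?}: {}", if sent { "sent" } else { "throttled" });
    }
}
```

Resetting the window on a bypassed send is a design choice; a throttled log line is not lost, because the next payload that does go out is built from the same live state.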
@tempusfrangit tempusfrangit force-pushed the fix/webhook-state-unification branch from 7c6ac58 to 1b1860f February 27, 2026 22:22
@mfainberg-cf mfainberg-cf changed the base branch from fix/list-path-upload to main February 27, 2026 22:22
@tempusfrangit tempusfrangit enabled auto-merge (squash) February 27, 2026 22:37
Terminal state guard: set_succeeded/set_failed/set_canceled now return
early if the prediction is already terminal. This prevents PredictionSlot::drop
from overwriting a Succeeded status with Failed when the slot drops
after the orchestrator has already completed the prediction.
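The terminal-state guard can be sketched as an early return in each setter, paired with a Drop impl that fails the prediction. The method names set_succeeded/set_failed/set_canceled and the PredictionSlot::drop behavior come from the commit message; the field layout and the error string are hypothetical. The scenario in main is the race described: the orchestrator completes first, then the slot drops.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Processing,
    Succeeded,
    Failed,
    Canceled,
}

impl Status {
    pub fn is_terminal(self) -> bool {
        matches!(self, Status::Succeeded | Status::Failed | Status::Canceled)
    }
}

pub struct Prediction {
    pub status: Status,
    pub error: Option<String>,
}

impl Prediction {
    pub fn set_succeeded(&mut self) {
        // Terminal-state guard: never overwrite a final status.
        if self.status.is_terminal() {
            return;
        }
        self.status = Status::Succeeded;
    }

    pub fn set_failed(&mut self, error: &str) {
        if self.status.is_terminal() {
            return;
        }
        self.status = Status::Failed;
        self.error = Some(error.to_string());
    }

    pub fn set_canceled(&mut self) {
        if self.status.is_terminal() {
            return;
        }
        self.status = Status::Canceled;
    }
}

/// Slot guard: if it drops while the prediction is still running, it fails it.
pub struct PredictionSlot {
    pub prediction: Arc<Mutex<Prediction>>,
}

impl Drop for PredictionSlot {
    fn drop(&mut self) {
        if let Ok(mut p) = self.prediction.lock() {
            // Without the guard in set_failed, this would clobber Succeeded.
            p.set_failed("slot dropped before completion");
        }
    }
}

fn main() {
    let prediction = Arc::new(Mutex::new(Prediction { status: Status::Processing, error: None }));
    let slot = PredictionSlot { prediction: Arc::clone(&prediction) };

    // The orchestrator completes the prediction first...
    prediction.lock().unwrap().set_succeeded();
    // ...and only afterwards does the slot drop.
    drop(slot);

    let p = prediction.lock().unwrap();
    println!("status={:?} error={:?}", p.status, p.error);
}
```

Putting the check inside the setters, rather than at each call site, means any future caller (including drop paths) gets the protection for free.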

Unified snapshot: extract build_state_snapshot() on Prediction as the
single source of truth for prediction JSON. Both webhook payloads and
GET responses now use the same method (GET adds input on top). Removes
duplicated metrics-building logic between prediction.rs and service.rs.
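A sketch of the unified snapshot, assuming plain structs instead of JSON so it runs without serde. build_state_snapshot and get_prediction_response are the names from the commits; the webhook_payload helper, the field names, and using a String for input are hypothetical. It also folds in the output fall-through from the second commit: use the final output if set, otherwise whatever has been streamed so far.

```rust
/// The fields both webhooks and GET responses report.
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub id: String,
    pub status: String,
    pub logs: String,
    pub output: Option<Vec<String>>,
}

pub struct Prediction {
    pub id: String,
    pub status: String,
    pub logs: String,
    pub input: String,
    pub final_output: Option<Vec<String>>,
    pub streamed_outputs: Vec<String>,
}

impl Prediction {
    /// Single place that decides what a prediction looks like from outside.
    pub fn build_state_snapshot(&self) -> Snapshot {
        // Prefer the final output; otherwise fall through to streamed outputs.
        let output = match &self.final_output {
            Some(out) => Some(out.clone()),
            None if !self.streamed_outputs.is_empty() => Some(self.streamed_outputs.clone()),
            None => None,
        };
        Snapshot {
            id: self.id.clone(),
            status: self.status.clone(),
            logs: self.logs.clone(),
            output,
        }
    }
}

/// Hypothetical webhook payload builder: the snapshot, unchanged.
pub fn webhook_payload(p: &Prediction) -> Snapshot {
    p.build_state_snapshot()
}

/// GET response: the same snapshot with the input added on top.
#[derive(Debug)]
pub struct GetResponse {
    pub snapshot: Snapshot,
    pub input: String,
}

pub fn get_prediction_response(p: &Prediction) -> GetResponse {
    GetResponse {
        snapshot: p.build_state_snapshot(),
        input: p.input.clone(),
    }
}

fn main() {
    let p = Prediction {
        id: "p1".to_string(),
        status: "processing".to_string(),
        logs: "step 1\n".to_string(),
        input: r#"{"prompt":"hi"}"#.to_string(),
        final_output: None,
        streamed_outputs: vec!["chunk-0".to_string()],
    };
    println!("{:?}", webhook_payload(&p));
    println!("{:?}", get_prediction_response(&p));
}
```

Because both paths call the same method, a webhook and a concurrent GET for the same state cannot disagree on logs, output, or metrics.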
@tempusfrangit tempusfrangit merged commit 838c013 into main Feb 27, 2026
37 checks passed
@tempusfrangit tempusfrangit deleted the fix/webhook-state-unification branch February 27, 2026 23:51


Development

Successfully merging this pull request may close these issues.

Logs are buffered weirdly in coglet + not landing in upstream prediction outputs
