fix: eliminate dual prediction state, wire webhooks from single source of truth #2780
Merged
tempusfrangit merged 3 commits into main on Feb 27, 2026
Working on ensuring we haven't dropped a webhook type.
…e of truth

Collapse PredictionSupervisor into PredictionService to eliminate the dual state tracking that caused terminal webhooks to have empty logs and intermediate webhooks to never fire.

Before: PredictionState (supervisor) was a stale copy that never received logs or outputs. Webhooks fired from supervisor had empty payloads.

After: Prediction (orchestrator) is the single source of truth. Webhook methods fire directly from Prediction mutation methods (set_processing, set_succeeded, append_log, append_output, etc.) where real-time state lives.

Also applies block_in_place fix for log forwarder starvation: it tells tokio the predict thread will block on the GIL, allowing log_forwarder tasks to be scheduled on other threads instead of batching at end.

Net: -176 lines, supervisor.rs deleted entirely.
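The "webhooks fire from mutation methods" idea can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Event`, `Payload`, and `RecordingSender` types and the `fire` helper are hypothetical, and the sender only records payloads instead of making HTTP calls. Only the mutation method names come from the commit message.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical webhook event kinds; names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start,
    Logs,
    Output,
    Completed,
}

/// Snapshot of prediction state captured at the moment of a mutation.
#[derive(Debug, Clone, PartialEq)]
pub struct Payload {
    pub event: Event,
    pub status: String,
    pub logs: String,
    pub outputs: Vec<String>,
}

/// Stand-in for a real webhook sender (assumption): it records payloads in memory.
#[derive(Clone, Default)]
pub struct RecordingSender {
    pub sent: Arc<Mutex<Vec<Payload>>>,
}

impl RecordingSender {
    fn send(&self, payload: Payload) {
        self.sent.lock().unwrap().push(payload);
    }
}

/// The only copy of prediction state; every mutation fires a webhook itself.
pub struct Prediction {
    status: String,
    logs: String,
    outputs: Vec<String>,
    webhook: RecordingSender,
}

impl Prediction {
    pub fn new(webhook: RecordingSender) -> Self {
        Self {
            status: "starting".into(),
            logs: String::new(),
            outputs: Vec::new(),
            webhook,
        }
    }

    /// Build the payload from live state, so it can never be a stale copy.
    fn fire(&self, event: Event) {
        self.webhook.send(Payload {
            event,
            status: self.status.clone(),
            logs: self.logs.clone(),
            outputs: self.outputs.clone(),
        });
    }

    pub fn set_processing(&mut self) {
        self.status = "processing".into();
        self.fire(Event::Start);
    }

    pub fn append_log(&mut self, line: &str) {
        self.logs.push_str(line);
        self.fire(Event::Logs);
    }

    pub fn append_output(&mut self, item: &str) {
        self.outputs.push(item.to_string());
        self.fire(Event::Output);
    }

    pub fn set_succeeded(&mut self) {
        self.status = "succeeded".into();
        self.fire(Event::Completed);
    }
}

/// Runs one prediction lifecycle and returns the last (terminal) payload.
pub fn demo_terminal_payload() -> Payload {
    let sender = RecordingSender::default();
    let mut p = Prediction::new(sender.clone());
    p.set_processing();
    p.append_log("step 1\n");
    p.append_output("out-0");
    p.set_succeeded();
    let sent = sender.sent.lock().unwrap();
    let last = sent.last().cloned().unwrap();
    last
}
```

Because the payload is built from the same struct the mutation just updated, the terminal payload in this sketch carries the accumulated logs and outputs, which is the property the old supervisor copy lacked.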
…s from throttling

- get_prediction_response() now falls through to the streaming outputs vec (same logic as the webhook payload builder), so GET /predictions/:id shows intermediate output during a prediction
- Output webhook events bypass the 500ms throttle: they are high-value, infrequent, and were effectively unthrottled in the old Python runtime, where file uploads were synchronous
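One way to picture the throttle bypass is the sketch below. The `Throttle` type, its `should_send` method, and the two-variant `Event` enum are hypothetical names, not taken from the PR. The choice that a bypassed send does not reset the throttle timer is also an assumption of this sketch.

```rust
use std::time::{Duration, Instant};

/// Hypothetical event kinds; only the Output bypass comes from the commit message.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Event {
    Logs,
    Output,
}

/// Minimal per-prediction webhook throttle.
pub struct Throttle {
    interval: Duration,
    last_sent: Option<Instant>,
}

impl Throttle {
    pub fn new(interval: Duration) -> Self {
        Self { interval, last_sent: None }
    }

    /// Returns true if a webhook for `event` should be sent at time `now`.
    pub fn should_send(&mut self, event: Event, now: Instant) -> bool {
        // Output events skip the throttle entirely.
        // Assumption: they also leave the throttle timer untouched.
        if event == Event::Output {
            return true;
        }
        let due = match self.last_sent {
            None => true,
            Some(prev) => now.duration_since(prev) >= self.interval,
        };
        if due {
            self.last_sent = Some(now);
        }
        due
    }
}

/// Simulates four events at fixed offsets from a starting instant.
pub fn demo() -> Vec<bool> {
    let mut throttle = Throttle::new(Duration::from_millis(500));
    let t0 = Instant::now();
    vec![
        throttle.should_send(Event::Logs, t0),
        throttle.should_send(Event::Logs, t0 + Duration::from_millis(100)),
        throttle.should_send(Event::Output, t0 + Duration::from_millis(150)),
        throttle.should_send(Event::Logs, t0 + Duration::from_millis(600)),
    ]
}
```

In `demo`, the second log event is suppressed because it arrives 100ms after the first, while the output event at 150ms is sent regardless.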
Terminal state guard: set_succeeded/set_failed/set_canceled now return early if the prediction is already terminal. This prevents PredictionSlot::drop from overwriting a Succeeded status with Failed when the slot drops after the orchestrator has already completed the prediction.

Unified snapshot: extract build_state_snapshot() on Prediction as the single source of truth for prediction JSON. Both webhook payloads and GET responses now use the same method (GET adds input on top). This removes duplicated metrics-building logic between prediction.rs and service.rs.
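The terminal state guard can be illustrated with a small sketch. The `Status` enum, the `bool` return values, and the `Slot` type (a simplified stand-in for PredictionSlot) are assumptions made for illustration; the PR only states that the setters return early when the prediction is already terminal.

```rust
use std::sync::{Arc, Mutex};

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Starting,
    Processing,
    Succeeded,
    Failed,
    Canceled,
}

impl Status {
    pub fn is_terminal(self) -> bool {
        matches!(self, Status::Succeeded | Status::Failed | Status::Canceled)
    }
}

pub struct Prediction {
    status: Status,
    error: Option<String>,
}

impl Prediction {
    pub fn new() -> Self {
        Self { status: Status::Starting, error: None }
    }

    /// Each setter returns false and changes nothing if already terminal.
    pub fn set_succeeded(&mut self) -> bool {
        if self.status.is_terminal() {
            return false;
        }
        self.status = Status::Succeeded;
        true
    }

    pub fn set_failed(&mut self, error: &str) -> bool {
        if self.status.is_terminal() {
            return false;
        }
        self.status = Status::Failed;
        self.error = Some(error.to_string());
        true
    }

    pub fn set_canceled(&mut self) -> bool {
        if self.status.is_terminal() {
            return false;
        }
        self.status = Status::Canceled;
        true
    }
}

/// Simplified stand-in for a slot that marks its prediction failed on drop.
pub struct Slot {
    prediction: Arc<Mutex<Prediction>>,
}

impl Drop for Slot {
    fn drop(&mut self) {
        // With the guard in place this is a no-op for completed predictions.
        self.prediction.lock().unwrap().set_failed("slot dropped");
    }
}

/// Completes a prediction, then drops its slot, and reports the final state.
pub fn demo() -> (Status, Option<String>) {
    let shared = Arc::new(Mutex::new(Prediction::new()));
    {
        let _slot = Slot { prediction: Arc::clone(&shared) };
        shared.lock().unwrap().set_succeeded();
    } // _slot is dropped here, after the prediction already succeeded
    let p = shared.lock().unwrap();
    let result = (p.status, p.error.clone());
    result
}
```

Without the `is_terminal` check in `set_failed`, the drop at the end of the inner scope would overwrite the Succeeded status, which is the bug the commit describes.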
michaeldwan approved these changes Feb 27, 2026
Summary
- PredictionSupervisor is deleted entirely. PredictionService is now the single owner of prediction state.
- Webhooks fire directly from Prediction mutation methods (set_processing, set_succeeded, append_log, append_output, etc.) where real-time state lives. No more stale copies.
- block_in_place fix for log forwarder starvation: it tells tokio the predict thread will block on the GIL, allowing log_forwarder tasks to be scheduled on other threads instead of batching all logs at prediction end.

What changed
- supervisor.rs
- service.rs
- prediction.rs
- routes.rs: … update_status() calls, collapsed submit+create into one step
- worker.rs: block_in_place fix for log delivery
- lib.rs
- bridge/protocol.rs: … manual_str_repeat warning in test

Net: -169 lines (12 files, 523 insertions, 692 deletions). HTTP API surface unchanged.
Before / After
Before: Two parallel state systems: Prediction (orchestrator, real-time) vs PredictionState (supervisor, stale copy that never got logs/outputs). Routes had to manually sync state back via 6 update_status() calls. Terminal webhooks sent from supervisor had empty logs.

After: Prediction is the single source of truth. PredictionService holds Arc<Mutex<Prediction>> in a DashMap. Webhooks fire as side effects of state mutations, so no sync step is needed.

closes #2778