fix: reward pipeline skips abandoned episodes (3 related fixes)#1784
Open
chiefmojo wants to merge 4 commits into
- episodeRewardIsDirty() now includes closeReason=abandoned (219 of 224 closed episodes were silently skipped)
- Added a 10-minute setInterval for autoRescoreDirtyClosedEpisodes() so the daemon bridge doesn't go permanently idle after bootstrap
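The periodic rescore from this commit could be wired up roughly as follows. This is a minimal sketch, not the plugin's actual code: `startPeriodicRescore`, the injectable `schedule` parameter, and the overlap guard are assumptions added for illustration. Only the 10-minute interval and the idea of calling autoRescoreDirtyClosedEpisodes() on a timer come from the commit message.

```typescript
const TEN_MINUTES_MS = 10 * 60 * 1000;

// Hypothetical scheduler type so the timer can be replaced in tests.
type Schedule = (fn: () => void, ms: number) => unknown;

function startPeriodicRescore(
  rescore: () => Promise<unknown>, // e.g. a wrapper around autoRescoreDirtyClosedEpisodes()
  onError: (err: unknown) => void,
  intervalMs: number = TEN_MINUTES_MS,
  schedule: Schedule = (fn, ms) => setInterval(fn, ms),
): unknown {
  let running = false; // assumption: never start a pass while one is still in flight

  const tick = (): void => {
    if (running) return;
    running = true;
    rescore().then(
      () => { running = false; },
      (err) => { running = false; onError(err); },
    );
  };

  // Returns whatever the scheduler returns (a timer handle with setInterval).
  return schedule(tick, intervalMs);
}
```

The overlap guard matters here because a rescore pass can run for minutes; without it, a slow pass and the next timer tick could score the same episodes twice.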
ensure_viewer_daemon() probes port 18800 with a 15-second timeout. When core.init() rescores dirty episodes it can take minutes, keeping the port unbound past the deadline. ensure_viewer_daemon() then gives up, releases the startup lock, and the next keepalive cycle spawns a replacement daemon that kills the in-progress one, creating a restart loop that interrupts scoring mid-batch.
Fix: in daemon mode, bind the HTTP server first so the health probe succeeds within seconds, then run core.init() asynchronously in the background. Non-daemon (stdio/JSON-RPC) mode is unchanged: it still runs init synchronously so host-LLM fallback is available during recovery.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
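The bind-first ordering for daemon mode can be sketched as below. The `BridgeDeps` shape and the `startDaemon` name are hypothetical; the sketch assumes startHttpServer() calls something like `server.listen()` and returns immediately, and that init is an async function that may run for minutes.

```typescript
// Hypothetical dependency shape; the real bridge's names and signatures may differ.
interface BridgeDeps {
  startHttpServer: () => void;          // assumed to start listening and return right away
  init: () => Promise<void>;            // may take minutes while dirty episodes are rescored
  onInitError: (err: unknown) => void;  // report failures instead of crashing the daemon
}

// Daemon mode only: make the port reachable first, then initialize in the background.
function startDaemon(deps: BridgeDeps): Promise<void> {
  // 1. Bind the HTTP server so the 15-second health probe can succeed.
  deps.startHttpServer();

  // 2. Kick off init without blocking the caller. The returned promise settles
  //    once init has finished or its error has been handed to onInitError.
  return deps.init().catch(deps.onInitError);
}
```

A caller in daemon mode would not await the returned promise before serving requests; it is only useful for logging or for tests that want to know when background init has completed.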
…ardIsDirty()
reward.skipped=true was being set on abandoned episodes when the reward runner decided the conversation was too short (< 2 turns). This flag then permanently excluded them from recovery rescoring: episodeRewardIsDirty() returned false before the closeReason==="abandoned" check was reached, leaving 173 episodes stuck unscored indefinitely.
Fix: only honor reward.skipped for episodes that are NOT (abandoned + no prior recovery attempt). recoverDirtyClosedEpisodes() patches closeReason → "finalized" after processing, so if reward still skips on recovery the next dirty check sees closeReason !== "abandoned" and stops retrying. The result is exactly one recovery pass and no infinite loop.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
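The one-pass recovery rule can be illustrated with a small sketch. It is not the real episodeRewardIsDirty(): the `Episode` shape, the `recoveryAttempted` and `scoredAt` fields, and the `markRecovered` helper are assumptions made up for the example. Only the closeReason / recoveryReason values and the reward.skipped guard come from the commit messages.

```typescript
// Hypothetical episode shape for illustration only.
interface Episode {
  closeReason?: string;        // "finalized" | "abandoned" | ...
  recoveryReason?: string;     // e.g. "missed_session_end"
  recoveryAttempted?: boolean; // assumption: set once recovery has processed the episode
  reward?: { skipped?: boolean; scoredAt?: number }; // scoredAt is an assumption
}

function episodeRewardIsDirtySketch(ep: Episode): boolean {
  const awaitingFirstRecovery =
    ep.closeReason === "abandoned" && !ep.recoveryAttempted;

  // Honor reward.skipped unless this abandoned episode has never had a recovery pass.
  if (ep.reward?.skipped === true && !awaitingFirstRecovery) return false;

  // Assumption: an episode that already has a score is clean.
  if (ep.reward?.scoredAt !== undefined) return false;

  return (
    ep.closeReason === "finalized" ||
    ep.closeReason === "abandoned" ||
    ep.recoveryReason === "missed_session_end"
  );
}

// Mirrors the described patch of closeReason to "finalized" after a recovery pass.
function markRecovered(ep: Episode, skippedAgain: boolean): Episode {
  return {
    ...ep,
    closeReason: "finalized",
    recoveryAttempted: true,
    reward: { ...ep.reward, skipped: skippedAgain },
  };
}
```

With this shape, an abandoned episode that was skipped earlier is dirty exactly once; if the reward runner skips it again during recovery, the patched closeReason lets the normal skip guard take over.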
…lagged episodes
Episodes tagged lightweightMemory:true during a prior session when lightweight mode was active were permanently excluded from scoring even after the config was changed to enabled:false. Three guards enforced this unconditionally: the pre-filter in autoRescoreDirtyClosedEpisodes and init(), the skip inside recoverDirtyClosedEpisodes, and the check in the capture subscriber.
All three now condition on the *current* handle.algorithm.lightweightMemory.enabled value. The snapshot emitted for a legacy-flagged episode also has the lightweightMemory field stripped before the event fires, so the capture subscriber receives a clean snapshot. When lightweight mode is on, behavior is unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
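The idea of gating the legacy flag on the current config might look like the sketch below. `EpisodeSnapshot`, `skipForLightweight`, and `snapshotForCapture` are hypothetical names; in the real code the boolean would come from handle.algorithm.lightweightMemory.enabled, which is passed in here as a plain parameter.

```typescript
// Hypothetical snapshot shape; real snapshots carry many more fields.
interface EpisodeSnapshot {
  id: string;
  lightweightMemory?: boolean; // may be a leftover flag from an earlier session
}

// Skip scoring only when the flag is set AND lightweight mode is currently enabled.
function skipForLightweight(ep: EpisodeSnapshot, lightweightEnabled: boolean): boolean {
  return lightweightEnabled && ep.lightweightMemory === true;
}

// When lightweight mode is off, strip the legacy flag so downstream
// subscribers see a clean snapshot. The input object is not mutated.
function snapshotForCapture(ep: EpisodeSnapshot, lightweightEnabled: boolean): EpisodeSnapshot {
  const copy: EpisodeSnapshot = { ...ep };
  if (!lightweightEnabled) delete copy.lightweightMemory;
  return copy;
}
```

Passing the current setting in explicitly keeps all three guards consistent: each call site asks the same question with the same live config value instead of trusting a flag stored on the episode.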
Problem
The reward scoring pipeline processes only ~7 episodes on bootstrap and then goes permanently idle, leaving the vast majority of traces unscored. On a 43 MB database with 3,600 traces, only 45 (1.3%) had r_human scores before fixing.
Root Cause & Fixes
Three interacting bugs, all in apps/memos-local-plugin/:
Fix 1: episodeRewardIsDirty() excluded abandoned episodes
The dirty-check condition only matched closeReason === "finalized" or recoveryReason === "missed_session_end". 219 of 224 closed episodes had closeReason: "abandoned".
Fix: Add closeReason === "abandoned" to the rescore condition, plus a 10-minute periodic rescore timer.
Fix 2: Daemon HTTP server must bind before core.init()
The daemon bridge started core.init() before startHttpServer(). When init rescores dirty episodes (5+ minutes of LLM calls), port 18800 stays unbound. The Python watchdog times out its 15-second health probe and spawns a new daemon, killing the in-progress one.
Fix: In daemon mode, bind the HTTP server first, then run init asynchronously.
Fix 3: reward.skipped blocked abandoned episodes from recovery
episodeRewardIsDirty() checked reward.skipped === true BEFORE closeReason === "abandoned". The reward runner correctly skipped 173 one-turn episodes in a previous session, but that flag permanently excluded them from recovery.
Fix: reward.skipped is only honored if the episode is NOT (abandoned + no prior recovery). Abandoned episodes get one pass; after that, closeReason is patched to "finalized" and the normal skip guard works.
Verification