fix: reward pipeline skips abandoned episodes (3 related fixes) #1784

Open
chiefmojo wants to merge 4 commits into MemTensor:main from chiefmojo:fix/abandoned-episode-scoring

@chiefmojo

Problem

The reward scoring pipeline processes only ~7 episodes on bootstrap and then goes permanently idle, leaving the vast majority of traces unscored. On a 43 MB database with 3,600 traces, only 45 (1.3%) had r_human scores before these fixes.

Root Cause & Fixes

Three interacting bugs, all in apps/memos-local-plugin/:

Fix 1: episodeRewardIsDirty() excluded abandoned episodes

The dirty-check condition only matched closeReason === "finalized" or recoveryReason === "missed_session_end". 219 of 224 closed episodes had closeReason: "abandoned".

Fix: Add closeReason === "abandoned" to the rescore condition + a 10-minute periodic rescore timer.
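
To make the widened condition concrete, here is a minimal sketch of the rescore predicate after Fix 1. The `EpisodeMeta` shape and the helper name `isRescoreCandidate` are hypothetical stand-ins for illustration; the real `episodeRewardIsDirty()` presumably checks more state than this.

```typescript
// Hypothetical episode metadata; only the two fields named in this PR are modeled.
interface EpisodeMeta {
  closeReason?: "finalized" | "abandoned";
  recoveryReason?: string;
}

// Sketch of the widened condition: "abandoned" now qualifies alongside
// "finalized" and the "missed_session_end" recovery reason.
function isRescoreCandidate(meta: EpisodeMeta): boolean {
  return (
    meta.closeReason === "finalized" ||
    meta.closeReason === "abandoned" || // added by Fix 1
    meta.recoveryReason === "missed_session_end"
  );
}
```

Under the old condition, the `"abandoned"` branch was missing, so an episode closed that way never matched.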

Fix 2: Daemon HTTP server must bind before core.init()

The daemon bridge started core.init() before startHttpServer(). While init rescores dirty episodes (5+ minutes of LLM calls), port 18800 stays unbound. The Python watchdog's 15-second health probe times out and it spawns a new daemon, which kills the in-progress one.

Fix: In daemon mode, bind HTTP server first, then run init asynchronously.
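
The reordering can be sketched as follows. This is an illustration under assumptions, not the bridge's actual code: `startBridge`, `BridgeDeps`, and `onInitError` are hypothetical names, and the two callbacks stand in for startHttpServer() and core.init().

```typescript
// Hypothetical dependency bundle so the ordering can be shown without a real server.
interface BridgeDeps {
  startHttpServer: () => Promise<void>; // resolves once the port is bound
  initCore: () => Promise<void>;        // may take minutes when rescoring
  onInitError?: (err: unknown) => void;
}

async function startBridge(
  deps: BridgeDeps,
  daemonMode: boolean,
): Promise<{ initDone: Promise<void> }> {
  if (daemonMode) {
    // Bind first so a health probe can succeed within seconds.
    await deps.startHttpServer();
    // Kick off init in the background; do not await it here.
    const initDone = deps.initCore().catch((err: unknown) => {
      deps.onInitError?.(err);
    });
    return { initDone };
  }
  // Non-daemon mode: init still runs to completion before returning.
  await deps.initCore();
  return { initDone: Promise.resolve() };
}
```

Returning `initDone` lets a caller observe completion or failure of the background init without blocking startup on it.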

Fix 3: reward.skipped blocked abandoned episodes from recovery

episodeRewardIsDirty() checked reward.skipped === true BEFORE closeReason === "abandoned". The reward runner correctly skipped 173 one-turn episodes in a previous session, but that flag permanently excluded them from recovery.

Fix: reward.skipped is only honored if the episode is NOT (abandoned + no prior recovery). Abandoned episodes get one pass; after that, closeReason is patched to "finalized" and the normal skip guard works.
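
A rough sketch of the reordered guard and the single recovery pass is below. The `Episode` shape, the "already scored" check on `rHuman`, and the helper `recoverOnce` are assumptions made for illustration; only the function name episodeRewardIsDirty(), the reward.skipped flag, and the closeReason patch come from this PR.

```typescript
// Hypothetical episode shape; an absent recoveryReason is taken to mean "no prior recovery".
interface Episode {
  closeReason?: "finalized" | "abandoned";
  recoveryReason?: string;
  reward?: { skipped?: boolean; rHuman?: number };
}

function episodeRewardIsDirty(ep: Episode): boolean {
  const firstAbandonedPass =
    ep.closeReason === "abandoned" && ep.recoveryReason === undefined;
  // Honor the skip flag unless this is an abandoned episode's first pass.
  if (ep.reward?.skipped === true && !firstAbandonedPass) return false;
  // Assumption: an episode that already has a score is clean.
  if (ep.reward?.rHuman !== undefined) return false;
  return (
    ep.closeReason === "finalized" ||
    ep.closeReason === "abandoned" ||
    ep.recoveryReason === "missed_session_end"
  );
}

// Hypothetical stand-in for one recovery pass: score (or skip again),
// then patch closeReason so the normal skip guard applies next time.
function recoverOnce(ep: Episode, score: (ep: Episode) => number | null): void {
  const r = score(ep);
  ep.reward = r === null ? { skipped: true } : { rHuman: r };
  ep.closeReason = "finalized";
}
```

With this shape, an abandoned episode flagged `skipped` is dirty exactly once: after `recoverOnce` runs, `closeReason` is `"finalized"` and the skip flag is honored again.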

Verification

doranvas and others added 4 commits May 22, 2026 05:22
- episodeRewardIsDirty() now includes closeReason=abandoned (219 of 224
  closed episodes were silently skipped)
- Added 10-min setInterval for autoRescoreDirtyClosedEpisodes() so the
  daemon bridge doesn't go permanently idle after bootstrap
ensure_viewer_daemon() probes port 18800 with a 15-second timeout. When
core.init() rescores dirty episodes it can take minutes, keeping the port
unbound past the deadline. ensure_viewer_daemon() then gives up, releases
the startup lock, and the next keepalive cycle spawns a replacement daemon
that kills the in-progress one — creating a restart loop that interrupts
scoring mid-batch.

Fix: in daemon mode, bind the HTTP server first so the health probe
succeeds within seconds, then run core.init() asynchronously in the
background. Non-daemon (stdio/JSON-RPC) mode is unchanged — it still
runs init synchronously so host-LLM fallback is available during recovery.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ardIsDirty()

reward.skipped=true was being set on abandoned episodes when the reward
runner decided the conversation was too short (< 2 turns). This flag
then permanently excluded them from recovery rescoring — episodeRewardIsDirty()
returned false before the closeReason==="abandoned" check was reached,
leaving 173 episodes stuck unscored indefinitely.

Fix: only honor reward.skipped for episodes that are NOT (abandoned +
no prior recovery attempt). recoverDirtyClosedEpisodes() patches
closeReason → "finalized" after processing, so if reward still skips on
recovery the next dirty check sees closeReason !== "abandoned" and stops
retrying — exactly one recovery pass, no infinite loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lagged episodes

Episodes tagged lightweightMemory:true during a prior session when
lightweight mode was active were permanently excluded from scoring even
after the config was changed to enabled:false. Three guards enforced
this unconditionally: the pre-filter in autoRescoreDirtyClosedEpisodes
and init(), the skip inside recoverDirtyClosedEpisodes, and the check
in the capture subscriber. All three now condition on the *current*
handle.algorithm.lightweightMemory.enabled value. The snapshot emitted
for a legacy-flagged episode also has the lightweightMemory field
stripped before the event fires, so the capture subscriber receives a
clean snapshot. When lightweight mode is on, behavior is unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
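
As an illustration of the lightweight-mode change described in the last commit, the guard and the snapshot cleanup might look like the sketch below. `EpisodeSnapshot`, `shouldSkipScoring`, and `snapshotForCapture` are hypothetical names; `lightweightEnabledNow` stands in for the current handle.algorithm.lightweightMemory.enabled value.

```typescript
// Hypothetical snapshot shape; only the legacy flag matters here.
interface EpisodeSnapshot {
  id: string;
  lightweightMemory?: boolean;
}

// The stored flag only excludes an episode while lightweight mode is on *now*.
function shouldSkipScoring(ep: EpisodeSnapshot, lightweightEnabledNow: boolean): boolean {
  return lightweightEnabledNow && ep.lightweightMemory === true;
}

// When lightweight mode is off, strip the legacy flag before the event fires
// so the capture subscriber sees a clean snapshot.
function snapshotForCapture(ep: EpisodeSnapshot, lightweightEnabledNow: boolean): EpisodeSnapshot {
  if (lightweightEnabledNow) return ep;
  const clean: EpisodeSnapshot = { ...ep };
  delete clean.lightweightMemory;
  return clean;
}
```

Copying before deleting keeps the stored episode untouched; only the emitted snapshot loses the field.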

Development

Successfully merging this pull request may close these issues.

Reward pipeline skips abandoned episodes — 98% of closed episodes never scored
