Summary
The agency CLI / Copilot CLI does not reliably unlink its inuse.<pid>.lock file on every exit path. When the process is sent SIGKILL, crashes, or exits via certain unclean paths, stale lock files accumulate in ~/.copilot/session-state/<uuid>/. Later attempts to resume an affected session then incorrectly conclude that it is still in use.
Symptom (downstream)
We hit this in copilot-ide, an Electron wrapper around agency copilot --acp / agency copilot. After any unclean exit, opening an old session shows a "Force resume?" prompt because the IDE sees a leftover inuse.<pid>.lock and (correctly) refuses to overwrite a session that might still be open elsewhere.
User-visible bug we filed downstream: shachaf-ashkenazi/copilot-ide#18.
Reproduction is trivial: start a session (agency copilot &), then SIGKILL the resulting PID (signal 9). The inuse.<pid>.lock file in the session dir survives the dead process indefinitely.
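For anyone without the CLI at hand, the failure mode can be simulated with a stand-in process. This is a sketch, not the CLI's code: the child script, the LOCK_DIR variable, and simulateKill are hypothetical, and it assumes the lock is removed from a normal-exit hook, which we have not verified against the CLI source.

```typescript
import { spawnSync } from "node:child_process";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical stand-in for the CLI: writes inuse.<pid>.lock, registers
// cleanup for a normal exit, then idles until it is killed.
const childScript = `
  const fs = require("fs");
  const path = require("path");
  const lock = path.join(process.env.LOCK_DIR, "inuse." + process.pid + ".lock");
  fs.writeFileSync(lock, String(process.pid));
  process.on("exit", () => fs.unlinkSync(lock)); // never runs on SIGKILL
  setInterval(() => {}, 1000);
`;

// Runs the stand-in, lets spawnSync SIGKILL it after the timeout, and
// returns whatever files are left in the temporary session directory.
function simulateKill(): string[] {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "session-state-"));
  spawnSync(process.execPath, ["-e", childScript], {
    env: { ...process.env, LOCK_DIR: dir },
    timeout: 1500,
    killSignal: "SIGKILL",
  });
  return fs.readdirSync(dir);
}

console.log(simulateKill());
```

The "exit" hook cannot run because SIGKILL is never delivered to user code, so the lock file outlives the process that wrote it.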
Current downstream workaround
We do PID-liveness probing in the IDE before trusting any lock file: src/main/sessionLocks.ts.
The module parses the PID out of the filename, calls process.kill(pid, 0) (a POSIX no-op probe) to check whether the process is alive, then runs ps -p <pid> -o command= to verify that the live PID actually looks like the Copilot CLI. The second check matters because the kernel reuses PIDs aggressively: a multi-day-old lock will frequently point at an unrelated login shell or node process. Locks owned by dead or live-but-unrelated PIDs are unlinked.
Empirically: ~5–10 ms per lock probed, only paid when the PID is alive. Catches the bug but is a workaround for behavior the CLI should own.
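The probe described above can be sketched roughly as follows. This is an illustration, not a copy of sessionLocks.ts: the function names and the looksLikeCopilotCli heuristic are assumptions made for this example.

```typescript
import { execFileSync } from "node:child_process";
import * as fs from "node:fs";
import * as path from "node:path";

const LOCK_RE = /^inuse\.(\d+)\.lock$/;

// Extracts the PID from a lock file name, or null if the name does not match.
function parseLockPid(fileName: string): number | null {
  const m = LOCK_RE.exec(fileName);
  if (!m) return null;
  const pid = Number(m[1]);
  return Number.isSafeInteger(pid) && pid > 0 ? pid : null;
}

// Signal 0 delivers nothing; it only checks existence and permissions.
function isPidAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    // Anything else (typically ESRCH) is treated as "no such process".
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Returns the command line of a PID, or "" if ps cannot find it.
function commandOf(pid: number): string {
  try {
    return execFileSync("ps", ["-p", String(pid), "-o", "command="], {
      encoding: "utf8",
    }).trim();
  } catch {
    return ""; // ps exits non-zero when the PID has vanished
  }
}

// Assumed heuristic: a live owner must have "copilot" in its command line.
function looksLikeCopilotCli(command: string): boolean {
  return /\bcopilot\b/.test(command);
}

// Unlinks locks whose owner is dead or alive but unrelated; returns their names.
function reclaimStaleLocks(sessionDir: string): string[] {
  const removed: string[] = [];
  for (const name of fs.readdirSync(sessionDir)) {
    const pid = parseLockPid(name);
    if (pid === null) continue;
    if (isPidAlive(pid) && looksLikeCopilotCli(commandOf(pid))) continue;
    fs.rmSync(path.join(sessionDir, name), { force: true });
    removed.push(name);
  }
  return removed;
}
```

Only live PIDs reach the ps call, which is why the per-lock cost is paid only when the PID is alive.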
Proposed fix (upstream)
Any of these would narrow or close the gap:
1. Trap SIGTERM / SIGINT and unlink the lock before exit. Covers the common "user closes terminal" / IDE-shutdown case (a closing terminal usually delivers SIGHUP, so that signal likely needs a handler too). Doesn't help SIGKILL or hard crashes.
2. Use OS-reclaimed advisory locks (flock(2) / fcntl(F_SETLK)) instead of (or in addition to) the on-disk marker file. The kernel releases these when the holding process dies, regardless of how. This is the most robust option.
3. Bake the PID-liveness reclaim logic into the CLI itself, the same approach our sessionLocks.ts takes. Cheap, no semantic change, defends against pre-existing stale locks created by older versions.
(1) + (3) is probably the easiest path that handles all exit modes. (2) is the cleanest if you're willing to change the lock format.
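As a rough illustration of option (1), a signal-trapping lock holder could look like the sketch below. We don't know how the CLI creates its lock internally, so acquireSessionLock and its return value are hypothetical; including SIGHUP is also our assumption.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Creates inuse.<pid>.lock in sessionDir and returns an idempotent release().
function acquireSessionLock(sessionDir: string): () => void {
  const lockPath = path.join(sessionDir, `inuse.${process.pid}.lock`);
  // "wx" fails if the file already exists (e.g. a stale lock after PID reuse).
  fs.writeFileSync(lockPath, String(process.pid), { flag: "wx" });

  let released = false;
  const release = (): void => {
    if (released) return;
    released = true;
    fs.rmSync(lockPath, { force: true });
  };

  // Normal exit, including explicit process.exit() calls.
  process.once("exit", release);

  // Catchable termination signals; exit with the conventional 128 + signo.
  const signals = [
    ["SIGINT", 130],
    ["SIGTERM", 143],
    ["SIGHUP", 129],
  ] as const;
  for (const [signal, code] of signals) {
    process.once(signal, () => {
      release();
      process.exit(code);
    });
  }
  return release;
}
```

SIGKILL and hard crashes still bypass every handler here, which is why this needs to be paired with (3) or replaced by (2).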
Notes
- We'll keep our PID-liveness reclaim in the IDE either way as defense in depth. Removing it would only be safe once we can assume all CLI versions in the wild handle this correctly, which won't be true for years.
- This affects both layouts of the binary that write into ~/.copilot/session-state/: ~/.copilot-cli/<v>/copilot (the agency-bundled inner binary) and /opt/homebrew/Caskroom/copilot-cli/<v>/copilot (the standalone Copilot CLI brew Cask). Both need the fix.
Environment
- agency 2026.5.11.1 (wrapping Copilot CLI 1.0.45)
- macOS (aarch64)
Related
- Downstream user-visible bug: shachaf-ashkenazi/copilot-ide#18
- Downstream tracker: shachaf-ashkenazi/copilot-ide#37