Stale inuse.<pid>.lock files left behind on unclean exit (SIGKILL / crashes) #3255

@shachaf-ashkenazi

Summary

The agency CLI / Copilot CLI does not reliably unlink its inuse.<pid>.lock file on every exit path. When the process is sent SIGKILL, crashes, or exits via certain unclean paths, stale lock files accumulate in ~/.copilot/session-state/<uuid>/. Later attempts to resume an affected session then incorrectly conclude that it is still in use.

Symptom (downstream)

We hit this in copilot-ide, an Electron wrapper around agency copilot --acp / agency copilot. After any unclean exit, opening an old session shows a "Force resume?" prompt because the IDE sees a leftover inuse.<pid>.lock and (correctly) refuses to overwrite a session that might still be open elsewhere.

User-visible bug we filed downstream: shachaf-ashkenazi/copilot-ide#18.

Reproduction is trivial: start a session (agency copilot &), then SIGKILL the resulting PID (signal 9). The inuse.<pid>.lock file in the session dir survives the dead process indefinitely.

Current downstream workaround

We do PID-liveness probing in the IDE before trusting any lock file (src/main/sessionLocks.ts).

It parses the PID out of the filename, calls process.kill(pid, 0) (a POSIX no-op probe) to check whether the process is alive, then runs ps -p <pid> -o command= to verify that the live PID actually looks like the Copilot CLI. The kernel reuses PIDs aggressively, so a multi-day-old lock will frequently point at an unrelated login shell or node process. Locks owned by dead or live-but-unrelated PIDs are unlinked.

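Empirically, the probe costs ~5–10 ms per lock, and that cost is only paid when the PID is alive (a dead PID is rejected by process.kill before ps is ever spawned). This catches the bug, but it is a workaround for behavior the CLI should own.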
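For reference, the probe described above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of sessionLocks.ts: the helper names (pidFromLockName, isAlive, looksLikeCopilot, isStaleLock) are hypothetical, and the check that the command line contains "copilot" is an assumption about what "looks like the Copilot CLI" means.

```typescript
import { execFileSync } from "node:child_process";

// Hypothetical helpers for illustration; not the real sessionLocks.ts.
const LOCK_RE = /^inuse\.(\d+)\.lock$/;

// Extract the PID encoded in a lock file name, or null if the name does not match.
export function pidFromLockName(name: string): number | null {
  const m = LOCK_RE.exec(name);
  return m ? Number(m[1]) : null;
}

// Signal 0 performs the existence and permission checks without delivering a signal.
export function isAlive(pid: number): boolean {
  if (!Number.isInteger(pid) || pid <= 0) return false; // 0 and negatives target process groups
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but is owned by another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Guard against PID reuse by inspecting the command line of the live process.
export function looksLikeCopilot(pid: number): boolean {
  try {
    const cmd = execFileSync("ps", ["-p", String(pid), "-o", "command="], {
      encoding: "utf8",
      stdio: ["ignore", "pipe", "ignore"],
    });
    return /copilot/i.test(cmd); // assumption: the CLI's command line contains "copilot"
  } catch {
    return false; // ps exits non-zero when the PID has gone away
  }
}

// A lock is stale when its owner is dead, or alive but clearly something else.
export function isStaleLock(name: string): boolean {
  const pid = pidFromLockName(name);
  if (pid === null) return false; // not a lock file we understand; leave it alone
  return !isAlive(pid) || !looksLikeCopilot(pid);
}
```

The ps call is only reached when process.kill succeeds, which is why the extra cost applies to live PIDs only.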

Proposed fix (upstream)

Any one of these would close the gap:

  1. Trap SIGTERM / SIGINT (and SIGHUP, which is what a closing terminal typically delivers) and unlink the lock before exit. Covers the common "user closes terminal" / IDE-shutdown case. Doesn't help SIGKILL or hard crashes.
  2. Use OS-reclaimed advisory locks (flock(2) / fcntl(F_SETLK)) instead of (or in addition to) the on-disk marker file. The kernel releases these when the holding process dies, regardless of how. This is the most robust option.
  3. Bake the PID-liveness reclaim logic into the CLI itself, the same approach our sessionLocks.ts takes. Cheap, no semantic change, defends against pre-existing stale locks created by older versions.

(1) + (3) is probably the easiest path that handles all exit modes. (2) is the cleanest if you're willing to change the lock format.
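To make option (1) concrete, here is a minimal sketch of what signal-time cleanup could look like in a Node-based CLI. The CLI's real lock code is not shown in this issue, so acquireLock, releaseLock, and installCleanup are hypothetical names, and writing an empty marker file is an assumption about the lock format. SIGHUP is included because a closing terminal typically delivers it.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical: create the per-process marker file in the session directory.
export function acquireLock(sessionDir: string): string {
  const lockPath = path.join(sessionDir, `inuse.${process.pid}.lock`);
  fs.writeFileSync(lockPath, "");
  return lockPath;
}

// Remove the marker; a missing file is fine, since cleanup may run more than once.
export function releaseLock(lockPath: string): void {
  try {
    fs.unlinkSync(lockPath);
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "ENOENT") throw err;
  }
}

// Unlink on normal exit and on catchable signals. SIGKILL cannot be trapped,
// so this alone does not cover hard kills or crashes.
export function installCleanup(lockPath: string): void {
  process.on("exit", () => {
    try {
      releaseLock(lockPath);
    } catch {
      // Nothing useful can be done this late in shutdown.
    }
  });
  for (const sig of ["SIGINT", "SIGTERM", "SIGHUP"] as const) {
    // Exiting explicitly fires the "exit" handler above; 128 + signal number
    // is the conventional shell exit status for a signal-terminated process.
    process.once(sig, () => process.exit(128 + os.constants.signals[sig]));
  }
}
```

Because SIGKILL and crashes bypass these handlers entirely, a sketch like this still needs startup-time reclaim (option 3) or kernel-managed locks (option 2) to cover every exit mode.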

Notes

  • We'll keep our PID-liveness reclaim in the IDE either way as defense in depth. Removing it would only be safe once we can assume all CLI versions in the wild handle this correctly, which won't be true for years.
  • This affects both layouts of the binary that write into ~/.copilot/session-state/: ~/.copilot-cli/<v>/copilot (the agency-bundled inner binary) and /opt/homebrew/Caskroom/copilot-cli/<v>/copilot (the standalone Copilot CLI brew Cask). Both need the fix.

Environment

  • agency 2026.5.11.1 (wrapping Copilot CLI 1.0.45)
  • macOS (aarch64)

Related

  • Downstream user-visible bug: shachaf-ashkenazi/copilot-ide#18
  • Downstream tracker: shachaf-ashkenazi/copilot-ide#37

Labels

  • area:sessions: Session management, resume, history, session picker, and session state