Skip to content

refactor(kiloclaw): extract inline CLI run logic from user and org routers into shared helpers#2430

Open
evanjacobson wants to merge 10 commits intomainfrom
feature/kiloclaw-admin-cli_3-router-refactor
Open

refactor(kiloclaw): extract inline CLI run logic from user and org routers into shared helpers#2430
evanjacobson wants to merge 10 commits intomainfrom
feature/kiloclaw-admin-cli_3-router-refactor

Conversation

@evanjacobson
Copy link
Copy Markdown
Contributor

Summary

The user and org routers duplicated CLI run persistence logic — inline DB inserts, manual status polling with persistence, and raw cancel calls. This extracts that into the shared cli-runs.ts helpers from Phase 2, making both routers thinner and ensuring they benefit from the cross-run timestamp guard and instance-scoped cancel fixes.

The org router also gains 409 conflict and controller_route_unavailable error handling that was previously missing.

Verification

  • pnpm --filter web typecheck
  • pnpm format
  • git diff --cached --check

Visual Changes

N/A

Reviewer Notes

This PR targets the Phase 2 branch (feature/kiloclaw-admin-cli_1-schema-cli-runs) as its base. Once Phase 2 merges, rebase onto main.

…completed, failed, cancelled) for the same run, isControllerStatusForRun returns true (it only checks hasRun + startedAt), causing the code to call cancelControllerRun. The controller rejects the cancel, and the DB row stays stuck at running — even though the terminal status was already available.

Fix in cli-runs.ts:207-217: After confirming the controller status belongs to this run, check controllerStatus.status !== 'running'. If the run is already terminal, persist the controller's terminal status via persistCliRunControllerStatus and return { ok: true, runFound: true, cancelled: false } without issuing a cancel call.
Test in cli-runs.test.ts: Added a test that verifies when the controller reports status: 'completed' for the same run, cancelControllerRun is never called and the DB row is updated to completed with the correct completed_at timestamp.
…lled: true after concurrent DB update

markCliRunCancelled now uses .returning() to detect when the UPDATE
matches zero rows (because a concurrent poll or another cancel already
moved the row out of 'running'), and cancelCliRun uses that result
instead of blindly trusting the controller response.
Add direct tests for every branch in getCliRunStatus — the hot polling
path that carries timestamp matching, lost-run detection, and terminal-
state persistence logic. Also add an optional getControllerStatus param
(matching the existing cancelCliRun injection pattern) so tests can stub
the controller without spinning up KiloClawInternalClient.

Branches now covered:
- Row already terminal: returns DB state, controller never called
- Controller hasRun: false: persists failed with LOST_CONTROLLER_RUN_OUTPUT
- Controller timestamp mismatch: returns running, DB untouched
- Controller running + timestamps match: pass-through with prompt/initiatedBy
- Controller terminal + timestamps match: persists and returns terminal state
- Run not found: returns empty status
…a newer run

Rows stuck in 'running' would accumulate when the controller reused an
instance for a new CLI run. Both getCliRunStatus and cancelCliRun now
persist status: 'failed' with a distinct superseded-run message when
the controller's startedAt doesn't match the DB row, matching the
existing !hasRun recovery pattern.
return {
KiloClawInternalClient: jest.fn().mockImplementation(() => ({
getStatus: getStatusMock,
cancelKiloCliRun: cancelKiloCliRunMock,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: This mock no longer matches the helper's cancel path.

cancelCliRun() now calls client.getKiloCliRunStatus() before it attempts cancelKiloCliRun(). The mock instance only exposes getStatus and cancelKiloCliRun, so if the Jest module mock is applied transitively this test will throw client.getKiloCliRunStatus is not a function instead of exercising the destroyed-instance cancel flow.

@kilo-code-bot
Copy link
Copy Markdown
Contributor

kilo-code-bot Bot commented Apr 15, 2026

Code Review Summary

Status: 2 Issues Found | Recommendation: Address before merge

Overview

Severity Count
CRITICAL 0
WARNING 2
SUGGESTION 0

Fix these issues in Kilo Cloud

Issue Details (click to expand)

WARNING

File Line Issue
apps/web/src/lib/kiloclaw/cli-runs.test.ts 160 The new destroyed-instance coverage test never destroys the instance row, so it does not actually prove resolveWorkerInstanceId() still works for destroyed instances.
Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

File Line Issue
apps/web/src/routers/kiloclaw-router.test.ts 71 The mocked KiloClawInternalClient still does not expose getKiloCliRunStatus, so the cancel-path test can fail with a mock shape error instead of covering the helper behavior.
Files Reviewed (3 files)
  • apps/web/src/lib/kiloclaw/cli-runs.test.ts - 1 new issue
  • apps/web/src/lib/kiloclaw/cli-runs.ts - 0 new issues
  • apps/web/src/routers/kiloclaw-router.test.ts - 0 new issues

Reviewed by gpt-5.4-20260305 · 817,428 tokens

…R_CODES

The personal router suppresses raw internal controller messages for
gateway_controller_error, but the org router was missing this entry.
Without it, a 409 from the controller on the org startKiloCliRun path
could forward raw error text to the client.
cancelCliRun collapsed hasRun:false (controller has no run) and
startedAt mismatch (controller has a different run) into one branch,
always recording the superseded message. Split them to mirror
getCliRunStatus so admins see the correct failure reason. Replaced the
brittle router-level cancel test (which depended on a mock leak) with
proper helper-level tests using dependency injection.

it('resolves the worker instance ID from a destroyed instance when workerInstanceId is not provided', async () => {
const user = await insertTestUser();
const instanceId = await createTestInstance(user.id);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WARNING: Destroyed-instance regression is not actually covered

This test creates a normal active kiloclaw_instances row and never marks it destroyed, so it will still pass if resolveWorkerInstanceId() stops including destroyed rows again. The scenario described in the test name only gets exercised if the instance is explicitly destroyed before cancelCliRun() runs.

@evanjacobson evanjacobson force-pushed the feature/kiloclaw-admin-cli_1-schema-cli-runs branch from 9631b3a to 19e9edf Compare April 15, 2026 13:10
Base automatically changed from feature/kiloclaw-admin-cli_1-schema-cli-runs to main April 15, 2026 13:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant