refactor(kiloclaw): extract inline CLI run logic from user and org routers into shared helpers#2430
refactor(kiloclaw): extract inline CLI run logic from user and org routers into shared helpers#2430evanjacobson wants to merge 10 commits intomainfrom
Conversation
…leanup for initiated_by_admin_id
…completed, failed, cancelled) for the same run, isControllerStatusForRun returns true (it only checks hasRun + startedAt), causing the code to call cancelControllerRun. The controller rejects the cancel, and the DB row stays stuck at running — even though the terminal status was already available.
Fix in cli-runs.ts:207-217: After confirming the controller status belongs to this run, check controllerStatus.status !== 'running'. If the run is already terminal, persist the controller's terminal status via persistCliRunControllerStatus and return { ok: true, runFound: true, cancelled: false } without issuing a cancel call.
Test in cli-runs.test.ts: Added a test that verifies when the controller reports status: 'completed' for the same run, cancelControllerRun is never called and the DB row is updated to completed with the correct completed_at timestamp.
…lled: true after concurrent DB update markCliRunCancelled now uses .returning() to detect when the UPDATE matches zero rows (because a concurrent poll or another cancel already moved the row out of 'running'), and cancelCliRun uses that result instead of blindly trusting the controller response.
Add direct tests for every branch in getCliRunStatus — the hot polling path that carries timestamp matching, lost-run detection, and terminal- state persistence logic. Also add an optional getControllerStatus param (matching the existing cancelCliRun injection pattern) so tests can stub the controller without spinning up KiloClawInternalClient. Branches now covered: - Row already terminal: returns DB state, controller never called - Controller hasRun: false: persists failed with LOST_CONTROLLER_RUN_OUTPUT - Controller timestamp mismatch: returns running, DB untouched - Controller running + timestamps match: pass-through with prompt/initiatedBy - Controller terminal + timestamps match: persists and returns terminal state - Run not found: returns empty status
…a newer run Rows stuck in 'running' would accumulate when the controller reused an instance for a new CLI run. Both getCliRunStatus and cancelCliRun now persist status: 'failed' with a distinct superseded-run message when the controller's startedAt doesn't match the DB row, matching the existing !hasRun recovery pattern.
…uters into shared helpers
| return { | ||
| KiloClawInternalClient: jest.fn().mockImplementation(() => ({ | ||
| getStatus: getStatusMock, | ||
| cancelKiloCliRun: cancelKiloCliRunMock, |
There was a problem hiding this comment.
WARNING: This mock no longer matches the helper's cancel path.
cancelCliRun() now calls client.getKiloCliRunStatus() before it attempts cancelKiloCliRun(). The mock instance only exposes getStatus and cancelKiloCliRun, so if the Jest module mock is applied transitively this test will throw client.getKiloCliRunStatus is not a function instead of exercising the destroyed-instance cancel flow.
Code Review SummaryStatus: 2 Issues Found | Recommendation: Address before merge Overview
Fix these issues in Kilo Cloud Issue Details (click to expand)WARNING
Other Observations (not in diff)Issues found in unchanged code that cannot receive inline comments:
Files Reviewed (3 files)
Reviewed by gpt-5.4-20260305 · 817,428 tokens |
…R_CODES The personal router suppresses raw internal controller messages for gateway_controller_error, but the org router was missing this entry. Without it, a 409 from the controller on the org startKiloCliRun path could forward raw error text to the client.
cancelCliRun collapsed hasRun:false (controller has no run) and startedAt mismatch (controller has a different run) into one branch, always recording the superseded message. Split them to mirror getCliRunStatus so admins see the correct failure reason. Replaced the brittle router-level cancel test (which depended on a mock leak) with proper helper-level tests using dependency injection.
|
|
||
| it('resolves the worker instance ID from a destroyed instance when workerInstanceId is not provided', async () => { | ||
| const user = await insertTestUser(); | ||
| const instanceId = await createTestInstance(user.id); |
There was a problem hiding this comment.
WARNING: Destroyed-instance regression is not actually covered
This test creates a normal active kiloclaw_instances row and never marks it destroyed, so it will still pass if resolveWorkerInstanceId() stops including destroyed rows again. The scenario described in the test name only gets exercised if the instance is explicitly destroyed before cancelCliRun() runs.
9631b3a to
19e9edf
Compare
Summary
The user and org routers duplicated CLI run persistence logic — inline DB inserts, manual status polling with persistence, and raw cancel calls. This extracts that into the shared
cli-runs.tshelpers from Phase 2, making both routers thinner and ensuring they benefit from the cross-run timestamp guard and instance-scoped cancel fixes.The org router also gains 409 conflict and
controller_route_unavailableerror handling that was previously missing.Verification
pnpm --filter web typecheckpnpm formatgit diff --cached --checkVisual Changes
N/A
Reviewer Notes
This PR targets the Phase 2 branch (
feature/kiloclaw-admin-cli_1-schema-cli-runs) as its base. Once Phase 2 merges, rebase onto main.