Skip to content

feat: add runtime command boundary#412

Open
thymikee wants to merge 10 commits intomainfrom
feat/runtime-command-boundary
Open

feat: add runtime command boundary#412
thymikee wants to merge 10 commits intomainfrom
feat/runtime-command-boundary

Conversation

@thymikee
Copy link
Copy Markdown
Contributor

@thymikee thymikee commented Apr 15, 2026

Summary

Introduce agent-device as a reusable command runtime instead of a collection of helper exports. The CLI and daemon keep their existing behavior, while command semantics now have a public JS boundary that can run against local or hosted backend adapters.

This PR adds the runtime/backend/io split, migrates the first command families into it, and keeps compatibility subpaths available while the remaining command inventory is migrated.

Scope: 66 files touched across the runtime public boundary, migrated capture/selector/interaction slices, tests/docs, and one Android replay hardening change.

New JS API

Root runtime entrypoint:

import {
  createAgentDevice,
  createLocalArtifactAdapter,
  createMemorySessionStore,
  localCommandPolicy,
  ref,
  selector,
  type AgentDeviceBackend,
} from 'agent-device';

const backend: AgentDeviceBackend = /* local or hosted adapter */;

const device = createAgentDevice({
  backend,
  artifacts: createLocalArtifactAdapter(),
  sessions: createMemorySessionStore([{ name: 'default' }]),
  policy: localCommandPolicy(),
});

await device.capture.screenshot({
  session: 'default',
  out: { kind: 'path', path: './screen.png' },
});

await device.capture.diffScreenshot({
  session: 'default',
  baseline: { kind: 'path', path: './baseline.png' },
  out: { kind: 'path', path: './diff.png' },
  threshold: 0.1,
});

await device.selectors.getText(selector('label=Continue'), {
  session: 'default',
  depth: 2,
  raw: true,
});

await device.interactions.click(ref('@e1'), {
  session: 'default',
});

Router entrypoint for transport adapters:

import { createAgentDevice } from 'agent-device';
import { createCommandRouter } from 'agent-device/commands';

const router = createCommandRouter({
  createRuntime(request) {
    return createAgentDevice({
      backend: createBackendForRequest(request),
      artifacts: createArtifactAdapterForRequest(request),
      sessions: createSessionStoreForRequest(request),
      policy: createPolicyForRequest(request),
    });
  },
});

const result = await router.dispatch({
  command: 'capture.screenshot',
  options: {
    session: 'default',
    out: { kind: 'downloadableArtifact', fileName: 'screen.png' },
  },
});

New public subpaths:

  • agent-device/commands: implemented command APIs, selector helpers, command router, command catalog
  • agent-device/backend: backend primitive and capability interfaces
  • agent-device/io: file refs, artifact descriptors, output reservation, temp file contracts
  • agent-device/testing/conformance: backend conformance suites for migrated command families

createAgentDeviceClient() remains the daemon RPC client. createAgentDevice() is the in-process runtime.

Key Changes

  • Add createAgentDevice() with explicit backend, artifacts, sessions, policy, diagnostics, clock, and signal runtime dependencies.
  • Add first-class IO refs:
    • inputs: { kind: 'path' } and { kind: 'uploadedArtifact' }
    • outputs: { kind: 'path' } and { kind: 'downloadableArtifact' }
    • artifacts: discriminated localPath / artifact descriptors
  • Add localCommandPolicy() and restrictedCommandPolicy() so local filesystem behavior is explicit and service-safe defaults are available.
  • Add createMemorySessionStore() with defensive cloning for snapshots and metadata.
  • Document that shared CommandSessionStore implementations must serialize per-session updates when used concurrently.
  • Add named backend capability gates for escape hatches instead of a freeform backend bag.
  • Expose only implemented bound commands from createAgentDevice(); planned commands live in commandCatalog rather than autocomplete-only throwing methods.
  • Keep macOS interaction policy tied to trusted session state; caller metadata cannot override desktop/menubar guards.
  • Preserve selector snapshot options (depth, scope, raw) through the public runtime API and daemon bridge.
  • Preserve recording metadata from runtime-routed touch interactions through daemon responses and recorded action results.
  • Ensure Android escape detection fails before action recording/finalization.
  • Use snapshot timestamps consistently for stale-drop warnings, including backend-supplied snapshot objects and injected runtime clocks.
  • Preserve screenshot surface options through live diff screenshot capture, CLI diff screenshot, and the CLI client bridge.
  • Harden Android e2e replay navigation by replacing a brittle Settings snapshot ref with a semantic label=Notifications selector.

Migrated Runtime Commands

Capture:

  • capture.screenshot
  • capture.diffScreenshot
  • capture.snapshot
  • capture.diffSnapshot

Selectors:

  • selectors.find
  • selectors.get
  • selectors.getText
  • selectors.getAttrs
  • selectors.is
  • selectors.isVisible
  • selectors.isHidden
  • selectors.wait
  • selectors.waitForText

Interactions:

  • interactions.click
  • interactions.press
  • interactions.fill
  • interactions.typeText

The daemon and CLI paths now route these slices through the runtime while preserving existing daemon-shaped responses, selector recording, snapshot flag behavior, macOS surface guards, stale-ref refresh, Android escape handling, recording metadata, and screenshot cleanup semantics.

Follow-ups

Remaining command families are tracked in COMMAND_OWNERSHIP.md. Compatibility helper subpaths stay available for now and are documented as migration-era APIs rather than the preferred public boundary.

Validation

  • pnpm format
  • pnpm check
  • pnpm test:integration
  • node --test test/integration/installed-package-metro.test.ts
  • pnpm test:unit src/__tests__/cli-client-commands.test.ts src/__tests__/runtime-snapshot.test.ts
  • pnpm test:unit src/__tests__/runtime-selector-read.test.ts src/__tests__/runtime-interactions.test.ts src/__tests__/runtime-diff-screenshot.test.ts src/__tests__/runtime-snapshot.test.ts src/__tests__/runtime-public.test.ts src/daemon/handlers/__tests__/interaction.test.ts
  • pnpm test:replay:ios (6 passed, 0 failed)
  • pnpm test:replay:android (6 passed, 0 failed after hardening the stale Android Settings replay ref)
  • pnpm check:quick
  • pnpm check:unit
  • pnpm check:tooling
  • git diff --check

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 15, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://callstackincubator.github.io/agent-device/pr-preview/pr-412/

Built to branch gh-pages at 2026-04-15 17:02 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 80dd64f5c9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/daemon/selector-runtime.ts Outdated
Comment on lines +294 to +296
snapshotScope: options?.scope,
snapshotDepth: options?.depth,
snapshotRaw: options?.raw,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve selector snapshot flags in runtime capture

The selector runtime now overwrites snapshotScope, snapshotDepth, and snapshotRaw with values from options, but for get/is/wait (and read-only find in most paths) those fields are usually undefined. Because this assignment happens after spreading req.flags, user-provided CLI flags like --snapshot-depth and --snapshot-raw are silently dropped, changing selector resolution behavior versus the previous handler path. Only override these fields when an explicit runtime option is provided.

Useful? React with 👍 / 👎.

Comment thread src/runtime.ts Outdated
function cloneSessionRecord(
record: CommandSessionRecord | undefined,
): CommandSessionRecord | undefined {
return record ? { ...record } : undefined;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Deep-clone session records returned by memory store

The in-memory session store returns a shallow copy of a session record, so nested objects (notably snapshot) are still shared with internal state. If a caller mutates record.snapshot.nodes after get, it mutates the store’s baseline in place, which can corrupt later diff/get/is behavior and defeats the “no mutable references” contract implied by this helper. Clone nested session fields (at least snapshot/metadata) before returning.

Useful? React with 👍 / 👎.

@thymikee
Copy link
Copy Markdown
Contributor Author

Code Review — PR #412 (feat: add runtime command boundary)

Thorough pass, focused on regression risk from the ~7.6k-line refactor. I verified each finding against the diff before reporting.

Overview

Clean architectural split. createAgentDevice() is a well-shaped runtime boundary; the commands / backend / io subpaths read as a coherent public API; discriminated IO refs (path / uploadedArtifact / downloadableArtifact) are a nice improvement over ad-hoc path strings. Daemon dispatch wiring is complete — no command family silently stopped being handled. Android escape check now runs via afterRun before finalizeTouchInteraction, which matches the PR description and is the correct ordering.

Findings

High — 750ms find-snapshot cache removed (performance regression)

  • Old src/daemon/handlers/find.ts:73,84,100 kept a per-handler lastSnapshotAt cache that short-circuited re-capture for rapid find loops (explicitly documented: "Re-use a snapshot captured within the last 750 ms to avoid redundant dumps during rapid find iterations"), intentionally skipped when Android freshness tracking was active.
  • New src/commands/selector-read-shared.ts:captureSelectorSnapshot has no equivalent gate — every find / is / get / wait that hits a re-capture path now incurs a full snapshot.
  • Not a correctness bug, but likely a user-visible slowdown on any agent loop that calls find repeatedly. Either port the 750ms window into captureSelectorSnapshot (keyed per session with the same Android-freshness opt-out), or document the change.

Medium — stale-drop warning can be silently disabled by clock skew

  • src/commands/capture-snapshot.ts:222-228 compares params.capturedAt - previousSnapshot.createdAt and now guards elapsedSincePreviousSnapshot >= 0. capturedAt comes from the injected runtime clock; previousSnapshot.createdAt can come from a backend-provided result.snapshot.createdAt (see selector-read-shared.ts:19-24).
  • If a backend provides a createdAt that's ahead of the runtime clock (skew, test clocks, hosted adapter), the >= 0 guard silently skips the "sharp drop" warning that old code (src/daemon/handlers/snapshot.ts:~190, using Date.now() on both sides) always evaluated.
  • Fix: either normalize previousSnapshot.createdAt to the runtime clock when persisting, or drop the >= 0 guard (use Math.abs or clamp at 0).

Medium — createLocalArtifactAdapter path resolution has no containment check

  • src/io.ts:159resolveLocalPath does path.isAbsolute(filePath) ? filePath : path.resolve(cwd, filePath) with no normalization or sandbox check. A { kind: 'path', path: '/etc/passwd' } or '../../etc/passwd' is accepted as-is.
  • For a local CLI this is fine (user owns the FS), and restrictedCommandPolicy() correctly rejects { kind: 'path' } refs at src/commands/io-policy.ts:19,37 before they reach this resolver — so hosted adapters that opt into restrictedCommandPolicy() are safe.
  • Footgun: anyone writing a hosted adapter that (a) forwards untrusted client input and (b) uses localCommandPolicy() will expose arbitrary FS read/write. Worth either a doc warning on createLocalArtifactAdapter ("untrusted paths not validated") or an optional allowedRoots parameter.

Low — bindCommands is re-exported from src/commands/index.ts (L276)

  • Only meaningful consumer is createAgentDevice() inside the package. Not a regression, but it leaks an internal helper into the public agent-device/commands subpath. Recommend dropping the export keyword.

Low — sync → async temp cleanup

  • Old src/cli/commands/screenshot.ts:119,138 used fs.unlinkSync; new src/commands/capture-diff-screenshot.ts uses await fs.unlink. Semantically equivalent in the happy path; slightly more exposure to orphaned temp files if the process is killed mid-await. Not worth blocking on.

Claims I dismissed after verification

  • Rect-refresh / hittable-ancestor / viewport-visibility logic was NOT dropped. It moved from src/daemon/handlers/interaction-targeting.ts:86-200 (old) into src/commands/interactions.ts:222,256,347,415-417 with resolveActionableTouchNode and isNodeVisibleInEffectiveViewport still in the path, including the offscreen_ref error shape.
  • Android escape ordering is correct. interaction-touch.ts:115-120 runs afterRun (escape check) before finalizeTouchInteraction; on escape the error is re-thrown and the touch is not recorded to session.
  • Android escape tests are preserved (src/daemon/handlers/__tests__/interaction.test.ts:850,896 — launcher + Settings escape). Line numbers shifted, coverage didn't.
  • snapshot.ts trimming didn't lose branches. Platform-specific warnings (empty-interactive, stuck-route, sharp-drop) all migrated into src/commands/capture-snapshot.ts buildSnapshotWarnings.

Test coverage

Good breadth — the new runtime tests plus the conformance suite cover the migrated command families end-to-end, and the old daemon-handler tests remain in place. One gap worth considering: no test explicitly asserts reserveOutput().cleanup() fires when the backend throws mid-command. Given the async unlink shift, a single failure-path test in runtime-public.test.ts would be cheap insurance.

Summary

Ship-ready with two asks before merge: (1) restore the find-loop snapshot coalescing gate or document its removal, and (2) decide on the clock-skew semantics for the stale-drop warning. The path-containment concern is a doc/hardening follow-up rather than a blocker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant