feat(appkit): agents() plugin, createAgent(def), and markdown-driven agents by MarioCadenas · Pull Request #304 · databricks/appkit

MarioCadenas · 2026-04-21T17:58:43Z

The main product layer. Turns an AppKit app into an AI-agent host with
markdown-driven agent discovery, code-defined agents, sub-agents, and
a standalone run-without-HTTP executor.

`createAgent(def)` — pure factory

`packages/appkit/src/core/create-agent-def.ts`. Returns the passed-in
definition after cycle-detecting the sub-agent graph. No adapter
construction, no side effects; safe at module top level. The returned
`AgentDefinition` is plain data, consumable by either `agents({ agents })` or `runAgent(def, input)`.
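To make the cycle check concrete, here is a minimal sketch of a pure factory that walks the sub-agent graph depth-first and throws when a definition appears on its own ancestor path. The `AgentDef` shape, `assertAcyclic`, and `createAgentSketch` are hypothetical stand-ins for illustration; the actual `createAgent` may detect cycles differently.

```ts
// Hypothetical, simplified shape: the real AgentDefinition has more fields.
interface AgentDef {
  instructions: string;
  agents?: Record<string, AgentDef>;
}

// Depth-first walk that throws if a definition is its own ancestor.
function assertAcyclic(
  def: AgentDef,
  ancestors: AgentDef[] = [],
  trail: string[] = ["<root>"],
): void {
  if (ancestors.includes(def)) {
    throw new Error(`Sub-agent cycle: ${trail.join(" -> ")}`);
  }
  for (const [key, child] of Object.entries(def.agents ?? {})) {
    assertAcyclic(child, [...ancestors, def], [...trail, key]);
  }
}

// Pure factory: validates, then returns the very same object (no side effects).
function createAgentSketch<T extends AgentDef>(def: T): T {
  assertAcyclic(def);
  return def;
}
```

Because the check only looks at the current ancestor path, two parents may share the same sub-agent definition without being flagged.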

`agents()` plugin

`packages/appkit/src/plugins/agents/agents.ts`. `AgentsPlugin` class:

- Loads markdown agents from `config/agents/*.md` (configurable dir) via real YAML frontmatter parsing (`js-yaml`). Frontmatter schema: `endpoint`, `model`, `toolkits`, `tools`, `default`, `maxSteps`, `maxTokens`, `baseSystemPrompt`. Unknown keys are logged; invalid YAML throws at boot.
- Merges code-defined agents passed via `agents({ agents: { name: def } })`. Code wins on key collision.
- For each agent, builds a per-agent tool index from:
  1. Sub-agents (`agents: {...}`): synthesized as `agent-<key>` tools on the parent.
  2. Explicit tool record entries: `ToolkitEntry`s, inline `FunctionTool`s, or `HostedTool`s.
  3. Auto-inherit (if nothing explicit): pulls every registered `ToolProvider` plugin's tools. Asymmetric default: markdown agents inherit (`file: true`), code-defined agents don't (`code: false`).
- Mounts `POST /invocations` (OpenAI Responses compatible) + `POST /chat`, `POST /cancel`, `GET /threads/:id`, `DELETE /threads/:id`, `GET /info`.
- SSE streaming via `executeStream`. Tool calls dispatch through `PluginContext.executeTool(req, pluginName, localName, args, signal)` for OBO, telemetry, and timeout.
- Exposes `appkit.agent.{register, list, get, reload, getDefault, getThreads}` runtime helpers.
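The three-source tool index described above can be sketched roughly as follows. This is an illustration under assumptions, not the plugin's code: `AgentSpec`, `buildToolIndex`, and the use of plain description strings in place of real tool objects are all hypothetical, and the sketch assumes "nothing explicit" refers to the explicit tool record only.

```ts
type Origin = "file" | "code";

// Hypothetical simplified agent shape; tools map a name to a description string.
interface AgentSpec {
  origin: Origin;
  agents?: Record<string, AgentSpec>;
  tools?: Record<string, string>;
}

function buildToolIndex(
  agent: AgentSpec,
  providerTools: Record<string, string>,
  autoInherit: Record<Origin, boolean> = { file: true, code: false },
): Map<string, string> {
  const index = new Map<string, string>();
  // 1. Each sub-agent surfaces on the parent as an `agent-<key>` tool.
  for (const key of Object.keys(agent.agents ?? {})) {
    index.set(`agent-${key}`, `delegate to sub-agent "${key}"`);
  }
  // 2. Explicit tool record entries.
  const explicit = Object.entries(agent.tools ?? {});
  for (const [name, description] of explicit) {
    index.set(name, description);
  }
  // 3. Auto-inherit only when no explicit tools exist and the origin allows it.
  if (explicit.length === 0 && autoInherit[agent.origin]) {
    for (const [name, description] of Object.entries(providerTools)) {
      index.set(name, description);
    }
  }
  return index;
}
```

With the asymmetric default shown here, a markdown agent with no `tools` picks up every provider tool, while a code-defined agent only gets what it lists.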

`runAgent(def, input)` — standalone executor

`packages/appkit/src/core/run-agent.ts`. Runs an `AgentDefinition`
without `createApp` or HTTP. Drives the adapter's event stream to
completion, executing inline tools + sub-agents along the way.
Aggregates events into `{ text, events }`. Useful for tests, CLI
scripts, and offline pipelines. Hosted/MCP tools and plugin toolkits
require the agents plugin and throw clear errors with guidance.
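As a rough picture of "drive the event stream to completion and aggregate", the sketch below drains an async iterable of events into `{ text, events }`. The event shapes, `drain`, and `fakeAdapter` are assumptions made for the example; the real `AgentEvent` union lives in the shared types and `runAgent` also executes tools and sub-agents, which is omitted here.

```ts
// Hypothetical event shapes, for illustration only.
type AgentEvent =
  | { type: "text_delta"; delta: string }
  | { type: "tool_call"; name: string; args: unknown }
  | { type: "done" };

interface RunResult {
  text: string;
  events: AgentEvent[];
}

// Consume the whole stream, keeping every event and concatenating text deltas.
async function drain(stream: AsyncIterable<AgentEvent>): Promise<RunResult> {
  const events: AgentEvent[] = [];
  let text = "";
  for await (const event of stream) {
    events.push(event);
    if (event.type === "text_delta") text += event.delta;
  }
  return { text, events };
}

// Stand-in for an LLM adapter, useful in tests.
async function* fakeAdapter(): AsyncGenerator<AgentEvent> {
  yield { type: "text_delta", delta: "Hello, " };
  yield { type: "text_delta", delta: "world" };
  yield { type: "done" };
}
```

A test or CLI script would then call `drain(fakeAdapter())` and assert on the resulting `text` and `events`.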

Event translation and thread storage

- `AgentEventTranslator`: stateful converter from internal `AgentEvent`s to OpenAI Responses API `ResponseStreamEvent`s with sequence numbers and output indices.
- `InMemoryThreadStore`: per-user conversation persistence. Nested `Map<userId, Map<threadId, Thread>>`. Implements `ThreadStore` from shared types.
- `buildBaseSystemPrompt` + `composeSystemPrompt`: formats the AppKit base prompt (with plugin names and tool names) and layers the agent's instructions on top.
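The nested-map layout of the thread store can be illustrated with a small sketch. The `Message` and `Thread` shapes and the method names (`create`, `get`, `delete`) are hypothetical; the real `ThreadStore` interface in the shared types may differ.

```ts
// Hypothetical shapes, for illustration only.
interface Message {
  role: "user" | "assistant";
  content: string;
}

interface Thread {
  id: string;
  userId: string;
  messages: Message[];
}

class ThreadStoreSketch {
  // Outer key: userId. Inner key: threadId.
  private readonly byUser = new Map<string, Map<string, Thread>>();

  create(userId: string, threadId: string): Thread {
    let threads = this.byUser.get(userId);
    if (!threads) {
      threads = new Map<string, Thread>();
      this.byUser.set(userId, threads);
    }
    const thread: Thread = { id: threadId, userId, messages: [] };
    threads.set(threadId, thread);
    return thread;
  }

  // Lookups are always scoped by user, so one user cannot read another's thread.
  get(userId: string, threadId: string): Thread | undefined {
    return this.byUser.get(userId)?.get(threadId);
  }

  delete(userId: string, threadId: string): boolean {
    return this.byUser.get(userId)?.delete(threadId) ?? false;
  }
}
```

Keying the outer map by user makes per-user isolation a property of the data structure rather than of each call site.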

Frontmatter loader

`load-agents.ts`: reads `*.md` files, parses YAML frontmatter with
`js-yaml`, resolves `toolkits: [...]` entries against the plugin
provider index at load time, and wraps ambient tools (from `agents({ tools: {...} })`) for `tools: [...]` frontmatter references.
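For orientation, the first step of such a loader (separating the frontmatter block from the markdown body and flagging keys outside the documented schema) might look like the sketch below. `splitFrontmatter` and `unknownKeys` are hypothetical helpers, and the YAML text is returned unparsed here; in the PR the parsing itself is delegated to `js-yaml`.

```ts
// Keys listed in the frontmatter schema of the PR description.
const KNOWN_KEYS = new Set([
  "endpoint", "model", "toolkits", "tools",
  "default", "maxSteps", "maxTokens", "baseSystemPrompt",
]);

// Split "---\n<yaml>\n---\n<body>" into its two parts.
// Files without a frontmatter block are treated as body only.
function splitFrontmatter(source: string): { yaml: string; body: string } {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) return { yaml: "", body: source };
  return { yaml: match[1], body: match[2] };
}

// Given an already-parsed frontmatter object, list keys outside the schema
// so the caller can log them.
function unknownKeys(frontmatter: Record<string, unknown>): string[] {
  return Object.keys(frontmatter).filter((key) => !KNOWN_KEYS.has(key));
}
```

The body that remains after the split is what would serve as the agent's instructions.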

Plumbing

- Adds `js-yaml` + `@types/js-yaml` deps.
- Manifest mounts routes at `/api/agent/*` (singular, matching the `appkit.agent.*` runtime handle).
- Exports from the main barrel: `agents`, `createAgent`, `runAgent`, `AgentDefinition`, `AgentsPluginConfig`, `AgentTool`, `ToolkitEntry`, `ToolkitOptions`, `BaseSystemPromptOption`, `PromptContext`, `isToolkitEntry`, `loadAgentFromFile`, `loadAgentsFromDir`.

Test plan

- 60 new tests: agents plugin lifecycle, markdown loading, code-agent registration, auto-inherit asymmetry, sub-agent tool synthesis, cycle detection, event translator, thread store, system prompt composition, standalone `runAgent`.
- Full appkit vitest suite: 1297 tests passing.
- Typecheck clean across all 8 workspace projects.

Signed-off-by: MarioCadenas [email protected]

PR Stack

1. Shared agent types + LLM adapters: feat(appkit): shared agent types and LLM adapter implementations #301
2. Tool primitives + ToolProvider surfaces: feat(appkit): tool primitives and ToolProvider surfaces on core plugins #302
3. Plugin infrastructure (attachContext + PluginContext): feat(appkit): plugin infrastructure (attachContext + PluginContext mediator) #303
4. agents() plugin + createAgent(def) + markdown-driven agents (this PR)
5. fromPlugin() DX + runAgent plugins arg + toolkit-resolver: feat(appkit): fromPlugin() DX, runAgent plugins arg, shared toolkit-resolver #305
6. Reference app + dev-playground + docs: feat(appkit): reference agent-app, dev-playground chat UI, docs, and template #306

This 6-PR stack is a redesign of the earlier 13-PR stack (#282-#286, #293-#300), reorganized to eliminate mid-stack API churn. Reviewers never see code in this stack that later PRs delete. The old 13 PRs have been closed with references back here.

…template

Final layer of the agents feature stack. Everything needed to exercise, demonstrate, and learn the feature.

### Reference application: agent-app

`apps/agent-app/`: a standalone app purpose-built around the agents feature. Ships with:

- `server.ts`: full example of code-defined agents via `fromPlugin`.

  ```ts
  const support = createAgent({
    instructions: "…",
    tools: {
      ...fromPlugin(analytics),
      ...fromPlugin(files),
      get_weather,
      "mcp.vector-search": mcpServer("vector-search", "https://…"),
    },
  });

  await createApp({
    plugins: [server({ port }), analytics(), files(), agents({ agents: { support } })],
  });
  ```

- `config/agents/assistant.md`: markdown-driven agent alongside the code-defined one, showing the asymmetric auto-inherit default.
- Vite + React 19 + TailwindCSS frontend with a chat UI.
- Databricks deployment config (`databricks.yml`, `app.yaml`) and deploy scripts.

### dev-playground chat UI + demo agent

`apps/dev-playground/client/src/routes/agent.route.tsx`: chat UI with inline autocomplete (hits the `autocomplete` markdown agent) and a full threaded conversation panel (hits the default agent).

`apps/dev-playground/server/index.ts`: adds a code-defined `helper` agent using `fromPlugin(analytics)` alongside the markdown-driven `autocomplete` agent in `config/agents/`. Exercises the mixed-style setup (markdown + code) against the same plugin list.

`apps/dev-playground/config/agents/*.md`: both agents defined with valid YAML frontmatter.

### Docs

`docs/docs/plugins/agents.md`: progressive five-level guide:

1. Drop a markdown file → it just works.
2. Scope tools via `toolkits:` / `tools:` frontmatter.
3. Code-defined agents with `fromPlugin()`.
4. Sub-agents.
5. Standalone `runAgent()` (no `createApp` or HTTP).

Plus a configuration reference, runtime API reference, and frontmatter schema table.

`docs/docs/api/appkit/`: regenerated typedoc for the new public surface (fromPlugin, runAgent, AgentDefinition, AgentsPluginConfig, ToolkitEntry, ToolkitOptions, all adapter types, and the agents plugin factory).

### Template

`template/appkit.plugins.json`: adds the `agent` plugin entry so `npx @databricks/appkit init --features agent` scaffolds the plugin correctly.

### Test plan

- Full appkit vitest suite: 1311 tests passing
- Typecheck clean across all 8 workspace projects
- `pnpm docs:build` clean (no broken links)
- `pnpm --filter=@databricks/appkit build:package` clean, publint clean

Signed-off-by: MarioCadenas <[email protected]>

### Zero-trust MCP host policy documentation (S1 security)

Documents the new `mcp` configuration block and the rules it enforces: same-origin-only by default, explicit `trustedHosts` for external MCP servers, plaintext `http://` refused outside localhost-in-dev, and DNS-level blocking of private / link-local IP ranges (covers cloud metadata services). See PR #302 for the policy implementation and PR #304 for the `AgentsPluginConfig.mcp` wiring.

Signed-off-by: MarioCadenas <[email protected]>
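The URL-level part of the MCP host policy described above could be approximated as follows. This is only a sketch under assumptions: `McpPolicy`, `checkMcpUrl`, and the `appOrigin` / `isDev` fields are hypothetical names, and the DNS-level blocking of private and link-local ranges is deliberately left out because it needs a resolver.

```ts
// Hypothetical policy shape; only `trustedHosts` is named in the source.
interface McpPolicy {
  appOrigin: string;      // origin of the app itself, e.g. "https://app.example.com"
  trustedHosts: string[]; // extra hostnames allowed for external MCP servers
  isDev: boolean;
}

function checkMcpUrl(raw: string, policy: McpPolicy): { ok: boolean; reason?: string } {
  const url = new URL(raw);
  if (url.protocol !== "https:" && url.protocol !== "http:") {
    return { ok: false, reason: `unsupported protocol ${url.protocol}` };
  }
  // Plaintext http is refused unless we are in dev and talking to localhost.
  const isLocalhost = url.hostname === "localhost" || url.hostname === "127.0.0.1";
  if (url.protocol === "http:" && !(policy.isDev && isLocalhost)) {
    return { ok: false, reason: "plaintext http is only allowed for localhost in dev" };
  }
  // Same-origin by default; anything else must be listed explicitly.
  const sameOrigin = url.origin === new URL(policy.appOrigin).origin;
  if (sameOrigin || policy.trustedHosts.includes(url.hostname)) {
    return { ok: true };
  }
  return { ok: false, reason: `host ${url.hostname} is neither same-origin nor in trustedHosts` };
}
```

Returning a reason alongside the verdict keeps the refusal explainable in logs, which matters for a default-deny policy.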

### HITL approval UI + SQL safety docs (S2 security, Layer 1-3)

- `docs/docs/plugins/agents.md`: new "SQL agent tools" subsection covering `analytics.query` readOnly enforcement, `lakebase.query` opt-in via `exposeAsAgentTool`, and the approval flow. New "Human-in-the-loop approval for destructive tools" subsection documents the config, SSE event shape, and `POST /chat/approve` contract.
- `apps/agent-app`: approval-card component rendered inline in the chat stream whenever an `appkit.approval_pending` event arrives. Destructive badge + Approve/Deny buttons POST to `/api/agent/approve` with the carried `streamId`/`approvalId`.
- `apps/dev-playground/client`: matching approval-card on the agent route, using the existing appkit-ui `Button` component and Tailwind utility classes.

Signed-off-by: MarioCadenas <[email protected]>
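One way to picture the approval flow is a gate that parks a destructive tool call until the approve endpoint settles it. The `ApprovalGate` class and its method names below are hypothetical and not taken from AppKit; only the `streamId` / `approvalId` pair comes from the description above.

```ts
type Decision = "approve" | "deny";

class ApprovalGate {
  // Pending approvals keyed by "<streamId>:<approvalId>".
  private readonly pending = new Map<string, (decision: Decision) => void>();

  // Called by the tool dispatcher before running a destructive tool.
  // The returned promise stays pending until a human decides.
  request(streamId: string, approvalId: string): Promise<Decision> {
    return new Promise<Decision>((resolve) => {
      this.pending.set(`${streamId}:${approvalId}`, resolve);
    });
  }

  // Called by the HTTP handler behind the approve endpoint.
  // Returns false for unknown or already-settled approvals.
  settle(streamId: string, approvalId: string, decision: Decision): boolean {
    const key = `${streamId}:${approvalId}`;
    const resolve = this.pending.get(key);
    if (!resolve) return false;
    this.pending.delete(key);
    resolve(decision);
    return true;
  }
}
```

In this sketch the server would emit the pending-approval event right after calling `request`, and the Approve/Deny buttons would end up invoking `settle`. A real implementation would likely also need a timeout and cleanup when the stream is cancelled.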

### Auto-inherit posture documentation (S3 security, Layer 3)

Updates `docs/docs/plugins/agents.md` to document the new two-key auto-inherit model introduced in PR #302 (per-tool `autoInheritable` flag) and PR #304 (safe-by-default `autoInheritTools: { file: false, code: false }`). Adds an "Auto-inherit posture" subsection explaining that the developer must opt into `autoInheritTools` AND the plugin author must mark each tool `autoInheritable: true` for a tool to spread without explicit wiring. Includes a table documenting the `autoInheritable` marking on each core plugin tool, plus an example of the setup-time audit log so operators can see exactly what's inherited vs. skipped.

Signed-off-by: MarioCadenas <[email protected]>
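The two-key rule reduces to a conjunction: a tool spreads only if the developer opted in for that agent source and the plugin author marked the tool inheritable. A minimal sketch, where `ProviderTool`, `partitionInheritable`, and the example tool markings are illustrative assumptions:

```ts
type AgentSource = "file" | "code";

// Hypothetical simplified tool shape.
interface ProviderTool {
  name: string;
  autoInheritable?: boolean;
}

function partitionInheritable(
  tools: ProviderTool[],
  source: AgentSource,
  // Safe-by-default: nothing is inherited unless the developer opts in.
  autoInheritTools: Record<AgentSource, boolean> = { file: false, code: false },
): { inherited: string[]; skipped: string[] } {
  const inherited: string[] = [];
  const skipped: string[] = [];
  for (const tool of tools) {
    // Both keys must be turned: developer opt-in AND per-tool marking.
    const allowed = autoInheritTools[source] && tool.autoInheritable === true;
    (allowed ? inherited : skipped).push(tool.name);
  }
  return { inherited, skipped };
}
```

The returned `inherited` / `skipped` lists are the kind of information a setup-time audit log could print.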

### MVP polish

- **Reference app no longer ships hardcoded dogfood URLs.** The three `https://e2-dogfood.staging.cloud.databricks.com/...` and `https://mario-mcp-hello-*.staging.aws.databricksapps.com/...` MCP URLs in `apps/agent-app/server.ts` are replaced with optional env-driven `VECTOR_SEARCH_MCP_URL` / `CUSTOM_MCP_URL` config. When set, their hostnames are auto-added to `agents({ mcp: { trustedHosts } })`. `.env.example` uses placeholder values the reader can replace instead of another team's workspace.
- **`appkit.agent` → `appkit.agents` in the reference app.** The prior `appkit.agent as { list, getDefault }` cast papered over the plugin-name mismatch fixed in PR #304. The runtime key now matches the docs, the manifest, and the factory name; the cast is gone.
- **Auto-inherit opt-in added to the reference config.** Since the defaults flipped to `{ file: false, code: false }` (PR #304, S-3), the reference now explicitly enables `autoInheritTools: { file: true }` so the markdown agents that ship alongside the code-defined one still pick up the analytics / files read-only tools. This is the pattern a real deployment should follow: opt in deliberately.

Signed-off-by: MarioCadenas <[email protected]>
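Deriving `trustedHosts` from optional environment variables, as the first MVP-polish item describes, might be sketched like this. `trustedHostsFromEnv` is a hypothetical helper and the environment is passed in as a plain object so the example stays self-contained; the reference app's actual wiring may differ.

```ts
// Collect hostnames of whichever MCP URLs are configured.
// Unset or empty variables are simply skipped.
function trustedHostsFromEnv(
  env: Record<string, string | undefined>,
  keys: string[] = ["VECTOR_SEARCH_MCP_URL", "CUSTOM_MCP_URL"],
): string[] {
  const hosts = new Set<string>();
  for (const key of keys) {
    const value = env[key];
    if (!value) continue;
    // new URL() throws on malformed input, which surfaces bad config at boot.
    hosts.add(new URL(value).hostname);
  }
  return Array.from(hosts);
}
```

The result would then be passed as `agents({ mcp: { trustedHosts } })`, so only hosts the operator explicitly configured are trusted.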

### Ephemeral autocomplete agent + docs

- `apps/dev-playground/config/agents/autocomplete.md` sets `ephemeral: true`. Each debounced autocomplete keystroke no longer leaves an orphan thread in `InMemoryThreadStore`; the server now deletes the thread in the stream's `finally` (PR #304). Closes R1 from the MVP re-review.
- `docs/docs/plugins/agents.md` documents the new `ephemeral` frontmatter key alongside the other AgentDefinition knobs.

Signed-off-by: MarioCadenas <[email protected]>
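The "delete the thread in the stream's `finally`" behavior can be illustrated with an async generator wrapper. `streamWithCleanup` and its `deleteThread` callback are hypothetical names for this sketch; the actual server code in PR #304 is not shown in this document.

```ts
// Re-yield every chunk from the source stream and, for ephemeral agents,
// delete the backing thread once the stream ends for any reason
// (normal completion, an error, or the consumer stopping early).
async function* streamWithCleanup<T>(
  source: AsyncIterable<T>,
  ephemeral: boolean,
  deleteThread: () => void,
): AsyncGenerator<T> {
  try {
    for await (const chunk of source) {
      yield chunk;
    }
  } finally {
    if (ephemeral) deleteThread();
  }
}
```

Putting the cleanup in `finally` rather than after the loop is what covers cancelled and failed streams, which is the case a debounced autocomplete hits most often.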

…template Final layer of the agents feature stack. Everything needed to exercise, demonstrate, and learn the feature. ### Reference application: agent-app `apps/agent-app/` — a standalone app purpose-built around the agents feature. Ships with: - `server.ts` — full example of code-defined agents via `fromPlugin`: ```ts const support = createAgent({ instructions: "…", tools: { ...fromPlugin(analytics), ...fromPlugin(files), get_weather, "mcp.vector-search": mcpServer("vector-search", "https://…"), }, }); await createApp({ plugins: [server({ port }), analytics(), files(), agents({ agents: { support } })], }); ``` - `config/agents/assistant.md` — markdown-driven agent alongside the code-defined one, showing the asymmetric auto-inherit default. - Vite + React 19 + TailwindCSS frontend with a chat UI. - Databricks deployment config (`databricks.yml`, `app.yaml`) and deploy scripts. ### dev-playground chat UI + demo agent `apps/dev-playground/client/src/routes/agent.route.tsx` — chat UI with inline autocomplete (hits the `autocomplete` markdown agent) and a full threaded conversation panel (hits the default agent). `apps/dev-playground/server/index.ts` — adds a code-defined `helper` agent using `fromPlugin(analytics)` alongside the markdown-driven `autocomplete` agent in `config/agents/`. Exercises the mixed-style setup (markdown + code) against the same plugin list. `apps/dev-playground/config/agents/*.md` — both agents defined with valid YAML frontmatter. ### Docs `docs/docs/plugins/agents.md` — progressive five-level guide: 1. Drop a markdown file → it just works. 2. Scope tools via `toolkits:` / `tools:` frontmatter. 3. Code-defined agents with `fromPlugin()`. 4. Sub-agents. 5. Standalone `runAgent()` (no `createApp` or HTTP). Plus a configuration reference, runtime API reference, and frontmatter schema table. 
`docs/docs/api/appkit/` — regenerated typedoc for the new public surface (fromPlugin, runAgent, AgentDefinition, AgentsPluginConfig, ToolkitEntry, ToolkitOptions, all adapter types, and the agents plugin factory). ### Template `template/appkit.plugins.json` — adds the `agent` plugin entry so `npx @databricks/appkit init --features agent` scaffolds the plugin correctly. ### Test plan - Full appkit vitest suite: 1311 tests passing - Typecheck clean across all 8 workspace projects - `pnpm docs:build` clean (no broken links) - `pnpm --filter=@databricks/appkit build:package` clean, publint clean Signed-off-by: MarioCadenas <[email protected]> ### Zero-trust MCP host policy documentation (S1 security) Documents the new `mcp` configuration block and the rules it enforces: same-origin-only by default, explicit `trustedHosts` for external MCP servers, plaintext `http://` refused outside localhost-in-dev, and DNS-level blocking of private / link-local IP ranges (covers cloud metadata services). See PR #302 for the policy implementation and PR #304 for the `AgentsPluginConfig.mcp` wiring. Signed-off-by: MarioCadenas <[email protected]> ### HITL approval UI + SQL safety docs (S2 security, Layer 1-3) - `docs/docs/plugins/agents.md`: new "SQL agent tools" subsection covering `analytics.query` readOnly enforcement, `lakebase.query` opt-in via `exposeAsAgentTool`, and the approval flow. New "Human-in-the-loop approval for destructive tools" subsection documents the config, SSE event shape, and `POST /chat/approve` contract. - `apps/agent-app`: approval-card component rendered inline in the chat stream whenever an `appkit.approval_pending` event arrives. Destructive badge + Approve/Deny buttons POST to `/api/agent/approve` with the carried `streamId`/`approvalId`. - `apps/dev-playground/client`: matching approval-card on the agent route, using the existing appkit-ui `Button` component and Tailwind utility classes. 
Signed-off-by: MarioCadenas

### Auto-inherit posture documentation (S3 security, Layer 3)

Updates `docs/docs/plugins/agents.md` to document the new two-key auto-inherit model introduced in PR #302 (per-tool `autoInheritable` flag) and PR #304 (safe-by-default `autoInheritTools: { file: false, code: false }`).

Adds an "Auto-inherit posture" subsection explaining that the developer must opt into `autoInheritTools` AND the plugin author must mark each tool `autoInheritable: true` for a tool to spread without explicit wiring. Includes a table documenting the `autoInheritable` marking on each core plugin tool, plus an example of the setup-time audit log so operators can see exactly what's inherited vs. skipped.

Signed-off-by: MarioCadenas

### MVP polish

- **Reference app no longer ships hardcoded dogfood URLs.** The three `https://e2-dogfood.staging.cloud.databricks.com/...` and `https://mario-mcp-hello-*.staging.aws.databricksapps.com/...` MCP URLs in `apps/agent-app/server.ts` are replaced with optional env-driven `VECTOR_SEARCH_MCP_URL` / `CUSTOM_MCP_URL` config. When set, their hostnames are auto-added to `agents({ mcp: { trustedHosts } })`. `.env.example` uses placeholder values the reader can replace instead of another team's workspace.
- **`appkit.agent` → `appkit.agents` in the reference app.** The prior `appkit.agent as { list, getDefault }` cast papered over the plugin-name mismatch fixed in PR #304. The runtime key now matches the docs, the manifest, and the factory name; the cast is gone.
- **Auto-inherit opt-in added to the reference config.** Since the defaults flipped to `{ file: false, code: false }` (PR #304, S-3), the reference now explicitly enables `autoInheritTools: { file: true }` so the markdown agents that ship alongside the code-defined one still pick up the analytics / files read-only tools. This is the pattern a real deployment should follow — opt in deliberately.
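The two-key rule (developer opts in via `autoInheritTools`, plugin author marks the tool `autoInheritable: true`) can be sketched as a small pure function. This is only an illustration of the rule as described above, not the plugin's code: the `ToolMarker` type, the `inheritedTools` function, and the inherited/skipped return shape are hypothetical names introduced here.

```typescript
type AgentOrigin = "file" | "code"; // markdown agents vs. code-defined agents

interface AutoInheritConfig {
  file: boolean;
  code: boolean;
}

// Hypothetical stand-in for a tool carrying the plugin author's marking.
interface ToolMarker {
  name: string;
  autoInheritable?: boolean;
}

// Safe-by-default posture described in the notes above.
const SAFE_DEFAULT: AutoInheritConfig = { file: false, code: false };

// A tool is inherited only when BOTH keys are turned:
// 1. the developer enabled auto-inherit for this agent origin, and
// 2. the plugin author explicitly marked the tool autoInheritable: true.
function inheritedTools(
  origin: AgentOrigin,
  tools: ToolMarker[],
  config: Partial<AutoInheritConfig> = {},
): { inherited: string[]; skipped: string[] } {
  const effective: AutoInheritConfig = { ...SAFE_DEFAULT, ...config };
  const inherited: string[] = [];
  const skipped: string[] = [];
  for (const tool of tools) {
    if (effective[origin] && tool.autoInheritable === true) {
      inherited.push(tool.name);
    } else {
      skipped.push(tool.name);
    }
  }
  return { inherited, skipped };
}
```

Returning the skipped names alongside the inherited ones mirrors the idea of a setup-time audit log: an operator can see both what spread and what was held back.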
Signed-off-by: MarioCadenas

### Ephemeral autocomplete agent + docs

- `apps/dev-playground/config/agents/autocomplete.md` sets `ephemeral: true`. Each debounced autocomplete keystroke no longer leaves an orphan thread in `InMemoryThreadStore` — the server now deletes the thread in the stream's `finally` (PR #304). Closes R1 from the MVP re-review.
- `docs/docs/plugins/agents.md` documents the new `ephemeral` frontmatter key alongside the other AgentDefinition knobs.

Signed-off-by: MarioCadenas

### DoS limits documentation

Documents the MVP resource caps landed in PR #304: the static request-body caps (enforced by the Zod schemas) and the three configurable runtime limits (`maxConcurrentStreamsPerUser`, `maxToolCalls`, `maxSubAgentDepth`). Includes the config-block shape in the main reference and a new "Resource limits" subsection under the Configuration section explaining the intent and per-user semantics of each cap.

Signed-off-by: MarioCadenas
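To make the per-user semantics of `maxConcurrentStreamsPerUser` concrete, here is a minimal sketch of a counter-based limiter. Only the limit's name comes from the notes above; the `StreamLimiter` class, its methods, and the way a user is identified are assumptions for illustration, and the real plugin may enforce the cap differently.

```typescript
// Hypothetical per-user cap on concurrently open streams.
class StreamLimiter {
  private readonly active = new Map<string, number>();
  private readonly maxPerUser: number;

  constructor(maxConcurrentStreamsPerUser: number) {
    this.maxPerUser = maxConcurrentStreamsPerUser;
  }

  // Returns true and counts the stream if the user is under the cap.
  tryAcquire(userId: string): boolean {
    const current = this.active.get(userId) ?? 0;
    if (current >= this.maxPerUser) return false;
    this.active.set(userId, current + 1);
    return true;
  }

  // Call from the stream's `finally` so aborted streams free their slot.
  release(userId: string): void {
    const current = this.active.get(userId) ?? 0;
    if (current <= 1) {
      this.active.delete(userId);
    } else {
      this.active.set(userId, current - 1);
    }
  }
}
```

Because the count is keyed by user, one user hitting the cap does not affect anyone else, which is the point of a per-user rather than global limit.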

Second layer of the agents feature. Adds the primitives for defining agent tools and implements them on every core ToolProvider plugin.

- `tool(config)` — inline function tools backed by a Zod schema. Auto-generates JSON Schema for the LLM via `z.toJSONSchema()` (stripping the top-level `$schema` annotation that Gemini rejects), runtime-validates tool-call arguments, returns an LLM-friendly error string on validation failure so the model can self-correct.
- `mcpServer(name, url)` — tiny factory for hosted custom MCP server configs. Replaces the verbose `{ type: "custom_mcp_server", custom_mcp_server: { app_name, app_url } }` wrapper.
- `FunctionTool` / `HostedTool` types + `isFunctionTool` / `isHostedTool` type guards. `HostedTool` is a union of Genie, VectorSearch, custom MCP, and external-connection configs.
- `ToolkitEntry` + `ToolkitOptions` types + `isToolkitEntry` guard. `AgentTool = FunctionTool | HostedTool | ToolkitEntry` is the canonical union later PRs spread into agent definitions.
- `defineTool(config)` + `ToolRegistry` — plugin authors' internal shape for declaring a keyed set of tools with Zod-typed handlers.
- `toolsFromRegistry()` — produces the `AgentToolDefinition[]` exposed via `ToolProvider.getAgentTools()`.
- `executeFromRegistry()` — validates args then dispatches to the handler. Returns LLM-friendly errors on bad args.
- `toToolJSONSchema()` — shared helper at `packages/appkit/src/plugins/agents/tools/json-schema.ts` that wraps `toJSONSchema()` and strips `$schema`. Used by `tool()`, `toolsFromRegistry()`, and `buildToolkitEntries()`.
- `buildToolkitEntries(pluginName, registry, opts?)` — converts a plugin's internal `ToolRegistry` into a keyed record of `ToolkitEntry` markers, honoring `prefix` / `only` / `except` / `rename`.
- `AppKitMcpClient` — minimal JSON-RPC 2.0 client over SSE, zero deps. Handles auth refresh, per-server connection pooling, and tool definition aggregation.
- `resolveHostedTools()` — maps `HostedTool` configs to Databricks MCP endpoint URLs.

- **analytics** — `query` tool (Zod-typed, asUser dispatch)
- **files** — per-volume tool family: `${volumeKey}.{list,read,exists,metadata,upload,delete}` (dynamically named from the plugin's volume config)
- **genie** — per-space tool family: `${alias}.{sendMessage,getConversation}` (dynamically named from the plugin's spaces config)
- **lakebase** — `query` tool

Each plugin gains `getAgentTools()` + `executeAgentTool()` satisfying the `ToolProvider` interface, plus a `.toolkit(opts?)` method that returns a record of `ToolkitEntry` markers for later spread into agent definitions.

- 58 new tests across tool primitives + plugin ToolProvider surfaces
- Full appkit vitest suite: 1212 tests passing
- Typecheck clean
- Build clean, publint clean

Signed-off-by: MarioCadenas

New `mcp-host-policy.ts` module enforces an allowlist on every MCP URL before the first byte is sent. Same-origin Databricks workspace URLs are admitted by default; any other host must be explicitly trusted via the new `AgentsPluginConfig.mcp.trustedHosts` field (added in a subsequent stack layer).

- Rejects non-`http(s)` schemes and plaintext `http://` outside of localhost-in-dev.
- Blocks link-local (`169.254/16` — cloud metadata), RFC1918, CGNAT, loopback (unless `allowLocalhost`), ULA, multicast, and IPv4-mapped IPv6 equivalents at DNS-resolve time. IP-literal URLs in these ranges are rejected without a DNS lookup. Malformed IPs fail-closed.
- `AppKitMcpClient` constructor now takes the policy as a third arg. Workspace credentials (SP on `initialize`/`tools/list`; caller-supplied OBO on `tools/call`) are never attached to non-workspace hosts — `callTool` drops caller OBO overrides for external destinations, and `sendRpc`/`sendNotification` never invoke `authenticate()` when `forwardWorkspaceAuth` is false.
- Constructor accepts optional `{ dnsLookup, fetchImpl }` for test DI.
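The IPv4 side of the blocklist described above can be approximated with a few range checks. This is a simplified sketch, not the contents of `mcp-host-policy.ts`: the function names are made up for this example, the range list is limited to the IPv4 ranges the notes mention, and IPv6 handling (ULA, mapped addresses) is left out.

```typescript
// Parses a dotted-quad string; returns null for anything malformed.
function parseIpv4(ip: string): readonly [number, number, number, number] | null {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(ip);
  if (!m) return null;
  const octets = [Number(m[1]), Number(m[2]), Number(m[3]), Number(m[4])] as const;
  return octets.some((n) => n > 255) ? null : octets;
}

// Hypothetical sketch of an IPv4 blocklist check. Malformed input is
// treated as blocked (fail-closed), matching the stated policy.
function isBlockedIpv4Sketch(ip: string, allowLocalhost = false): boolean {
  const octets = parseIpv4(ip);
  if (octets === null) return true; // fail closed
  const [a, b] = octets;
  if (a === 169 && b === 254) return true; // link-local, cloud metadata
  if (a === 10) return true; // RFC1918 10.0.0.0/8
  if (a === 172 && b >= 16 && b <= 31) return true; // RFC1918 172.16.0.0/12
  if (a === 192 && b === 168) return true; // RFC1918 192.168.0.0/16
  if (a === 100 && b >= 64 && b <= 127) return true; // CGNAT 100.64.0.0/10
  if (a === 127) return !allowLocalhost; // loopback 127.0.0.0/8
  if (a >= 224 && a <= 239) return true; // multicast 224.0.0.0/4
  return false;
}
```

A check like this has to run on the resolved address as well as on IP-literal URLs; otherwise a hostname that resolves to `169.254.169.254` would slip past a purely string-based allowlist.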
New tests:

- `mcp-host-policy.test.ts` (42 tests): config builder, URL check, IP blocklist, DNS-backed resolution with split-DNS defense.
- `mcp-client.test.ts` (8 tests): integrated client with recording fetch — verifies no fetch + no `authenticate()` call when URL is rejected, and that auth headers are scoped correctly per-destination.

- `json-schema.ts`: biome formatting fix (pre-existing drift).
- `packages/appkit/src/index.ts`: biome organizeImports fix (pre-existing sort order drift).

Full appkit vitest suite: 1262 tests passing (+50 from security).

Signed-off-by: MarioCadenas

New `sql-policy.ts` module provides `classifyReadOnly(sql)` and `assertReadOnlySql(sql)` — a dependency-free tokenizer-based classifier that rejects any statement outside `SELECT | WITH | SHOW | EXPLAIN | DESCRIBE` at execution time. Also exports `wrapInReadOnlyTransaction(stmt)` which produces a `BEGIN READ ONLY … ROLLBACK` envelope for belt-and-suspenders enforcement on PostgreSQL.

Why a hand-rolled tokenizer rather than `node-sql-parser` or `pgsql-parser`:

- `node-sql-parser`'s Hive/Spark dialect coverage rejects common Databricks SQL patterns (three-part names, `SHOW TABLES IN`, `DESCRIBE EXTENDED`, `EXPLAIN`); its PostgreSQL grammar rejects the same meta-commands.
- `pgsql-parser` (libpg_query) is a native binding and fails to install cleanly on Databricks App runtimes.
- We only need statement-type classification, not full parsing.

The tokenizer handles line/block comments (nested), single- and double-quoted literals, ANSI/backtick identifiers, PostgreSQL dollar-quoted strings, `E'..'` escape strings, and reports unterminated literals as fail-closed. 62 tests exercise evasion vectors (stacked writes, quoted keywords, comment-hidden writes, mismatched dollar-quote tags, unterminated strings).

`analytics.query` was annotated `{ readOnly: true, requiresUserContext: true }` but the annotation was a claim only. A prompt-injected LLM could send `UPDATE`, `DELETE`, or `DROP` and the warehouse would run it subject to the end user's SQL grants. The tool now calls `assertReadOnlySql` before reaching `this.query()`. A rejection surfaces an LLM-friendly error the model can self-correct on; tests verify writes never reach the warehouse. Public `AppKit.analytics.query(...)` continues to accept arbitrary SQL — app authors use it intentionally; LLMs do not.

`lakebase.query` previously shipped as an always-on agent tool with `{ readOnly: false, destructive: false, idempotent: false }` (`destructive: false` was an outright lie) and executed arbitrary LLM SQL against the SP-scoped pool, auto-inherited by every markdown agent. The plugin now registers **no** agent tool by default. Opt-in via:

```ts
lakebase({
  exposeAsAgentTool: {
    iUnderstandRunsAsServicePrincipal: true,
    readOnly: true, // default
  },
});
```

The acknowledgement flag is required because the pool is bound to the service principal regardless of which end user invokes the tool — enabling the tool is a deliberate privilege grant.

When opted in with `readOnly: true` (default):

- Statement classified by `classifyReadOnly` (rejects non-SELECT with an LLM-friendly error).
- Remaining statement executed inside `BEGIN READ ONLY; …; ROLLBACK` so PostgreSQL enforces server-side even if a side-effecting function slips past the classifier.
- Annotations: `{ readOnly: true, destructive: false, idempotent: false }`.

When opted in with `readOnly: false`:

- Statement passed through unchanged.
- Annotations: `{ readOnly: false, destructive: true, idempotent: false }`. The `destructive: true` signal will be honored by the agents plugin's HITL approval gate in PR #304.

`LakebasePlugin` is now `export class` so tests can construct it directly. New test file `lakebase-agent-tool.test.ts` (9 tests) verifies defaults, opt-in, acknowledgement enforcement, readOnly rejection + wrap, and destructive pass-through.
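The idea behind a tokenizer-based read-only classifier can be shown with a much smaller sketch than the real `sql-policy.ts`. The version below is an assumption-laden simplification: it handles line comments, nested block comments, and single-, double-, and backtick-quoted spans, but NOT dollar-quoted strings or `E'..'` backslash escapes, and the function names are invented for this example.

```typescript
const READ_ONLY_KEYWORDS = new Set(["SELECT", "WITH", "SHOW", "EXPLAIN", "DESCRIBE"]);

// Drops comments and blanks out quoted spans so that keywords or semicolons
// hidden inside them cannot affect classification.
// Returns null when a comment or quoted span is unterminated.
function stripCommentsAndLiterals(sql: string): string | null {
  let out = "";
  let i = 0;
  while (i < sql.length) {
    const two = sql.slice(i, i + 2);
    const ch = sql.charAt(i);
    if (two === "--") {
      // Line comment: skip to the end of the line.
      const end = sql.indexOf("\n", i);
      i = end === -1 ? sql.length : end;
      out += " ";
    } else if (two === "/*") {
      // Block comment with nesting support.
      let depth = 1;
      i += 2;
      while (i < sql.length && depth > 0) {
        const pair = sql.slice(i, i + 2);
        if (pair === "/*") { depth++; i += 2; }
        else if (pair === "*/") { depth--; i += 2; }
        else { i++; }
      }
      if (depth > 0) return null;
      out += " ";
    } else if (ch === "'" || ch === '"' || ch === "`") {
      // Quoted span; a doubled quote character is an escaped quote.
      let j = i + 1;
      let closed = false;
      while (j < sql.length) {
        if (sql.charAt(j) === ch) {
          if (sql.charAt(j + 1) === ch) { j += 2; continue; }
          closed = true;
          break;
        }
        j++;
      }
      if (!closed) return null;
      out += " ? ";
      i = j + 1;
    } else {
      out += ch;
      i++;
    }
  }
  return out;
}

// Hypothetical classifier: exactly one statement whose leading keyword is
// on the read-only list. Everything else, including parse trouble, is rejected.
function classifyReadOnlySketch(sql: string): boolean {
  const stripped = stripCommentsAndLiterals(sql);
  if (stripped === null) return false; // fail closed
  const statements = stripped
    .split(";")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  if (statements.length !== 1) return false; // empty input or stacked statements
  const keyword = (/^[A-Za-z]+/.exec(statements[0] ?? "")?.[0] ?? "").toUpperCase();
  return READ_ONLY_KEYWORDS.has(keyword);
}
```

A leading-keyword check alone cannot see a data-modifying CTE such as `WITH d AS (DELETE … RETURNING *) SELECT …` on PostgreSQL, which is one reason to run the statement inside a read-only transaction as a second layer.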
Full appkit vitest suite: 1340 tests passing (+78 from S-2 Layer 1+2).

Signed-off-by: MarioCadenas

Groundwork for flipping the unsafe `autoInheritTools: { file: true }` default into opt-in auto-inherit gated by per-tool consent.

Adds an `autoInheritable?: boolean` field to `ToolEntry` (defined via `defineTool`) and propagates it through `buildToolkitEntries` onto the resulting `ToolkitEntry`. The agents plugin consumes this flag in a subsequent stack layer to filter auto-inherited tools — any tool whose plugin author has not explicitly marked `autoInheritable: true` never spreads into agents that enable auto-inherit.

Default is `false` for defense-in-depth. Plugin authors must consciously mark tools that are genuinely safe for unscoped exposure.

- `analytics.query`: `autoInheritable: true`. The tool is OBO-scoped and `readOnly: true` is enforced at runtime (S2).
- `files.list` / `files.read` / `files.exists` / `files.metadata`: `autoInheritable: true`. Pure read operations, OBO-scoped.
- `files.upload` / `files.delete`: NOT auto-inheritable. Destructive; must be explicitly wired by the agent author.
- `genie.getConversation`: `autoInheritable: true` (read-only history).
- `genie.sendMessage`: NOT auto-inheritable. State-mutating Genie conversation; user wires explicitly if desired.
- `lakebase.query`: NOT auto-inheritable. The tool is already gated by the explicit `exposeAsAgentTool` acknowledgement (S2); auto-inherit remains closed as defense-in-depth.

- `build-toolkit.test.ts` gains a case verifying propagation: explicit `true`, explicit `false`, and omitted (undefined) all flow through to the `ToolkitEntry` unchanged.

Signed-off-by: MarioCadenas

- **MCP caller abort signal composition.** `callTool` now accepts an optional `callerSignal` and composes it with the existing 30 s timeout via `AbortSignal.any([...])`. The agents plugin threads its stream signal through in a subsequent stack layer so that a `POST /cancel` or agent-run shutdown immediately propagates to the MCP fetch, rather than leaving in-flight MCP calls running on the remote server until they complete. Adds a regression test that aborts the caller signal mid-fetch and asserts the fetch rejects with `AbortError`.

Signed-off-by: MarioCadenas

Three issues flagged by the re-review pass (one correctness HIGH, two security MEDIUMs).

- **Lakebase read-only path no longer emits a multi-statement batch.** `wrapInReadOnlyTransaction` returned `"BEGIN READ ONLY;\n<stmt>;\nROLLBACK;"` as one string passed to `pool.query(text, values)`. As soon as the agent supplied `values`, pg switched to the Extended Query protocol and PostgreSQL rejected the batch with `cannot insert multiple commands into a prepared statement`, silently breaking every parameterized read-only tool call in production. The mocked lakebase test concealed this. The helper is removed; `LakebasePlugin.runReadOnlyStatement` now acquires a dedicated client from the pool and runs three separate `client.query` calls on the same connection (`BEGIN READ ONLY`, user statement with values, `ROLLBACK`), with a `finally` that rolls back and releases the client even when the user statement throws. Four tests cover the new flow: dispatch-time rejection, statement ordering, parameter forwarding, and release-on-error.
- **`isBlockedIpv6` link-local `fe80::/10` now covers the full range.** Previous check `startsWith("fe80:") || startsWith("fe9")` only matched `fe80`–`fe9f`, leaving `fea0`–`febf` (valid link-local) passable. Replaced with `/^fe[89ab][0-9a-f]:/.test(lowered)` so the third hex digit is checked against `8..b`.
- **`::ffff:<hex>:<hex>` IPv4-mapped IPv6 is now normalised.** The colon-hex form of an IPv4-mapped address (`::ffff:a9fe:a9fe` = 169.254.169.254) previously bypassed the IPv4 blocklist because `isIPv4("a9fe:a9fe")` is false. `hexPairToDottedIpv4` reassembles the trailing two hex groups into dotted form and delegates to `isBlockedIpv4`. Regression tests cover the metadata, 10/8, and 192.168/16 equivalents; a public IPv4 mapped to colon-hex still passes through.

Signed-off-by: MarioCadenas
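The two IPv6 fixes above are easy to reproduce in isolation. The sketch below reuses the regex quoted in the commit note and shows one plausible way to turn the colon-hex tail of an IPv4-mapped address back into dotted form. The function bodies and the `Sketch`-suffixed names are illustrative assumptions, not the code in `mcp-host-policy.ts`.

```typescript
// fe80::/10 fixes the first 10 bits: hex digits "f", "e", then a third
// digit in 8..b. The fourth digit is unconstrained.
function isLinkLocalIpv6Sketch(address: string): boolean {
  return /^fe[89ab][0-9a-f]:/.test(address.toLowerCase());
}

// Converts the trailing two hex groups of an IPv4-mapped address
// (for example "a9fe:a9fe") into dotted form. Returns null if malformed.
function hexPairToDottedIpv4Sketch(hexPair: string): string | null {
  const m = /^([0-9a-f]{1,4}):([0-9a-f]{1,4})$/i.exec(hexPair);
  if (!m) return null;
  const hi = parseInt(m[1] ?? "", 16);
  const lo = parseInt(m[2] ?? "", 16);
  // Each 16-bit group carries two IPv4 octets.
  return [hi >> 8, hi & 0xff, lo >> 8, lo & 0xff].join(".");
}

// Extracts the embedded IPv4 address from "::ffff:<tail>", whether the tail
// is dotted ("::ffff:8.8.8.8") or colon-hex ("::ffff:a9fe:a9fe").
function normalizeMappedIpv4Sketch(address: string): string | null {
  const lowered = address.toLowerCase();
  const prefix = "::ffff:";
  if (!lowered.startsWith(prefix)) return null;
  const tail = lowered.slice(prefix.length);
  return tail.includes(".") ? tail : hexPairToDottedIpv4Sketch(tail);
}
```

Once the address is back in dotted form, the ordinary IPv4 blocklist applies, so the metadata address is caught regardless of which textual form the URL or DNS answer used.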

…template Final layer of the agents feature stack. Everything needed to exercise, demonstrate, and learn the feature. `apps/agent-app/` — a standalone app purpose-built around the agents feature. Ships with: - `server.ts` — full example of code-defined agents via `fromPlugin`: ```ts const support = createAgent({ instructions: "…", tools: { ...fromPlugin(analytics), ...fromPlugin(files), get_weather, "mcp.vector-search": mcpServer("vector-search", "https://…"), }, }); await createApp({ plugins: [server({ port }), analytics(), files(), agents({ agents: { support } })], }); ``` - `config/agents/assistant.md` — markdown-driven agent alongside the code-defined one, showing the asymmetric auto-inherit default. - Vite + React 19 + TailwindCSS frontend with a chat UI. - Databricks deployment config (`databricks.yml`, `app.yaml`) and deploy scripts. `apps/dev-playground/client/src/routes/agent.route.tsx` — chat UI with inline autocomplete (hits the `autocomplete` markdown agent) and a full threaded conversation panel (hits the default agent). `apps/dev-playground/server/index.ts` — adds a code-defined `helper` agent using `fromPlugin(analytics)` alongside the markdown-driven `autocomplete` agent in `config/agents/`. Exercises the mixed-style setup (markdown + code) against the same plugin list. `apps/dev-playground/config/agents/*.md` — both agents defined with valid YAML frontmatter. `docs/docs/plugins/agents.md` — progressive five-level guide: 1. Drop a markdown file → it just works. 2. Scope tools via `toolkits:` / `tools:` frontmatter. 3. Code-defined agents with `fromPlugin()`. 4. Sub-agents. 5. Standalone `runAgent()` (no `createApp` or HTTP). Plus a configuration reference, runtime API reference, and frontmatter schema table. `docs/docs/api/appkit/` — regenerated typedoc for the new public surface (fromPlugin, runAgent, AgentDefinition, AgentsPluginConfig, ToolkitEntry, ToolkitOptions, all adapter types, and the agents plugin factory). 
`template/appkit.plugins.json` — adds the `agent` plugin entry so `npx @databricks/appkit init --features agent` scaffolds the plugin correctly. - Full appkit vitest suite: 1311 tests passing - Typecheck clean across all 8 workspace projects - `pnpm docs:build` clean (no broken links) - `pnpm --filter=@databricks/appkit build:package` clean, publint clean Signed-off-by: MarioCadenas <[email protected]> Documents the new `mcp` configuration block and the rules it enforces: same-origin-only by default, explicit `trustedHosts` for external MCP servers, plaintext `http://` refused outside localhost-in-dev, and DNS-level blocking of private / link-local IP ranges (covers cloud metadata services). See PR #302 for the policy implementation and PR #304 for the `AgentsPluginConfig.mcp` wiring. Signed-off-by: MarioCadenas <[email protected]> - `docs/docs/plugins/agents.md`: new "SQL agent tools" subsection covering `analytics.query` readOnly enforcement, `lakebase.query` opt-in via `exposeAsAgentTool`, and the approval flow. New "Human-in-the-loop approval for destructive tools" subsection documents the config, SSE event shape, and `POST /chat/approve` contract. - `apps/agent-app`: approval-card component rendered inline in the chat stream whenever an `appkit.approval_pending` event arrives. Destructive badge + Approve/Deny buttons POST to `/api/agent/approve` with the carried `streamId`/`approvalId`. - `apps/dev-playground/client`: matching approval-card on the agent route, using the existing appkit-ui `Button` component and Tailwind utility classes. Signed-off-by: MarioCadenas <[email protected]> Updates `docs/docs/plugins/agents.md` to document the new two-key auto-inherit model introduced in PR #302 (per-tool `autoInheritable` flag) and PR #304 (safe-by-default `autoInheritTools: { file: false, code: false }`). 
Adds an "Auto-inherit posture" subsection explaining that the developer must opt into `autoInheritTools` AND the plugin author must mark each tool `autoInheritable: true` for a tool to spread without explicit wiring. Includes a table documenting the `autoInheritable` marking on each core plugin tool, plus an example of the setup-time audit log so operators can see exactly what's inherited vs. skipped. Signed-off-by: MarioCadenas <[email protected]> - **Reference app no longer ships hardcoded dogfood URLs.** The three `https://e2-dogfood.staging.cloud.databricks.com/...` and `https://mario-mcp-hello-*.staging.aws.databricksapps.com/...` MCP URLs in `apps/agent-app/server.ts` are replaced with optional env-driven `VECTOR_SEARCH_MCP_URL` / `CUSTOM_MCP_URL` config. When set, their hostnames are auto-added to `agents({ mcp: { trustedHosts } })`. `.env.example` uses placeholder values the reader can replace instead of another team's workspace. - **`appkit.agent` → `appkit.agents` in the reference app.** The prior `appkit.agent as { list, getDefault }` cast papered over the plugin-name mismatch fixed in PR #304. The runtime key now matches the docs, the manifest, and the factory name; the cast is gone. - **Auto-inherit opt-in added to the reference config.** Since the defaults flipped to `{ file: false, code: false }` (PR #304, S-3), the reference now explicitly enables `autoInheritTools: { file: true }` so the markdown agents that ship alongside the code-defined one still pick up the analytics / files read-only tools. This is the pattern a real deployment should follow — opt in deliberately. Signed-off-by: MarioCadenas <[email protected]> - `apps/dev-playground/config/agents/autocomplete.md` sets `ephemeral: true`. Each debounced autocomplete keystroke no longer leaves an orphan thread in `InMemoryThreadStore` — the server now deletes the thread in the stream's `finally` (PR #304). Closes R1 from the MVP re-review. 
- `docs/docs/plugins/agents.md` documents the new `ephemeral` frontmatter key alongside the other AgentDefinition knobs. Signed-off-by: MarioCadenas <[email protected]> Documents the MVP resource caps landed in PR #304: the static request-body caps (enforced by the Zod schemas) and the three configurable runtime limits (`maxConcurrentStreamsPerUser`, `maxToolCalls`, `maxSubAgentDepth`). Includes the config-block shape in the main reference and a new "Resource limits" subsection under the Configuration section explaining the intent and per-user semantics of each cap. Signed-off-by: MarioCadenas <[email protected]>

Second layer of the agents feature. Adds the primitives for defining agent tools and implements them on every core ToolProvider plugin. - `tool(config)` — inline function tools backed by a Zod schema. Auto- generates JSON Schema for the LLM via `z.toJSONSchema()` (stripping the top-level `$schema` annotation that Gemini rejects), runtime- validates tool-call arguments, returns an LLM-friendly error string on validation failure so the model can self-correct. - `mcpServer(name, url)` — tiny factory for hosted custom MCP server configs. Replaces the verbose `{ type: "custom_mcp_server", custom_mcp_server: { app_name, app_url } }` wrapper. - `FunctionTool` / `HostedTool` types + `isFunctionTool` / `isHostedTool` type guards. `HostedTool` is a union of Genie, VectorSearch, custom MCP, and external-connection configs. - `ToolkitEntry` + `ToolkitOptions` types + `isToolkitEntry` guard. `AgentTool = FunctionTool | HostedTool | ToolkitEntry` is the canonical union later PRs spread into agent definitions. - `defineTool(config)` + `ToolRegistry` — plugin authors' internal shape for declaring a keyed set of tools with Zod-typed handlers. - `toolsFromRegistry()` — produces the `AgentToolDefinition[]` exposed via `ToolProvider.getAgentTools()`. - `executeFromRegistry()` — validates args then dispatches to the handler. Returns LLM-friendly errors on bad args. - `toToolJSONSchema()` — shared helper at `packages/appkit/src/plugins/agents/tools/json-schema.ts` that wraps `toJSONSchema()` and strips `$schema`. Used by `tool()`, `toolsFromRegistry()`, and `buildToolkitEntries()`. - `buildToolkitEntries(pluginName, registry, opts?)` — converts a plugin's internal `ToolRegistry` into a keyed record of `ToolkitEntry` markers, honoring `prefix` / `only` / `except` / `rename`. - `AppKitMcpClient` — minimal JSON-RPC 2.0 client over SSE, zero deps. Handles auth refresh, per-server connection pooling, and tool definition aggregation. 
- `resolveHostedTools()` — maps `HostedTool` configs to Databricks MCP endpoint URLs. - **analytics** — `query` tool (Zod-typed, asUser dispatch) - **files** — per-volume tool family: `${volumeKey}.{list,read,exists,metadata,upload,delete}` (dynamically named from the plugin's volume config) - **genie** — per-space tool family: `${alias}.{sendMessage,getConversation}` (dynamically named from the plugin's spaces config) - **lakebase** — `query` tool Each plugin gains `getAgentTools()` + `executeAgentTool()` satisfying the `ToolProvider` interface, plus a `.toolkit(opts?)` method that returns a record of `ToolkitEntry` markers for later spread into agent definitions. - 58 new tests across tool primitives + plugin ToolProvider surfaces - Full appkit vitest suite: 1212 tests passing - Typecheck clean - Build clean, publint clean Signed-off-by: MarioCadenas <[email protected]> New `mcp-host-policy.ts` module enforces an allowlist on every MCP URL before the first byte is sent. Same-origin Databricks workspace URLs are admitted by default; any other host must be explicitly trusted via the new `AgentsPluginConfig.mcp.trustedHosts` field (added in a subsequent stack layer). - Rejects non-`http(s)` schemes and plaintext `http://` outside of localhost-in-dev. - Blocks link-local (`169.254/16` — cloud metadata), RFC1918, CGNAT, loopback (unless `allowLocalhost`), ULA, multicast, and IPv4-mapped IPv6 equivalents at DNS-resolve time. IP-literal URLs in these ranges are rejected without a DNS lookup. Malformed IPs fail-closed. - `AppKitMcpClient` constructor now takes the policy as a third arg. Workspace credentials (SP on `initialize`/`tools/list`; caller- supplied OBO on `tools/call`) are never attached to non-workspace hosts — `callTool` drops caller OBO overrides for external destinations, and `sendRpc`/`sendNotification` never invoke `authenticate()` when `forwardWorkspaceAuth` is false. - Constructor accepts optional `{ dnsLookup, fetchImpl }` for test DI. 
New tests: - `mcp-host-policy.test.ts` (42 tests): config builder, URL check, IP blocklist, DNS-backed resolution with split-DNS defense. - `mcp-client.test.ts` (8 tests): integrated client with recording fetch — verifies no fetch + no `authenticate()` call when URL is rejected, and that auth headers are scoped correctly per-destination. - `json-schema.ts`: biome formatting fix (pre-existing drift). - `packages/appkit/src/index.ts`: biome organizeImports fix (pre-existing sort order drift). Full appkit vitest suite: 1262 tests passing (+50 from security). Signed-off-by: MarioCadenas <[email protected]> New `sql-policy.ts` module provides `classifyReadOnly(sql)` and `assertReadOnlySql(sql)` — a dependency-free tokenizer-based classifier that rejects any statement outside `SELECT | WITH | SHOW | EXPLAIN | DESCRIBE` at execution time. Also exports `wrapInReadOnlyTransaction(stmt)` which produces a `BEGIN READ ONLY … ROLLBACK` envelope for belt-and-suspenders enforcement on PostgreSQL. Why a hand-rolled tokenizer rather than `node-sql-parser` or `pgsql-parser`: - `node-sql-parser`'s Hive/Spark dialect coverage rejects common Databricks SQL patterns (three-part names, `SHOW TABLES IN`, `DESCRIBE EXTENDED`, `EXPLAIN`); its PostgreSQL grammar rejects the same meta-commands. - `pgsql-parser` (libpg_query) is a native binding and fails to install cleanly on Databricks App runtimes. - We only need statement-type classification, not full parsing. The tokenizer handles line/block comments (nested), single- and double-quoted literals, ANSI/backtick identifiers, PostgreSQL dollar-quoted strings, `E'..'` escape strings, and reports unterminated literals as fail-closed. 62 tests exercise evasion vectors (stacked writes, quoted keywords, comment-hidden writes, mismatched dollar-quote tags, unterminated strings). `analytics.query` was annotated `{ readOnly: true, requiresUserContext: true }` but the annotation was a claim only. 
A prompt-injected LLM could send `UPDATE`, `DELETE`, or `DROP` and the warehouse would run it subject to the end user's SQL grants. The tool now calls `assertReadOnlySql` before reaching `this.query()`. A rejection surfaces an LLM-friendly error the model can self-correct on; tests verify writes never reach the warehouse. Public `AppKit.analytics.query(...)` continues to accept arbitrary SQL — app authors use it intentionally; LLMs do not. `lakebase.query` previously shipped as an always-on agent tool with `{ readOnly: false, destructive: false, idempotent: false }` (`destructive: false` was an outright lie) and executed arbitrary LLM SQL against the SP-scoped pool, auto-inherited by every markdown agent. The plugin now registers **no** agent tool by default. Opt-in via: ```ts lakebase({ exposeAsAgentTool: { iUnderstandRunsAsServicePrincipal: true, readOnly: true, // default }, }); ``` The acknowledgement flag is required because the pool is bound to the service principal regardless of which end user invokes the tool — enabling the tool is a deliberate privilege grant. When opted in with `readOnly: true` (default): - Statement classified by `classifyReadOnly` (rejects non-SELECT with an LLM-friendly error). - Remaining statement executed inside `BEGIN READ ONLY; …; ROLLBACK` so PostgreSQL enforces server-side even if a side-effecting function slips past the classifier. - Annotations: `{ readOnly: true, destructive: false, idempotent: false }`. When opted in with `readOnly: false`: - Statement passed through unchanged. - Annotations: `{ readOnly: false, destructive: true, idempotent: false }`. The `destructive: true` signal will be honored by the agents plugin's HITL approval gate in PR #304. `LakebasePlugin` is now `export class` so tests can construct it directly. New test file `lakebase-agent-tool.test.ts` (9 tests) verifies defaults, opt-in, acknowledgement enforcement, readOnly rejection + wrap, and destructive pass-through. 
Full appkit vitest suite: 1340 tests passing (+78 from S-2 Layer 1+2).

Signed-off-by: MarioCadenas <[email protected]>

Groundwork for flipping the unsafe `autoInheritTools: { file: true }` default into opt-in auto-inherit gated by per-tool consent.

Adds an `autoInheritable?: boolean` field to `ToolEntry` (defined via `defineTool`) and propagates it through `buildToolkitEntries` onto the resulting `ToolkitEntry`. The agents plugin consumes this flag in a subsequent stack layer to filter auto-inherited tools: any tool whose plugin author has not explicitly marked `autoInheritable: true` never spreads into agents that enable auto-inherit.

Default is `false` for defense-in-depth. Plugin authors must consciously mark tools that are genuinely safe for unscoped exposure.

- `analytics.query`: `autoInheritable: true`. The tool is OBO-scoped and `readOnly: true` is enforced at runtime (S2).
- `files.list` / `files.read` / `files.exists` / `files.metadata`: `autoInheritable: true`. Pure read operations, OBO-scoped.
- `files.upload` / `files.delete`: NOT auto-inheritable. Destructive; must be explicitly wired by the agent author.
- `genie.getConversation`: `autoInheritable: true` (read-only history).
- `genie.sendMessage`: NOT auto-inheritable. State-mutating Genie conversation; user wires explicitly if desired.
- `lakebase.query`: NOT auto-inheritable. The tool is already gated by the explicit `exposeAsAgentTool` acknowledgement (S2); auto-inherit remains closed as defense-in-depth.

- `build-toolkit.test.ts` gains a case verifying propagation: explicit `true`, explicit `false`, and omitted (undefined) all flow through to the `ToolkitEntry` unchanged.

Signed-off-by: MarioCadenas <[email protected]>

- **MCP caller abort signal composition.** `callTool` now accepts an optional `callerSignal` and composes it with the existing 30 s timeout via `AbortSignal.any([...])`. The agents plugin threads its stream signal through in a subsequent stack layer so that a `POST /cancel` or agent-run shutdown immediately propagates to the MCP fetch, rather than leaving in-flight MCP calls running on the remote server until they complete. Adds a regression test that aborts the caller signal mid-fetch and asserts the fetch rejects with `AbortError`.

Signed-off-by: MarioCadenas <[email protected]>

Three issues flagged by the re-review pass (one correctness HIGH, two security MEDIUMs).

- **Lakebase read-only path no longer emits a multi-statement batch.** `wrapInReadOnlyTransaction` returned `"BEGIN READ ONLY;\n<stmt>;\nROLLBACK;"` as one string passed to `pool.query(text, values)`. As soon as the agent supplied `values`, pg switched to the Extended Query protocol and PostgreSQL rejected the batch with `cannot insert multiple commands into a prepared statement`, silently breaking every parameterized read-only tool call in production. The mocked lakebase test concealed this. The helper is removed; `LakebasePlugin.runReadOnlyStatement` now acquires a dedicated client from the pool and runs three separate `client.query` calls on the same connection (`BEGIN READ ONLY`, user statement with values, `ROLLBACK`), with a `finally` that rolls back and releases the client even when the user statement throws. Four tests cover the new flow: dispatch-time rejection, statement ordering, parameter forwarding, and release-on-error.
- **`isBlockedIpv6` link-local `fe80::/10` now covers the full range.** The previous prefix check, `startsWith("fe80:") || startsWith("fe9")`, only matched `fe80`–`fe9f`, leaving `fea0`–`febf` (valid link-local) passable. Replaced with `/^fe[89ab][0-9a-f]:/.test(lowered)` so the hex digit after `fe` is checked against `8..b`.
- **`::ffff:<hex>:<hex>` IPv4-mapped IPv6 is now normalised.** The colon-hex form of an IPv4-mapped address (`::ffff:a9fe:a9fe` = 169.254.169.254) previously bypassed the IPv4 blocklist because `isIPv4("a9fe:a9fe")` is false. `hexPairToDottedIpv4` reassembles the trailing two hex groups into dotted form and delegates to `isBlockedIpv4`. Regression tests cover the metadata, 10/8, and 192.168/16 equivalents; a public IPv4 mapped to colon-hex still passes through.

Signed-off-by: MarioCadenas <[email protected]>
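The two IPv6 fixes above can be sketched as follows. This is a minimal illustration, not the code from the PR: the function names come from the commit message, but the bodies, the exact IPv4 ranges checked, and the return conventions are assumptions.

```ts
// Hypothetical sketch: names mirror the commit message, bodies are assumed.

/** True for a small, illustrative subset of private / link-local IPv4 ranges. */
function isBlockedIpv4(ip: string): boolean {
  const parts = ip.split(".").map(Number);
  if (parts.length !== 4 || parts.some((p) => !Number.isInteger(p) || p < 0 || p > 255)) {
    return false; // not a dotted quad
  }
  const [a = -1, b = -1] = parts;
  return (
    a === 10 || // 10.0.0.0/8
    a === 127 || // loopback
    (a === 169 && b === 254) || // link-local, includes cloud metadata
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) // 192.168.0.0/16
  );
}

/** Reassemble two 16-bit hex groups ("a9fe", "a9fe") into dotted IPv4 form. */
function hexPairToDottedIpv4(hi: string, lo: string): string | null {
  const group = /^[0-9a-f]{1,4}$/;
  if (!group.test(hi) || !group.test(lo)) return null;
  const h = parseInt(hi, 16);
  const l = parseInt(lo, 16);
  return [h >> 8, h & 0xff, l >> 8, l & 0xff].join(".");
}

function isBlockedIpv6(address: string): boolean {
  const lowered = address.toLowerCase();
  // fe80::/10 fixes the first 10 bits, so the hex digit after "fe" is 8, 9, a, or b.
  if (/^fe[89ab][0-9a-f]:/.test(lowered)) return true;
  // IPv4-mapped address written in colon-hex form, e.g. ::ffff:a9fe:a9fe.
  const mapped = /^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/.exec(lowered);
  if (mapped) {
    const dotted = hexPairToDottedIpv4(mapped[1] ?? "", mapped[2] ?? "");
    return dotted !== null && isBlockedIpv4(dotted);
  }
  return false;
}
```

The sketch only handles the two cases named in the commit message; a full blocklist would also need loopback, unique-local, and other IPv6 ranges.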

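The Lakebase read-only fix described above replaces one multi-statement string with three separate queries on a single connection. A minimal sketch of that pattern follows. The `QueryClient` and `ClientPool` interfaces are stand-ins shaped like node-postgres' `PoolClient` and `Pool`, introduced here so the example is self-contained; the function body is an assumption, not the PR's `LakebasePlugin.runReadOnlyStatement`.

```ts
// Stand-in types shaped like node-postgres' PoolClient / Pool (assumed for illustration).
interface QueryClient {
  query(text: string, values?: unknown[]): Promise<{ rows: unknown[] }>;
  release(): void;
}
interface ClientPool {
  connect(): Promise<QueryClient>;
}

/**
 * Run one statement inside a read-only transaction. Each step is its own
 * query call, so parameter values never share a prepared statement with
 * BEGIN or ROLLBACK.
 */
async function runReadOnlyStatement(
  pool: ClientPool,
  statement: string,
  values: unknown[] = [],
): Promise<unknown[]> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN READ ONLY");
    const result = await client.query(statement, values);
    return result.rows;
  } finally {
    try {
      // Always roll back: the transaction exists only to enforce read-only mode.
      await client.query("ROLLBACK");
    } finally {
      // Release even if ROLLBACK itself fails, so the pool is not starved.
      client.release();
    }
  }
}
```

Because the transaction is pinned to one client, the `BEGIN READ ONLY` setting applies to the user statement; with `pool.query`, each call could land on a different connection.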
…template

Final layer of the agents feature stack. Everything needed to exercise, demonstrate, and learn the feature.

`apps/agent-app/`: a standalone app purpose-built around the agents feature. Ships with:

- `server.ts`: full example of code-defined agents via `fromPlugin`:

  ```ts
  const support = createAgent({
    instructions: "…",
    tools: {
      ...fromPlugin(analytics),
      ...fromPlugin(files),
      get_weather,
      "mcp.vector-search": mcpServer("vector-search", "https://…"),
    },
  });

  await createApp({
    plugins: [server({ port }), analytics(), files(), agents({ agents: { support } })],
  });
  ```

- `config/agents/assistant.md`: markdown-driven agent alongside the code-defined one, showing the asymmetric auto-inherit default.
- Vite + React 19 + TailwindCSS frontend with a chat UI.
- Databricks deployment config (`databricks.yml`, `app.yaml`) and deploy scripts.

`apps/dev-playground/client/src/routes/agent.route.tsx`: chat UI with inline autocomplete (hits the `autocomplete` markdown agent) and a full threaded conversation panel (hits the default agent).

`apps/dev-playground/server/index.ts`: adds a code-defined `helper` agent using `fromPlugin(analytics)` alongside the markdown-driven `autocomplete` agent in `config/agents/`. Exercises the mixed-style setup (markdown + code) against the same plugin list.

`apps/dev-playground/config/agents/*.md`: both agents defined with valid YAML frontmatter.

`docs/docs/plugins/agents.md`: progressive five-level guide:

1. Drop a markdown file → it just works.
2. Scope tools via `toolkits:` / `tools:` frontmatter.
3. Code-defined agents with `fromPlugin()`.
4. Sub-agents.
5. Standalone `runAgent()` (no `createApp` or HTTP).

Plus a configuration reference, runtime API reference, and frontmatter schema table.

`docs/docs/api/appkit/`: regenerated typedoc for the new public surface (fromPlugin, runAgent, AgentDefinition, AgentsPluginConfig, ToolkitEntry, ToolkitOptions, all adapter types, and the agents plugin factory).

`template/appkit.plugins.json`: adds the `agent` plugin entry so `npx @databricks/appkit init --features agent` scaffolds the plugin correctly.

- Full appkit vitest suite: 1311 tests passing
- Typecheck clean across all 8 workspace projects
- `pnpm docs:build` clean (no broken links)
- `pnpm --filter=@databricks/appkit build:package` clean, publint clean

Signed-off-by: MarioCadenas <[email protected]>

Documents the new `mcp` configuration block and the rules it enforces: same-origin-only by default, explicit `trustedHosts` for external MCP servers, plaintext `http://` refused outside localhost-in-dev, and DNS-level blocking of private / link-local IP ranges (covers cloud metadata services). See PR #302 for the policy implementation and PR #304 for the `AgentsPluginConfig.mcp` wiring.

Signed-off-by: MarioCadenas <[email protected]>

- `docs/docs/plugins/agents.md`: new "SQL agent tools" subsection covering `analytics.query` readOnly enforcement, `lakebase.query` opt-in via `exposeAsAgentTool`, and the approval flow. New "Human-in-the-loop approval for destructive tools" subsection documents the config, SSE event shape, and `POST /chat/approve` contract.
- `apps/agent-app`: approval-card component rendered inline in the chat stream whenever an `appkit.approval_pending` event arrives. Destructive badge + Approve/Deny buttons POST to `/api/agent/approve` with the carried `streamId`/`approvalId`.
- `apps/dev-playground/client`: matching approval-card on the agent route, using the existing appkit-ui `Button` component and Tailwind utility classes.

Signed-off-by: MarioCadenas <[email protected]>

Updates `docs/docs/plugins/agents.md` to document the new two-key auto-inherit model introduced in PR #302 (per-tool `autoInheritable` flag) and PR #304 (safe-by-default `autoInheritTools: { file: false, code: false }`). Adds an "Auto-inherit posture" subsection explaining that the developer must opt into `autoInheritTools` AND the plugin author must mark each tool `autoInheritable: true` for a tool to spread without explicit wiring. Includes a table documenting the `autoInheritable` marking on each core plugin tool, plus an example of the setup-time audit log so operators can see exactly what's inherited vs. skipped.

Signed-off-by: MarioCadenas <[email protected]>

- **Reference app no longer ships hardcoded dogfood URLs.** The three `https://e2-dogfood.staging.cloud.databricks.com/...` and `https://mario-mcp-hello-*.staging.aws.databricksapps.com/...` MCP URLs in `apps/agent-app/server.ts` are replaced with optional env-driven `VECTOR_SEARCH_MCP_URL` / `CUSTOM_MCP_URL` config. When set, their hostnames are auto-added to `agents({ mcp: { trustedHosts } })`. `.env.example` uses placeholder values the reader can replace instead of another team's workspace.
- **`appkit.agent` → `appkit.agents` in the reference app.** The prior `appkit.agent as { list, getDefault }` cast papered over the plugin-name mismatch fixed in PR #304. The runtime key now matches the docs, the manifest, and the factory name; the cast is gone.
- **Auto-inherit opt-in added to the reference config.** Since the defaults flipped to `{ file: false, code: false }` (PR #304, S-3), the reference now explicitly enables `autoInheritTools: { file: true }` so the markdown agents that ship alongside the code-defined one still pick up the analytics / files read-only tools. This is the pattern a real deployment should follow: opt in deliberately.

Signed-off-by: MarioCadenas <[email protected]>

- `apps/dev-playground/config/agents/autocomplete.md` sets `ephemeral: true`. Each debounced autocomplete keystroke no longer leaves an orphan thread in `InMemoryThreadStore`: the server now deletes the thread in the stream's `finally` (PR #304). Closes R1 from the MVP re-review.
- `docs/docs/plugins/agents.md` documents the new `ephemeral` frontmatter key alongside the other AgentDefinition knobs.

Signed-off-by: MarioCadenas <[email protected]>

Documents the MVP resource caps landed in PR #304: the static request-body caps (enforced by the Zod schemas) and the three configurable runtime limits (`maxConcurrentStreamsPerUser`, `maxToolCalls`, `maxSubAgentDepth`). Includes the config-block shape in the main reference and a new "Resource limits" subsection under the Configuration section explaining the intent and per-user semantics of each cap.

Signed-off-by: MarioCadenas <[email protected]>
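To make the per-user semantics of the runtime limits concrete, here is a minimal sketch of default resolution and stream admission. The three limit names, their defaults (5, 50, 3), and the 429 + `Retry-After: 5` response come from the limits commit in this stack; `resolveLimits`, `StreamAdmission`, and the `Admission` type are hypothetical names invented for illustration.

```ts
interface AgentLimits {
  maxConcurrentStreamsPerUser: number;
  maxToolCalls: number;
  maxSubAgentDepth: number;
}

// Defaults as stated in the limits commit message.
const DEFAULT_LIMITS: AgentLimits = {
  maxConcurrentStreamsPerUser: 5,
  maxToolCalls: 50,
  maxSubAgentDepth: 3,
};

/** Hypothetical helper: overlay deployment overrides on the defaults. */
function resolveLimits(overrides: Partial<AgentLimits> = {}): AgentLimits {
  return { ...DEFAULT_LIMITS, ...overrides };
}

type Admission =
  | { admitted: true }
  | { admitted: false; status: 429; retryAfterSeconds: number };

/** Hypothetical per-user stream counter; the cap is per user, not global. */
class StreamAdmission {
  private active = new Map<string, number>();
  private limits: AgentLimits;

  constructor(limits: AgentLimits) {
    this.limits = limits;
  }

  tryAcquire(userId: string): Admission {
    const current = this.active.get(userId) ?? 0;
    if (current >= this.limits.maxConcurrentStreamsPerUser) {
      return { admitted: false, status: 429, retryAfterSeconds: 5 };
    }
    this.active.set(userId, current + 1);
    return { admitted: true };
  }

  /** Call from the stream's finally block so the slot is always returned. */
  release(userId: string): void {
    const current = this.active.get(userId) ?? 0;
    if (current <= 1) this.active.delete(userId);
    else this.active.set(userId, current - 1);
  }
}
```

One user hitting the cap does not affect admission for any other user, which is the property the "per-user, not global" wording describes.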

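The two-key auto-inherit model in this stack (the developer opts in via `autoInheritTools`, and the plugin author marks each tool `autoInheritable: true`) can be sketched as a small filter. The option shapes and the `applyAutoInherit` name follow the commit messages; the `ToolkitEntrySketch` type, the return value, and the function bodies are assumptions made for illustration.

```ts
// Hypothetical, trimmed-down stand-in for the real ToolkitEntry.
interface ToolkitEntrySketch {
  name: string;
  autoInheritable?: boolean;
}

type AgentOrigin = "file" | "code";
type AutoInheritConfig = boolean | { file?: boolean; code?: boolean };

/** Key 1: has the developer opted in for this agent origin? Default is off. */
function isAutoInheritEnabled(config: AutoInheritConfig | undefined, origin: AgentOrigin): boolean {
  if (config === undefined) return false;
  if (typeof config === "boolean") return config; // boolean shorthand covers both origins
  return config[origin] === true;
}

/** Key 2: keep only tools whose plugin author marked them autoInheritable. */
function applyAutoInherit(
  entries: ToolkitEntrySketch[],
  config: AutoInheritConfig | undefined,
  origin: AgentOrigin,
): { inherited: string[]; skipped: string[] } {
  const inherited: string[] = [];
  const skipped: string[] = [];
  if (!isAutoInheritEnabled(config, origin)) return { inherited, skipped };
  for (const entry of entries) {
    // Omitted or false both count as "not consented".
    if (entry.autoInheritable === true) inherited.push(entry.name);
    else skipped.push(entry.name);
  }
  return { inherited, skipped };
}
```

Returning the skipped names alongside the inherited ones is one way to feed the setup-time audit log that the docs commit mentions; the real log format is not shown in this PR text.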
…agents

The main product layer. Turns an AppKit app into an AI-agent host with markdown-driven agent discovery, code-defined agents, sub-agents, and a standalone run-without-HTTP executor.

`packages/appkit/src/core/create-agent-def.ts`. Returns the passed-in definition after cycle-detecting the sub-agent graph. No adapter construction, no side effects: safe at module top-level. The returned `AgentDefinition` is plain data, consumable by either `agents({ agents })` or `runAgent(def, input)`.

`packages/appkit/src/plugins/agents/agents.ts`. `AgentsPlugin` class:

- Loads markdown agents from `config/agents/*.md` (configurable dir) via real YAML frontmatter parsing (`js-yaml`). Frontmatter schema: `endpoint`, `model`, `toolkits`, `tools`, `default`, `maxSteps`, `maxTokens`, `baseSystemPrompt`. Unknown keys logged, invalid YAML throws at boot.
- Merges code-defined agents passed via `agents({ agents: { name: def } })`. Code wins on key collision.
- For each agent, builds a per-agent tool index from:
  1. Sub-agents (`agents: {...}`): synthesized as `agent-<key>` tools on the parent.
  2. Explicit tool record entries: `ToolkitEntry`s, inline `FunctionTool`s, or `HostedTool`s.
  3. Auto-inherit (if nothing explicit): pulls every registered `ToolProvider` plugin's tools. Asymmetric default: markdown agents inherit (`file: true`), code-defined agents don't (`code: false`).
- Mounts `POST /invocations` (OpenAI Responses compatible) + `POST /chat`, `POST /cancel`, `GET /threads/:id`, `DELETE /threads/:id`, `GET /info`.
- SSE streaming via `executeStream`. Tool calls dispatch through `PluginContext.executeTool(req, pluginName, localName, args, signal)` for OBO, telemetry, and timeout.
- Exposes `appkit.agent.{register, list, get, reload, getDefault, getThreads}` runtime helpers.

`packages/appkit/src/core/run-agent.ts`. Runs an `AgentDefinition` without `createApp` or HTTP. Drives the adapter's event stream to completion, executing inline tools + sub-agents along the way. Aggregates events into `{ text, events }`. Useful for tests, CLI scripts, and offline pipelines. Hosted/MCP tools and plugin toolkits require the agents plugin and throw clear errors with guidance.

- `AgentEventTranslator`: stateful converter from internal `AgentEvent`s to OpenAI Responses API `ResponseStreamEvent`s with sequence numbers and output indices.
- `InMemoryThreadStore`: per-user conversation persistence. Nested `Map<userId, Map<threadId, Thread>>`. Implements `ThreadStore` from shared types.
- `buildBaseSystemPrompt` + `composeSystemPrompt`: formats the AppKit base prompt (with plugin names and tool names) and layers the agent's instructions on top.

`load-agents.ts`: reads `*.md` files, parses YAML frontmatter with `js-yaml`, resolves `toolkits: [...]` entries against the plugin provider index at load time, wraps ambient tools (from `agents({ tools: {...} })`) for `tools: [...]` frontmatter references.

- Adds `js-yaml` + `@types/js-yaml` deps.
- Manifest mounts routes at `/api/agent/*` (singular, matching the `appkit.agent.*` runtime handle).
- Exports from the main barrel: `agents`, `createAgent`, `runAgent`, `AgentDefinition`, `AgentsPluginConfig`, `AgentTool`, `ToolkitEntry`, `ToolkitOptions`, `BaseSystemPromptOption`, `PromptContext`, `isToolkitEntry`, `loadAgentFromFile`, `loadAgentsFromDir`.

- 60 new tests: agents plugin lifecycle, markdown loading, code-agent registration, auto-inherit asymmetry, sub-agent tool synthesis, cycle detection, event translator, thread store, system prompt composition, standalone `runAgent`.
- Full appkit vitest suite: 1297 tests passing.
- Typecheck clean across all 8 workspace projects.

Signed-off-by: MarioCadenas <[email protected]>

`connectHostedTools` now builds an `McpHostPolicy` from the new `config.mcp` field (`trustedHosts`, `allowLocalhost`) and passes it to `AppKitMcpClient`. Same-origin workspace URLs are admitted with workspace auth; all other hosts must be explicitly trusted, and workspace credentials are never forwarded to them.

- `AgentsPluginConfig.mcp?: McpHostPolicyConfig` added to the public config surface. See PR #302 for the policy definition and tests.
- No behavioural change for the default case (same-origin Databricks workspace URLs): those continue to receive SP on setup and caller OBO on `tools/call`.
- `knip.json`: ignore `packages/appkit/src/plugin/to-plugin.ts`. The `NamedPluginFactory` export introduced by this layer is consumed in a later stack layer; knip flags it as unused in the intermediate state.

Signed-off-by: MarioCadenas <[email protected]>

Adds a secure-by-default HITL approval gate for any tool annotated `destructive: true`. Before executing such a tool the agents plugin:

1. Emits a new `appkit.approval_pending` SSE event carrying the `approval_id`, `stream_id`, `tool_name`, `args`, and `annotations`.
2. Awaits a matching `POST /chat/approve` from the same user who initiated the stream.
3. Auto-denies after the configurable `timeoutMs` (default 60 s).

A denial returns a short denial string to the adapter as the tool output so the LLM can apologise / replan instead of crashing.

```ts
agents({
  approval: {
    requireForDestructive: true, // optional boolean; default: true
    timeoutMs: 60_000, // optional number; default: 60_000
  },
});
```

- `event-channel.ts`: single-producer / single-consumer async queue used to merge adapter events with out-of-band events emitted by `executeTool` (same SSE stream, single `executeStream` sink).
- `tool-approval-gate.ts`: state machine keyed by `approvalId`. Owns the pending promise + timeout, enforces ownership on submit, exposes `abortStream(streamId)` + `abortAll()` for clean teardown.
- `AgentEvent` gains an `approval_pending` variant.
- `ResponseStreamEvent` gains `AppKitApprovalPendingEvent`.
- `AgentEventTranslator.translate()` handles both.
- `POST /approve` route with zod validation, ownership check, and 404 / 403 / 200 semantics.
- `POST /cancel` now enforces the same ownership invariant (`resolveUserId(req) === stream.userId`) and aborts any pending approval gates on the stream.

- `event-channel.test.ts` (7): ordering, buffered-before-iter, close semantics, close-with-error rejection, interleave.
- `tool-approval-gate.test.ts` (8): approve / deny / timeout / ownership / abortStream / abortAll / late-submit no-op.
- `approval-route.test.ts` (8): schema validation, unknown stream, ownership refusal, unknown approvalId, approve happy path, deny happy path, cancel clears pending gates, cancel ownership refusal.

Full appkit suite: 1448 tests passing (+23 from Layer 3).

Signed-off-by: MarioCadenas <[email protected]>

`autoInheritTools` now defaults to `{ file: false, code: false }`. Markdown and code-defined agents with no explicit `tools:` declaration receive an empty tool index unless the developer explicitly opts in.

When opted in (`autoInheritTools: { file: true }` or the boolean shorthand), `applyAutoInherit` now filters the spread strictly by each `ToolkitEntry.autoInheritable` flag (set on the source `ToolEntry` in PR #302). Any tool not marked `autoInheritable: true` is skipped and logged so the operator can see exactly what the safe default omits.

Providers exposing tools only via `getAgentTools()` (no `toolkit()`) cannot be filtered per tool, so their entries are conservatively skipped during auto-inherit and must be wired explicitly via `tools:`. This removes the silent privilege-amplification path where registering a plugin implicitly granted its entire tool surface to every markdown agent.

- New: safe default produces an empty tool index for both file and code agents even when an `autoInheritable: true` tool exists.
- New: `autoInheritTools: { file: true }` spreads only the tool marked `autoInheritable: true`; an adjacent unmarked tool is skipped.
- New: `autoInheritTools: true` (boolean shorthand) enables both origins and still filters by `autoInheritable`.
- Updated: the prior "asymmetric default" test now validates the new safe-default semantics (empty index on both origins).

Full appkit suite: 1451 tests passing (+3 from S-3 Layer 2).

Signed-off-by: MarioCadenas <[email protected]>

Five small correctness + DX fixes rounding out the MVP blocker list.

- **Plugin name `agent` → `agents`.** The manifest name now matches the public runtime key (`appkit.agents.*`) and the factory export. Previously the cast in the reference app masked a real typing gap.
- **`maxSteps` / `maxTokens` frontmatter / AgentDefinition fields now flow into the adapter.** Previously `resolveAdapter` built `DatabricksAdapter.fromModelServing(source)` without passing either value, so the documented knobs were silent no-ops. Now threaded through as `adapterOptions` when AppKit constructs the adapter itself (string model or omitted model); user-supplied `AgentAdapter` instances own their own settings as before.
- **Single-`message` assistant turns are now persisted to the thread store.** The stream accumulator previously only handled `message_delta`; any adapter that yields a single final `message` (notably LangChain's `on_chain_end` path) silently dropped the assistant turn from thread history, so multi-turn LangChain conversations lost context. The loop now accumulates both kinds, with `message` replacing previously-accumulated deltas.
- **Void-tool return no longer coerced into a fake error.** A tool handler that legitimately returns `undefined` (side-effecting fire-and-forget tools) was being reported to the LLM as `Error: Tool "x" execution failed`. Now returns `""` so the model sees a successful-but-empty result.
- **Default `InMemoryThreadStore` is now loud about being dev-only.** Constructor logs an INFO in development and a WARN in production when no `threadStore` is supplied. Docstring rewritten to state unambiguously that the default is intended for local development / demos only, and points at a follow-up for a capped variant. Real caps + a persistent implementation are tracked as follow-ups.

Signed-off-by: MarioCadenas <[email protected]>

New `ephemeral?: boolean` field on `AgentDefinition` and the `ephemeral` markdown frontmatter key. When set, the thread created for a chat request against that agent is deleted from `ThreadStore` in the stream's `finally`. Intended for stateless one-shot agents (e.g. autocomplete) where each invocation is independent and retaining history would both poison future calls and accumulate unbounded state in the default `InMemoryThreadStore`.

This closes the "autocomplete agent creates orphan thread per keystroke" regression flagged in the performance re-review (R1), which otherwise would have put an in-tree memory-leak demonstrator against the one perf finding S-2/S-3 consciously deferred.

Cleanup errors in the finally block are logged at warn level so a late delete never masks the real response. `RegisteredAgent` mirrors the flag. `load-agents.ts` adds `ephemeral` to `ALLOWED_KEYS`.

Signed-off-by: MarioCadenas <[email protected]>

Rewrote `event-translator.ts` to allocate the message's `output_index` lazily on the first `message_delta` or `message` and to close any open message before emitting a subsequent `tool_call` / `tool_result` item. The previous implementation hardcoded `output_index: 0` for messages and incremented a counter starting at 1 for tool items, so any ReAct-style flow (tool call before text) produced `output_item.added` at index 1 followed by `output_item.added` at index 0, a violation of the monotonic ordering that OpenAI's own Responses-API SDK parsers enforce.

Also fixed the companion bug from the original review: `message` after preceding `message_delta`s no longer double-emits `output_item.added` (it just emits the `done`), and `handleToolResult` coalesces `undefined` to `""` at the translator layer so the wire shape is always a string for every adapter (not just the ones that funnel through `agents.ts` executeTool).

Four new regression tests pin the invariants: tool_call → text ordering, message-interrupted-by-tool, no duplicate added on full-message, undefined-tool-result → empty-string output.

The one remaining HIGH security item from the prior review is now closed. Minimal, static caps at the schema layer; configurable per-deployment caps at runtime.

Schemas:

- `chatRequestSchema.message`: `.max(64_000)`, roughly 16k tokens, well above any legitimate chat turn.
- `invocationsRequestSchema.input`: string `.max(64_000)`, array `.max(100)` items, per-item `content` string `.max(64_000)` or array `.max(100)` items.

Runtime limits (new `AgentsPluginConfig.limits`):

- `maxConcurrentStreamsPerUser` (default 5): `_handleChat` counts the user's active streams before admitting and returns HTTP 429 + `Retry-After: 5` when at-limit. Per-user, not global.
- `maxToolCalls` (default 50): per-run budget tracked in the `executeTool` closure across the top-level adapter and any sub-agent delegations. Exceeding aborts the stream.
- `maxSubAgentDepth` (default 3): `runSubAgent` rejects before any adapter work when the recursion depth exceeds the limit. Protects against a prompt-injected agent that delegates to itself transitively.

15 new tests exercise body caps (6), per-user limit with and without override (3), defaults and overrides on `resolvedLimits` (2), sub-agent depth boundary + violation (2), plus the remaining schema checks (2).

Full appkit vitest suite: 1475 tests passing (+19 from this pass).

Signed-off-by: MarioCadenas <[email protected]>
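The approval gate described in this stack (a state machine keyed by `approvalId` that owns the pending promise and timeout, enforces ownership on submit, and exposes `abortStream` / `abortAll`) might look roughly like the sketch below. The method names `abortStream` and `abortAll` and the 60 s default come from the commit message; everything else, including `wait`, `submit`, the `SubmitResult` values, and the choice to resolve aborted gates as denials, is an assumption.

```ts
type Decision = "approve" | "deny";
// Intended to map onto the route's 200 / 404 / 403 semantics.
type SubmitResult = "ok" | "not_found" | "forbidden";

interface PendingApproval {
  userId: string;
  streamId: string;
  resolve: (decision: Decision) => void;
  timer: ReturnType<typeof setTimeout>;
}

class ToolApprovalGate {
  private pending = new Map<string, PendingApproval>();
  private timeoutMs: number;

  constructor(timeoutMs = 60_000) {
    this.timeoutMs = timeoutMs;
  }

  /** Register a pending approval; settles on submit, timeout, or abort. */
  wait(approvalId: string, streamId: string, userId: string): Promise<Decision> {
    return new Promise<Decision>((resolve) => {
      // Auto-deny if nobody answers in time.
      const timer = setTimeout(() => this.settle(approvalId, "deny"), this.timeoutMs);
      this.pending.set(approvalId, { userId, streamId, resolve, timer });
    });
  }

  /** Only the user who started the stream may answer. */
  submit(approvalId: string, userId: string, decision: Decision): SubmitResult {
    const entry = this.pending.get(approvalId);
    if (!entry) return "not_found";
    if (entry.userId !== userId) return "forbidden";
    this.settle(approvalId, decision);
    return "ok";
  }

  abortStream(streamId: string): void {
    for (const [id, entry] of Array.from(this.pending.entries())) {
      if (entry.streamId === streamId) this.settle(id, "deny");
    }
  }

  abortAll(): void {
    for (const id of Array.from(this.pending.keys())) this.settle(id, "deny");
  }

  private settle(approvalId: string, decision: Decision): void {
    const entry = this.pending.get(approvalId);
    if (!entry) return; // late submit or double settle is a no-op
    clearTimeout(entry.timer);
    this.pending.delete(approvalId);
    entry.resolve(decision);
  }
}
```

Routing every exit path through one `settle` method keeps the timer and the map entry in sync, so a late `submit` after a timeout simply reports `not_found` instead of resolving the promise twice.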

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from a5642df to e26795b on April 21, 2026 20:41

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 3c7c35e to cb7fe2b on April 21, 2026 20:41

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from e26795b to d73e138 on April 22, 2026 08:45

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from cb7fe2b to 0afea5e on April 22, 2026 08:45

MarioCadenas mentioned this pull request Apr 22, 2026

feat(appkit): zero-trust MCP host policy with URL allowlist and scoped auth #307

Closed

7 tasks

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 0afea5e to 983461c on April 22, 2026 09:24

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 983461c to a7b0444 on April 22, 2026 09:46

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from a7b0444 to 623792d on April 22, 2026 09:59

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from 26f43e5 to 2ffa31d on April 22, 2026 09:59

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from 2ffa31d to 3a4deb7 on April 22, 2026 10:21

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 623792d to 2f752a0 on April 22, 2026 10:21

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from 3a4deb7 to ca9cfca on April 22, 2026 10:49

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 2f752a0 to dd583ad on April 22, 2026 10:50

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from dd583ad to 11981f7 on April 22, 2026 12:20

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from ca9cfca to c2a74ac on April 22, 2026 15:49

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 11981f7 to acb9094 on April 22, 2026 15:49

MarioCadenas force-pushed the agent/v2/3-plugin-infra branch from c2a74ac to 14ca97c on April 22, 2026 16:19

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from acb9094 to 807142c on April 22, 2026 16:19

MarioCadenas force-pushed the agent/v2/4-agents-plugin branch from 807142c to ff80c98 on April 23, 2026 09:49

MarioCadenas wants to merge 1 commit into agent/v2/3-plugin-infra from agent/v2/4-agents-plugin
