
RFC-0060: Starlark Session Initialization for vMCP#60

Draft
jerm-dro wants to merge 18 commits into main from jerm/2026-03-24-starlark-programmable-middleware

Contributor

@jerm-dro jerm-dro commented Mar 25, 2026

Introduce a Starlark-based session initialization script for vMCP. A single script runs once per session, receives discovered backends and their capabilities, and calls publish() to declare what the agent sees — optionally wrapping handlers with additional logic. Existing config knobs remain fully supported, but customization of vMCP behavior can now be exactly tailored to the use case without adding more knobs. Increasing configurability no longer means decreasing maintainability.

Proposes extending vMCP's Starlark engine from composite-tool-only
scripting into a unified programmable middleware surface, replacing
the growing set of independent config knobs (optimizer, filter,
rate limiting, PII scrubbing) with a single script per session.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: Jeremy Drouillard <[email protected]>
@jerm-dro jerm-dro force-pushed the jerm/2026-03-24-starlark-programmable-middleware branch from b57f13b to 1fde39a on March 25, 2026 03:54
- Rename "middleware script" → "session initialization script" throughout
- backends() returns dict[string, Backend] instead of flat tools() list
- Handler functions take single dict arg instead of **kwargs
- metadata() requires all fields (name, description, parameters, annotations)
- Rewrite presets section: no config nesting, auto-generation from existing config
- Update sequence diagram: remove Starlark Engine, use MultiSession
- Restructure implementation phases: feature parity first, then new capabilities
- Simplify Cedar interaction section
- Remove THV-0058 reference
- Update open questions (remove resolved, add error handling)
- Rename CRD to VirtualMCPSessionInitScript
- Rename config struct to SessionInitConfig

Co-Authored-By: Claude Opus 4.6 <[email protected]>
jerm-dro and others added 5 commits March 25, 2026 17:55
- Rewrite authz model: scripts see all backends, authz filters at runtime
- Add Background section explaining how authorization works today
- Replace call_tool() with saved handler dict pattern in all examples
- Collapse three presets into single `default` preset with ~80-line sketch
- Add handler timeout support (optional kwarg)
- Move current_user() to future built-ins (user known at init, deferred)
- Fix scrub_pii() as future built-in example, not v0 deliverable
- Add config() built-in for preset access to persona config
- Expand Alternatives Considered: configuration approaches + language
  considerations (Starlark, Risor, Wasm, Lua, OPA/Rego)
- Add 4-phase implementation plan (POC → production → deprecate → ship)
- Update Why Now with cost equation argument
- Clarify scope: enabling new capabilities, not shipping them

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Reference stacklok/toolhive#4373 as a concrete example of feature
interaction pain in the Problem Statement. Add open question about
whether authz decisions should move into Starlark to unify the
"who sees what?" model.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@jerm-dro jerm-dro changed the title from "RFC-0059: Starlark as programmable middleware for vMCP" to "RFC-0060: Starlark Session Initialization for vMCP" on Mar 26, 2026
jerm-dro and others added 9 commits March 25, 2026 19:14
The optimizer/Cedar issues (#4373, #4374) are about the authz boundary,
not config knob combinations. Move them to Open Question 2 where they
motivate pulling authz into Starlark. Use #4287 as the Problem Statement
example instead.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Add "Prior Art: Gateway Configurability Patterns" covering Envoy, Kong,
and the Configuration Complexity Clock. Add vMCP feature dependency
diagram to the Problem Statement.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…tion

Split monolithic "Config knob combinations" into two focused sections:
"The configurability problem" (user-facing interaction examples) and
"The maintainability problem" (developer-facing quadratic cost). Move
Background (prior art + authz) under Proposed Solution where it provides
context for the design.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Reframe problem as configurability vs maintainability tension. Move bug
evidence and dependency diagram to maintainability section. Remove
feature table (duplicative with diagram). Update summary to mirror
problem framing. Add links to Envoy, Kong, and Configuration Complexity
Clock resources.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Elaborate rate limiting interaction questions (composite tools, groups).
Add optimizer × authz bugs (#4373, #4374) to maintainability section.
Replace PII hypothetical with concrete framing. Add Envoy and Kong
source code snippets with GitHub links to prior art section.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Replace hand-written code snippets with examples from official
documentation. Link to doc pages instead of source files.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
- Move Motivating Use Cases to Appendix A
- Update Last Updated date to 2026-03-27
- Replace scrub_pii decorator example with v0 logging decorator
- Restore missing filter decorator in architecture diagram
- Remove duplicate composite tools mentions from presets/config sections
- Note composite tools support is optional in compatibility section
- Fix Open Question 3 bold formatting

Co-Authored-By: Claude Opus 4.6 <[email protected]>
The CI validator checks all new files under rfcs/ for the THV-####
naming convention. Move the image to images/ at the repo root.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@jerm-dro jerm-dro requested a review from JAORMX March 28, 2026 02:24
Add Alternative 2 covering Go code refactoring as a competing approach.
Acknowledge its value while noting it doesn't address configurability
or cross-cutting concerns. Include AI-driven development observation.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@jerm-dro jerm-dro requested a review from reyortiz3 March 28, 2026 02:31
Contributor

@yrobla yrobla left a comment

Strong RFC overall — the problem framing, prior art, and alternatives section are all excellent. The backends() + publish() model is clean, and the default preset sketch makes backward compatibility concrete rather than aspirational.

Filing 3 blockers, 3 should-address items, 2 questions, and 2 nits.

)
elif strategy == "priority":
existing_backend = seen_names[meta.name]
if priority_order.index(backend_name) > priority_order.index(existing_backend):

Contributor

[Blocker] priority_order.index(backend_name) will raise if backend_name isn't in the list — which is likely for any backend the admin didn't explicitly rank. Starlark has no try/except so this needs explicit handling:

Suggested change (the first line is replaced by the three lines after it):

    if priority_order.index(backend_name) > priority_order.index(existing_backend):

    if backend_name not in priority_order or existing_backend not in priority_order:
        continue  # skip priority resolution if backend not ranked
    if priority_order.index(backend_name) > priority_order.index(existing_backend):

Also worth building a reverse-lookup dict upfront to avoid O(n) .index() on every collision.

Contributor Author

Good catch. Starlark has no try/except, so .index() on an unranked backend would halt the script with no recovery path. I've added an explicit `in` check and skip priority resolution for backends not in the priority list. The reverse-lookup dict is a good idea for the production implementation, but I'll keep the sketch simple for readability, since the POC will validate and optimize the exact implementation.
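For reference, the reverse-lookup idea from this thread could look roughly like the sketch below. It is written in Python using only constructs Starlark also has (dict comprehensions, `in` checks). The helper names `build_rank` and `pick_winner` are hypothetical, and two behaviors are assumptions rather than anything the RFC states: a lower index means higher priority, and an unranked backend never displaces the tool that was seen first.

```python
def build_rank(priority_order):
    # Built once up front, so each collision is a dict lookup
    # instead of an O(n) list scan with .index().
    return {name: i for i, name in enumerate(priority_order)}


def pick_winner(rank, existing_backend, backend_name):
    # Assumption: if either backend is unranked, skip priority
    # resolution and keep the tool that was seen first.
    if backend_name not in rank or existing_backend not in rank:
        return existing_backend
    # Assumption: a lower index means a higher priority.
    if rank[backend_name] < rank[existing_backend]:
        return backend_name
    return existing_backend


rank = build_rank(["github", "jira", "slack"])
winner = pick_winner(rank, "jira", "github")
```

Because the membership check runs before any lookup, no code path can raise on an unranked backend, which is the property that matters in a language without try/except.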

metadata(
name="find_tool",
description="Search for tools. Available: " + summary,
parameters=FIND_TOOL_SCHEMA,

Contributor

[Blocker] FIND_TOOL_SCHEMA and CALL_TOOL_SCHEMA are used throughout the RFC and appendix (lines 233, 393, 398, 759, 771) but never defined anywhere. Are these Go-defined constants injected into the Starlark global scope? Must users define them in custom scripts? This is a gap in the built-in spec that every optimizer script author will hit immediately.

Contributor Author

Good question — these are JSON Schema dicts defining the parameters for the optimizer's find_tool and call_tool synthetic tools. Rather than injecting them as Go-defined globals, I've inlined them at the top of the default preset sketch so they're visible and editable. Custom scripts that use the optimizer pattern can define their own schemas.
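To make "JSON Schema dicts" concrete, a minimal sketch of what such inlined schemas might look like follows. The property names (`query`, `name`, `arguments`) are assumptions for illustration only; the actual schemas in the preset sketch are not shown in this thread. The `missing_required` helper is likewise hypothetical and only checks the `required` list, not full JSON Schema validity.

```python
# Hypothetical parameter schema for the optimizer's find_tool synthetic tool.
FIND_TOOL_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "description": "What the agent is looking for"},
    },
    "required": ["query"],
}

# Hypothetical parameter schema for the call_tool synthetic tool.
CALL_TOOL_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Tool to invoke"},
        "arguments": {"type": "object", "description": "Arguments for that tool"},
    },
    "required": ["name"],
}


def missing_required(schema, args):
    # Names listed under "required" that the caller did not supply.
    return [key for key in schema.get("required", []) if key not in args]


missing = missing_required(CALL_TOOL_SCHEMA, {"arguments": {}})
```

Keeping the schemas as plain dicts at the top of the script means a forked preset can edit them like any other value.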

| Built-in | Signature | Description |
|----------|-----------|-------------|
| `search_index(tools)` | `search_index(list[(metadata, handler)]) → SearchIndex` | Builds a semantic search index over the tool list. Returns an object with `.search(query) → list[dict]`. |
| `elicit(message, schema)` | `elicit(message, schema={}) → struct(action, content)` | Prompts the user for a decision via MCP elicitation. |

Contributor

[Blocker] elicit() calls MCP elicitation, which is optional — many clients don't implement it. The RFC doesn't specify the fallback. For Use Case 2 (elicitation gate for write ops), the choice matters a lot:

  1. Return struct(action="accept") — permissive, but silently bypasses the entire security gate for non-elicitation clients
  2. Return struct(action="reject") — fail-closed, potentially very disruptive
  3. Raise an error — forces script authors to handle it explicitly

This should be a deliberate, documented choice rather than an implementation detail.

Contributor Author

Thanks for raising this — elicit() is optional in MCP and many clients don't implement it. I've added a when_unavailable parameter to the elicit() built-in that makes the fallback behavior explicit. Accepted values are "accept", "reject", and "error". There is no default — the parameter is required, so script authors must make a deliberate choice.
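The three accepted values map onto the three options raised in the review. A rough Python model of the fallback decision is sketched below; `elicit_fallback` and `ElicitResult` are hypothetical stand-ins (the real built-in is implemented in Go and returns a Starlark struct), and the error types are placeholders.

```python
from collections import namedtuple

# Stand-in for the struct(action, content) that elicit() returns.
ElicitResult = namedtuple("ElicitResult", ["action", "content"])

VALID_FALLBACKS = ("accept", "reject", "error")


def elicit_fallback(when_unavailable):
    """Result to use when the client does not implement elicitation."""
    if when_unavailable not in VALID_FALLBACKS:
        # No default: the script author has to choose one of the three.
        raise ValueError("when_unavailable must be one of " + ", ".join(VALID_FALLBACKS))
    if when_unavailable == "error":
        raise RuntimeError("client does not support elicitation")
    # "accept" is permissive, "reject" is fail-closed.
    return ElicitResult(action=when_unavailable, content={})


gate = elicit_fallback("reject")
```

For a write-operation gate like Use Case 2, `"reject"` or `"error"` keeps the gate closed for clients without elicitation, while `"accept"` trades that guarantee for compatibility.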


#### Existing config fields

`aggregation`, `compositeToolRefs`, and `optimizer` remain on `Config`. The `default` preset reads these fields and produces identical behavior. When a custom `sessionInit.script` or `sessionInit.scriptFile` is set, these fields are ignored (if both are set, vMCP logs a warning).

Contributor

[Should address] "If both are set, vMCP logs a warning" is too weak for a mutual-exclusion constraint. Silently ignoring one of preset/script/scriptFile when two are set will cause confusing behavior where the user edits the wrong source. This should be a hard config load error, consistent with describing the fields as "mutually exclusive."

Contributor Author

Agreed — this should be a hard validation error at config load time, not a warning. Updated: if more than one of preset, script, or scriptFile is set, vMCP rejects the configuration.
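As an illustration of the rule, a load-time check might look like the sketch below. It is Python rather than vMCP's Go, the function name is hypothetical, and it assumes the `sessionInit` block has already been parsed into a dict with the three field names used in this thread.

```python
def validate_session_init(session_init):
    """Reject configs that set more than one of preset, script, scriptFile."""
    exclusive = ("preset", "script", "scriptFile")
    # Treat missing, None, and empty-string values as "not set".
    set_fields = [name for name in exclusive if session_init.get(name)]
    if len(set_fields) > 1:
        raise ValueError(
            "sessionInit: " + ", ".join(set_fields)
            + " are mutually exclusive; set at most one"
        )
    return set_fields


chosen = validate_session_init({"scriptFile": "init.star"})
```

Naming every conflicting field in the error message tells the user which sources are competing, which is the confusion the review comment is concerned about.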

- **Migration guide**: From old config knobs to session init presets or custom scripts
- **Architecture docs**: Updated vMCP architecture with session initialization model

## Open Questions

Contributor

[Should address] Open Question 1 has a natural resolution via the MCP spec: handler failures should surface as {"isError": true, "content": [{"type": "text", "text": "..."}]} dicts — the same shape backend servers use today. The engine catches Go errors from backend calls and converts them to this shape; scripts use if result.get("isError") for decorator-style recovery. Worth resolving before implementation since error semantics affect how every handler is written.

Contributor Author

I like this proposal — the explicit behavior matches MCP semantics cleanly. Backend call failures are caught by the Go engine and returned as {"isError": true, "content": [...]} dicts rather than halting the script. Scripts can inspect result.get("isError") for decorator-style error recovery. This keeps the other patterns (fallback decorators, retry wrappers) available as userland Starlark — no special built-ins needed. Resolved the open question and added an error handling section to the programming model with a with_fallback example.
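A decorator in the spirit of the `with_fallback` example mentioned here could be sketched as follows. This is a Python approximation, not the RFC's actual example: it assumes handlers take a single dict argument and return MCP-style result dicts, and the two stub handlers are invented for the demonstration.

```python
def with_fallback(primary, fallback):
    """Wrap a handler so an isError result falls through to a second handler."""
    def handler(args):
        result = primary(args)
        # Backend failures arrive as result dicts, not exceptions,
        # so recovery is an ordinary conditional.
        if result.get("isError"):
            return fallback(args)
        return result
    return handler


def flaky_search(args):
    # Stub: simulates a backend call that failed.
    return {"isError": True, "content": [{"type": "text", "text": "backend unavailable"}]}


def cached_search(args):
    # Stub: simulates a secondary source that succeeds.
    return {"content": [{"type": "text", "text": "cached result for " + args["query"]}]}


search = with_fallback(flaky_search, cached_search)
result = search({"query": "open issues"})
```

Because errors are plain values, the same shape works for retry wrappers or logging decorators without any dedicated built-in.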

- Session factory runs the script and constructs `MultiSession` from `publish()` results
- Implement the `default` preset that reads existing config knobs (`aggregation`, `optimizer`, etc.)
- **Delete** the existing decorator-based code that supports these config knobs today (optimizer, filter, composite tools decorators)
- All existing tests must pass **except** those that test legacy composite tools (`compositeTools`, `compositeToolRefs`)

Contributor

[Should address] Phase 1 says to delete the existing decorator-based code as part of the POC. That ties a production deletion to a proof-of-concept — if the POC surfaces a design flaw, there's no fallback. Suggest keeping the decorators through Phase 1 and deleting them only after Phase 2's preset equivalence tests pass. The POC goal should be validating the design, not gating production functionality on the outcome.

Contributor Author

Fair point — deleting production code during a POC is risky. Moved the deletion to later phases where it's gated behind preset equivalence tests. Phase 1 now keeps existing decorators intact and runs the Starlark engine alongside them for comparison testing.

```
Authz-->>Agent: CallToolResult
```

The script runs **once** per session, not per request. `publish()` calls build up the tool set. The resulting `(metadata, handler)` pairs are used to construct the `MultiSession`, which handles all subsequent `tools/list` and `tools/call` requests. The authz middleware sits between the agent and the session, filtering and gating requests at runtime.
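The run-once model can be pictured with a small Python sketch. It is only an approximation of the flow described above: in the RFC `backends()` and `publish()` are built-ins in the script's global scope, whereas here they are passed in as arguments to keep the example self-contained, the metadata is reduced to a name, and `run_session_init` and `example_script` are hypothetical names.

```python
def run_session_init(script, backends):
    """Run the script once and collect every publish() call it makes."""
    published = []

    def publish(meta, handler):
        # Each call adds one (metadata, handler) pair to the session's tool set.
        published.append((meta, handler))

    script(backends, publish)
    return published


def example_script(backends, publish):
    # Publish every discovered tool under a backend-prefixed name.
    for backend_name, tools in backends.items():
        for tool_name, handler in tools.items():
            publish({"name": backend_name + "_" + tool_name}, handler)


backends = {"github": {"create_issue": lambda args: {"content": []}}}
session_tools = run_session_init(example_script, backends)
```

After this point the script is no longer involved: `tools/list` reads the collected metadata and `tools/call` invokes the collected handlers.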

Contributor

[Question] The RFC says "runs once per session" but doesn't define session scope. Is it per-connection? Per-user? If the MultiSession is reused across multiple users of the same persona, current_user() (line 283) can't work as described in Open Question 2 and Use Cases 5/6 — it would be stale from session construction time. This matters for understanding the design ceiling, especially given that Open Question 2 explores pulling authz into Starlark.

Contributor Author

Good question. The session initialization script runs once per session — either when a session is created or when it's restored from Redis. The Starlark state is in-memory and not serialized, but can be recreated by re-running the script. Even though the script output could technically be shared across sessions today, running it once per session is a deliberate choice — it enables safe adoption of user-centric built-ins like current_user() in the future without requiring an architectural change. Added this clarification to the RFC.


This means publishing a tool via `publish()` does not bypass authorization. The script controls *what tools exist and how they behave*; the authz middleware controls *who can see and use them*.

**Note**: Because the script sees all discovered tools, a handler could dispatch to tools the current user isn't authorized to use by saving handler references into a dict. This is the same trust model as composite tools today — administrators who write composite tool configurations can already wire calls to any backend tool. Fixing this boundary (moving authz before session construction) is out of scope for this RFC.

Contributor

[Question] The RFC says "publishing a tool does not bypass authorization" which is technically true, but the call_tool handler pattern recommended throughout (default preset, Use Case 1, Use Case 6) dispatches via saved Go function references that go directly to the backend, completely bypassing Cedar authz middleware. This is the root cause of #4374. Should there be an explicit warning at the point where the pattern is introduced, not just in this section? Every optimizer user will write this pattern and hit the same bug.

Contributor Author

Great question — this motivates a change to the scope and migration plan. The call_tool handler pattern dispatches directly to backends, bypassing Cedar authz middleware, which is the same root cause as #4374. Alejandro's PR #4385 fixes the immediate bug but intentionally takes on tech debt. Rather than recreating this foot-gun in Starlark, I think we should update the rollout plan to ship optimizer + authz together so the relationship between them is explicit. Safe capabilities (name resolution, filtering, rate limiting) ship first, then optimizer + authz integration ships together in a later phase.

I've restructured the implementation phases accordingly and updated the default preset sketch to show how authz integration would look (e.g. enforce_cedar_policies(all_published)). I've also adjusted the scope of this RFC — it now establishes the phases at a high level and how they fit together. The exact nature of the authz built-ins is open to discussion and doesn't need to be resolved here for more distant phases.
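Since the exact authz built-ins are left open, the following is only one possible shape, sketched in Python to show the intent: every saved handler is wrapped with an authorization check before it goes into the dispatch table, so a `call_tool`-style dispatcher cannot reach a backend without passing the check. `guard_handlers` and `is_authorized` are hypothetical names, and the policy callback stands in for whatever Cedar integration is eventually chosen.

```python
def guard_handlers(handlers, is_authorized):
    """Return a dispatch table whose handlers check authorization first."""
    def guard(name, handler):
        def guarded(args):
            if not is_authorized(name):
                # Denials use the same result shape as backend failures.
                return {
                    "isError": True,
                    "content": [{"type": "text", "text": "not authorized: " + name}],
                }
            return handler(args)
        return guarded

    return {name: guard(name, handler) for name, handler in handlers.items()}


raw = {
    "read_issue": lambda args: {"content": [{"type": "text", "text": "issue body"}]},
    "delete_repo": lambda args: {"content": [{"type": "text", "text": "deleted"}]},
}
# Stand-in policy: only read_issue is allowed.
dispatch = guard_handlers(raw, lambda name: name == "read_issue")
denied = dispatch["delete_repo"]({})
```

The point of wrapping at table-construction time is that there is no unguarded reference left for a script to call by accident.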

**Answer: presets.** A preset is a named, built-in Starlark script that replicates the behavior of today's config knobs. Presets are transparent — users can inspect the underlying Starlark source and fork it when they need customization:

```bash
thv vmcp show-preset default
```

Contributor

[Nit] thv vmcp show-preset default is great for transparency, but there's no mention of a thv vmcp list-presets command. Users need to discover what presets exist before they know to inspect them.

Contributor Author

Good point — added thv vmcp list-presets alongside show-preset.

- Fix priority_order.index() crash for unranked backends in default preset
- Inline FIND_TOOL_SCHEMA and CALL_TOOL_SCHEMA in default preset sketch
- Add when_unavailable parameter to elicit() for non-elicitation clients
- Make preset/script/scriptFile mutual exclusion a hard validation error
- Resolve error handling open question: MCP-standard isError response dicts
- Keep existing decorators in Phase 1 POC, delete in later phases
- Clarify session scope: runs once per session creation or Redis restore
- Restructure rollout: safe capabilities first, optimizer + authz together
- Add warning about handler dispatch bypassing Cedar authz (#4374)
- Reference PR #4385 as interim fix for optimizer + authz bypass
- Resolve authz open question: ship with optimizer, defer exact design
- Add thv vmcp list-presets command

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Contributor

@JAORMX JAORMX left a comment

I like the proposal and think that, overall, this is a good step toward more customizability and programmability in vMCP. I'd be wary of relying fully on this mechanism for vMCP's programmability, though. API gateways tend to offer scripting languages as a pluggability option (as you mentioned, Envoy's use of Lua), but they often don't recommend them for critical paths and instead favor other kinds of extension, e.g. ext_proc via a sidecar, which is quite standard to see in the wild, or even C++ extensions, since they support those too. For us it's a little tricky to know without previous data. So I'd say let's start gathering that data by adding this mechanism, but still opt for built-in middleware for extensibility, since that's the paved path we have today. For example, rate limiting should still be implemented via dedicated middleware, IMO.

Contributor

@yrobla yrobla left a comment

Good RFC overall — the core argument (config knob interactions grow quadratically, Starlark + built-ins inverts this) is solid and the prior art section grounds it well. Comments below on a few specific areas.

- Production-quality `backends()`, `publish()`, `metadata()` built-ins
- Name resolution, filtering, and overrides via the `default` preset
- Rate limiting integration
- `thv vmcp show-preset` command to inspect built-in presets

Contributor

Restructuring to ship optimizer + authz together makes sense — the bypass isn't a "we'll fix it later" trade-off, it's a correctness bug. find_tool's dispatch table hands out access to tools Cedar would block (#4374). The interim fix in #4385 plugs one hole but the model is still broken, so shipping the optimizer alone would be knowingly shipping a privilege escalation.

On rate limiting: have you considered just implementing THV-0057 as a check_rate_limit() built-in rather than standalone middleware? Mechanism is identical (Redis token bucket), the default preset reads the same rateLimiting config block so non-script users see zero difference, and you avoid writing middleware that you're going to replace anyway. The main gap is that without current_user() you can only key on tool name — no per-user limits yet. That's fine as long as the Phase 2 scope note is explicit about it, otherwise it'll land as a disappointment.
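To illustrate the keyed-on-tool-name limitation, here is a minimal in-memory token bucket in Python. It is not the THV-0057 design: the Redis backing is replaced by a local dict, the clock is passed in explicitly so the behavior is deterministic, and the `check_rate_limit` signature and the default capacity and refill rate are assumptions.

```python
class TokenBucket:
    def __init__(self, capacity, refill_per_sec, now):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)  # start full
        self.last = now

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


buckets = {}


def check_rate_limit(tool_name, now, capacity=2, refill_per_sec=1.0):
    # One bucket per tool name: without current_user() there is
    # nothing else to key on, so all callers share the budget.
    if tool_name not in buckets:
        buckets[tool_name] = TokenBucket(capacity, refill_per_sec, now)
    return buckets[tool_name].allow(now)


first = check_rate_limit("create_issue", now=0.0)
second = check_rate_limit("create_issue", now=0.0)
third = check_rate_limit("create_issue", now=0.0)
```

Per-user limits would only need a different key (for example user plus tool name), which is why the mechanism can ship before `current_user()` as long as the scope note is explicit.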


| Built-in | Signature | Description |
|----------|-----------|-------------|
| `current_user()` | `current_user() → struct(sub, email, groups)` | Returns the authenticated user's identity. The user is known at init time, but this built-in is deferred to a future version. |

Contributor

Worth nailing down the semantics before this ships: is current_user() meant to be called at script top-level only (gives you the session-creation user), or also inside handler closures (gives you the request-time user)? Use Case 5 calls it inside a wrapper function that runs per-request — that only works if it reflects the current request's identity. Those are two pretty different contracts and threading request context into handler calls has real implementation implications. If you leave this ambiguous now you'll probably end up with a breaking change to handler semantics when current_user() actually lands.
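The difference between the two contracts can be shown with a small Python sketch. Here `contextvars` stands in for whatever request context the Go engine would thread into handler calls, users are reduced to plain strings, and `make_handlers` is a hypothetical name; none of this reflects a decided design.

```python
import contextvars

# Stand-in for per-request context supplied by the engine.
_user = contextvars.ContextVar("user")


def current_user():
    return _user.get()


def make_handlers():
    init_user = current_user()  # contract 1: read once, at script top level

    def captured(args):
        # Always reports whoever was current when the script ran.
        return init_user

    def per_request(args):
        # Contract 2: resolved each time the handler is invoked.
        return current_user()

    return captured, per_request


_user.set("alice")          # session is created on behalf of alice
captured, per_request = make_handlers()
_user.set("bob")            # a later request arrives on behalf of bob
seen_by_captured = captured({})
seen_by_per_request = per_request({})
```

A wrapper like the one in Use Case 5 behaves like `per_request`, so it only gives correct answers if the engine supplies request-time identity to handler calls.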

- `tools/list` responses are filtered to remove tools the caller isn't authorized to use
- `tools/call` requests are gated — unauthorized calls return 403

This means publishing a tool via `publish()` does not bypass authorization. The script controls *what tools exist and how they behave*; the authz middleware controls *who can see and use them*.

Contributor

The RFC is careful to separate "script controls what tools exist" from "Cedar controls who can use them" — but current_user() makes that line leaky. An admin can write if "admin" in current_user().groups: publish(admin_tool, fn) and now you've got authz logic in two places. That might be totally fine given the trust model (admins write scripts), but it's worth one sentence in the security section saying whether identity-based tool filtering in scripts is intentional or something you want to steer people away from in favor of Cedar.

|--------|----------|--------------------|
| `default` | Reads existing config knobs (`aggregation`, `optimizer`, etc.) and produces identical behavior to the current config-driven system. Applies filtering, renaming, conflict resolution, and optimizer behavior based on what's configured. | All existing config |

A single `default` preset handles all existing config knobs. When no `sessionInit` block is present, vMCP uses the `default` preset, which reads the existing config fields and produces identical behavior. There is no separate legacy code path — the Starlark engine is the single implementation.

Contributor

Replacing the existing code paths with the default preset is clean, but it also means the preset is now a single point of failure for everyone who hasn't written a custom script. The equivalence tests cover correctness, but what about a bug that ships mid-upgrade? Is there a way to pin to a preset version, or is "fork it via show-preset" the intended escape hatch? Just worth thinking through — a subtle behavior change in the preset silently affects all existing deployments at once.


## Open Questions

1. **Sessionless MCP requests**: What happens when MCP supports requests without sessions? Do we have to run this heavy script on every request? We could actually run the script once at startup, since it does not depend on request-time information. However, if we fold in authz concerns from above, then `current_user()` will be request-time information. We could cheat around this by recommending all logic which depends on `current_user()` be placed at the end of the script. When that's encountered during startup, we block and restore the state on each request. Alternatively, we could support two different scripts. One for initialization and one per-request.

Contributor

The "put current_user() logic at the end" heuristic is going to confuse people. If sessionless MCP is a real possibility, I'd just commit to a two-hook model now: on_session_init() for what tools exist and how they're shaped, on_request() for per-request concerns like rate limiting and user-specific filtering. It maps directly onto the distinction the RFC already makes, survives the sessionless transition cleanly, and removes a footgun where script ordering determines correctness in a non-obvious way.
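A rough Python sketch of the two-hook split is given below. The hook names come from this comment; their signatures, the `admin_` naming rule, the user dict shape, and the `call_tool` glue are all assumptions made for illustration.

```python
def on_session_init(backends):
    """Runs once: decide which tools exist and how they are named."""
    published = {}
    for backend_name, tools in backends.items():
        for tool_name, handler in tools.items():
            published[backend_name + "_" + tool_name] = handler
    return published


def on_request(user, tool_name):
    """Runs per call: request-time policy only. Returns True to allow."""
    # Hypothetical rule: admin_-prefixed tools require the admin group.
    if tool_name.startswith("admin_"):
        return "admin" in user["groups"]
    return True


def call_tool(published, user, tool_name, args):
    # Glue the engine would provide: check the per-request hook, then dispatch.
    if tool_name not in published or not on_request(user, tool_name):
        return {"isError": True, "content": [{"type": "text", "text": "denied: " + tool_name}]}
    return published[tool_name](args)


backends = {"admin": {"rotate_keys": lambda args: {"content": [{"type": "text", "text": "rotated"}]}}}
published = on_session_init(backends)
as_dev = call_tool(published, {"groups": ["dev"]}, "admin_rotate_keys", {})
as_admin = call_tool(published, {"groups": ["admin"]}, "admin_rotate_keys", {})
```

With this split, `on_session_init` has no access to request-time data by construction, so its output could be computed at startup if sessions go away.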
