
Update docs for ToolHive v0.12.3–v0.13.0 #641

Open

rdimitrov wants to merge 3 commits into main from
update-docs-for-v0.12.3-v0.13.0
Conversation

@rdimitrov (Member)

Summary

Catch up documentation with user-facing features shipped in ToolHive v0.12.3 through v0.13.0. Changes were verified against source code at each release tag.

  • Multi-upstream auth: Update outdated "single upstream provider" statements in backend-auth concept page and auth-k8s guide to reflect VirtualMCPServer multi-upstream support
  • Horizontal scaling: Rewrite scaling guide with new replicas and backendReplicas CRD fields, session storage config, and stdio transport limitations
  • forEach steps: Add forEach iteration section to composite tools guide with fields table and example workflow
  • Tool annotation overrides: Add annotation overrides section (title, readOnlyHint, destructiveHint, etc.) to tool aggregation guide
  • Skills CLI install: Replace "planned for a future release" with working thv skill install documentation (registry and git sources)
  • Cross-references: Add scaling link from run-mcp-k8s, Redis cross-reference to scaling guide
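A minimal sketch of the new scaling fields described above (illustrative values only; the Redis address is a placeholder and must point at a real Service in your cluster):

```yaml
spec:
  replicas: 2          # proxy runner pods
  backendReplicas: 3   # backend MCP server pods
  sessionStorage:
    provider: redis
    # Placeholder: use the host:port of your actual Redis Service.
    address: <redis-service>.<namespace>.svc.cluster.local:6379
```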

Releases covered

| Release | Date       |
| ------- | ---------- |
| v0.12.3 | 2026-03-19 |
| v0.12.4 | 2026-03-19 |
| v0.12.5 | 2026-03-24 |
| v0.13.0 | 2026-03-26 |

Excluded (verified as not needing docs)

  • Webhook middleware (Phase 1-2 foundation, not user-configurable)
  • ResourceLink content type (transparent protocol fix)
  • Graceful shutdown (transparent infrastructure)
  • --publish flag (not present in CLI at v0.13.0)
  • Bug fixes (elicitation templates, runtime config merge)

Test plan

  • npm run build passes with no broken links or MDX errors
  • Verify YAML examples match CRD field names in auto-generated reference
  • Review rendering of new forEach, annotation overrides, and scaling sections
  • Confirm cross-references resolve correctly between guides

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings March 27, 2026 16:27

vercel bot commented Mar 27, 2026

The latest updates on your projects.

| Project      | Deployment | Actions          | Updated (UTC)       |
| ------------ | ---------- | ---------------- | ------------------- |
| docs-website | Ready      | Preview, Comment | Mar 30, 2026 7:59pm |

Copilot AI (Contributor) left a comment

Pull request overview

Updates the ToolHive documentation to reflect user-facing features shipped in v0.12.3–v0.13.0, aligning guides and concept pages with current CRD/CLI capabilities (scaling, auth, composite workflows, and skills install).

Changes:

  • Document tool annotation overrides for aggregated tools (vMCP).
  • Rewrite scaling guidance to cover new replicas / backendReplicas fields and session storage requirements.
  • Add docs for forEach composite-tool steps and for installing skills via thv skill install (registry + git sources), plus cross-references between guides.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| ---- | ----------- |
| `docs/toolhive/guides-vmcp/tool-aggregation.mdx` | Adds annotation override documentation and example configuration. |
| `docs/toolhive/guides-vmcp/scaling-and-performance.mdx` | Updates horizontal scaling guidance and adds session storage + MCPServer scaling sections. |
| `docs/toolhive/guides-vmcp/composite-tools.mdx` | Documents forEach iteration steps and expands template context table. |
| `docs/toolhive/guides-registry/skills.mdx` | Adds CLI-based skill installation instructions and flags table. |
| `docs/toolhive/guides-k8s/run-mcp-k8s.mdx` | Adds cross-link to scaling guidance for MCPServer replicas. |
| `docs/toolhive/guides-k8s/redis-session-storage.mdx` | Adds scaling-related context and links session storage to horizontal scaling. |
| `docs/toolhive/guides-k8s/auth-k8s.mdx` | Updates upstream provider note to reflect multi-upstream support in VirtualMCPServer. |
| `docs/toolhive/concepts/skills.mdx` | Updates current status to reflect CLI install support and links to guide. |
| `docs/toolhive/concepts/backend-auth.mdx` | Replaces "single upstream" language with multi-upstream (vMCP) + clarifies MCPServer limitations. |

Comment on lines +8 to +9
This guide explains how to scale MCPServer and Virtual MCP Server (vMCP)
deployments.

Copilot AI Mar 27, 2026


The guide now claims it covers both MCPServer and Virtual MCP Server, but the vertical scaling section/examples still only describe VirtualMCPServer (podTemplateSpec with vmcp container). Either add the MCPServer equivalent (its podTemplateSpec uses the mcp container) or adjust the intro/description to avoid implying full MCPServer coverage for vertical scaling.

Collaborator


+1, adding MCPServer scaling to this guide seems out of place since this is specifically inside the vMCP section. If horizontal scaling for MCPServer is now supported, it seems like a scaling guide is called for in the main operator section, or a new section in the guides-k8s/run-mcp-k8s guide?

Comment on lines +390 to +397
| Field | Description | Default |
| --------------- | --------------------------------------------------- | ------- |
| `collection` | Template expression that produces an array | — |
| `itemVar` | Variable name for the current item | — |
| `maxParallel` | Maximum concurrent iterations (max 50) | 10 |
| `maxIterations` | Maximum total iterations (max 1000) | 100 |
| `step` | Inner step definition (tool call to execute per item) | — |
| `onError` | Error handling: `abort` (stop) or `continue` (skip) | abort |

Copilot AI Mar 27, 2026


The forEach fields table is inconsistent with the generated CRD reference (docs/toolhive/reference/crd-spec.md): collection should be described as resolving to a JSON array or slice; itemVar defaults to "item" (not required); and maxParallel/maxIterations caps/defaults should match the reference (only maxIterations is documented as hard-capped at 1000). Updating this table will prevent users from configuring fields incorrectly.

Suggested change:

| Field           | Description                                                | Default |
| --------------- | ---------------------------------------------------------- | ------- |
| `collection`    | Template expression that resolves to a JSON array or slice | —       |
| `itemVar`       | Variable name for the current item                         | item    |
| `maxParallel`   | Maximum concurrent iterations                              | 10      |
| `maxIterations` | Maximum total iterations (hard cap 1000)                   | 100     |
| `step`          | Inner step definition (tool call to execute per item)      | —       |
| `onError`       | Error handling: `abort` (stop) or `continue` (skip)        | abort   |

Collaborator

@danbarr left a comment


As noted on Copilot's review comment, the new content about scaling MCPServer resources is out of place in the vMCP scaling/performance guide.

It should live with the other MCPServer docs in the main operator section - either as a new guide if the amount of content warrants it, or in the existing "Run MCP servers in Kubernetes" guide.


@rdimitrov (Member Author)

@danbarr - I think I addressed the feedback so it should be ready for another round 👍


danbarr commented Mar 28, 2026

> @danbarr - I think I addressed the feedback so it should be ready for another round 👍

The updated locations of the scaling info make sense to me now. I'll clear my "changes requested" review and leave it to the platform team to do their technical review.

danbarr dismissed their stale review March 28, 2026 03:33

Structural feedback addressed

there is no per-user attribution in audit logs.

When using static credentials, consider integrating with a secret management
system like [HashiCorp Vault](../integrations/vault.mdx) for secure storage and


should we mention 1password support around here?

Member Author


addressed 👍

Collaborator


Two things:

  1. In the context of this guide, which is mostly going to be used in a Kubernetes deployment, the Vault callout makes sense since we've validated that integration; but we don't have any similar integration guide for 1Password (I'm also not sure how popular it is in a K8s setting anyway?)
  2. The CLI support for 1Password that the new link goes to is really only halfway implemented. It's read-only, was never carried forward into the UI, and honestly I'm surprised it's still there. I'd wager no one uses it.

Member Author

@rdimitrov Mar 30, 2026


alright, let me revert this commit so we can unblock this one for now

rdimitrov and others added 2 commits March 30, 2026 15:22
Catch up documentation with features shipped in v0.12.3 through v0.13.0.
Auto-generated CLI/CRD reference docs were already current; these changes
cover manual doc updates verified against source code at each release tag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Radoslav Dimitrov <radoslav@stacklok.com>
reyortiz3 previously approved these changes Mar 30, 2026
Contributor

@yrobla left a comment


Scalability review comment — session storage warning behavior

:::warning

If you configure multiple replicas without session storage, the operator sets a
`SessionStorageMissingForReplicas` status condition on the resource. Ensure
Contributor


Advisory-only: the operator does not block the scale-out

I checked the operator source (virtualmcpserver_controller.go:355 and the equivalent in mcpserver_controller.go:1962). Both validateSessionStorageForReplicas functions are explicitly annotated as // Advisory: — they only set the SessionStorageWarning status condition and reconciliation continues normally. The deployment is created/updated with the requested replica count regardless.

The current wording is accurate but incomplete — readers may assume the condition is a rejection. It should be explicit that the operator still applies the replica count:

Suggested change:

`SessionStorageMissingForReplicas` status condition on the resource but **still
applies the replica count**. Pods will start, but requests routed to a replica
that did not establish the session will fail. Ensure

One additional nit: the condition type shown in kubectl describe is SessionStorageWarning; SessionStorageMissingForReplicas is the reason. Users trying to watch or filter on this condition in scripts or alerts will need the type, not the reason — worth clarifying or using the type as the primary reference.
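As a hedged illustration of that type-vs-reason distinction, the condition might surface in `status.conditions` roughly like this (only the `type` and `reason` values come from the review above; the `status` and `message` fields are invented for the example):

```yaml
status:
  conditions:
    - type: SessionStorageWarning                 # watch/filter on the type
      status: "True"
      reason: SessionStorageMissingForReplicas    # the reason, not the type
      message: "replicas > 1 without session storage; sessions are not shared"  # illustrative
```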

Contributor

@yrobla left a comment


Remaining scalability review comments

```yaml
backendReplicas: 3
sessionStorage:
  provider: redis
  address: redis-master.toolhive-system.svc.cluster.local:6379
```
Contributor


Hardcoded Redis address implies a specific namespace

The example uses redis-master.toolhive-system.svc.cluster.local as the address. Users deploying Redis in a different namespace (or using an external Redis) won't get any hint that this needs to change — they'll just have a silently misconfigured deployment. Worth adding a comment or callout that this is an example address and should match the actual Redis Service location.
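One hedged way to phrase that callout directly in the example (the Service name and namespace shown are placeholders, not requirements):

```yaml
sessionStorage:
  provider: redis
  # Example address only. Replace with the host:port of the Redis Service in
  # your cluster: <service-name>.<namespace>.svc.cluster.local:<port>.
  address: redis-master.toolhive-system.svc.cluster.local:6379
```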


When running multiple replicas, configure
[Redis session storage](./redis-session-storage.mdx) so that sessions are shared
across pods. If you omit `replicas` or `backendReplicas`, the operator defers
Contributor


Missing guidance on mixing replicas and backendReplicas independently

The doc states each can be scaled independently, but doesn't say what the meaningful combinations are. For example: is scaling only the proxy useful if there's one backend? What happens if backendReplicas > replicas? Operators will naturally ask this when deciding how to size their deployments. A brief note on the intended use case for each (proxy handles MCP protocol / auth; backend handles tool execution) would help readers make informed decisions.

Contributor


After digging into the operator source, here's what each layer does and the meaningful combinations:

Architecture (from the code):

  • Proxy runner (spec.replicas) → a Deployment; handles MCP protocol, authentication, and session management. Stateless w.r.t. tool execution.
  • Backend (spec.backendReplicas) → a StatefulSet backed by a ClusterIP Service with ClientIP affinity (hardcoded 1800s timeout). A comment in client.go explicitly notes: "this provides proxy-runner-level stickiness (L4), not per-MCP-session stickiness (L7). True per-session routing would require Mcp-Session-Id-based routing at the proxy layer."
  • The SessionStorageWarning condition only checks spec.replicas, not spec.backendReplicas — so scaling only the backend beyond 1 replica produces no warning, even though backend ClientIP affinity has the same NAT limitations.

Suggested addition after the bullet list at line 448:

Suggested change:

- `spec.replicas` controls the proxy runner pod count
- `spec.backendReplicas` controls the backend MCP server pod count

The proxy runner handles authentication, MCP protocol framing, and session
management; it is stateless with respect to tool execution. The backend runs
the actual MCP server and executes tools.

Common configurations:

- **Scale only the proxy** (`replicas: N`, omit `backendReplicas`): useful when
  auth and connection overhead is the bottleneck with a single backend.
- **Scale only the backend** (omit `replicas`, `backendReplicas: M`): useful when
  tool execution is CPU/memory-bound and the proxy is not a bottleneck. The backend
  StatefulSet uses client-IP session affinity (30-minute timeout) to route repeated
  connections to the same pod, subject to the same NAT limitations as proxy-level
  affinity.
- **Scale both** (`replicas: N`, `backendReplicas: M`): full horizontal scale.
  Redis session storage is required when `replicas > 1`.

:::note
The `SessionStorageWarning` condition fires only when `spec.replicas > 1`. Scaling
only the backend (`backendReplicas > 1`) does not trigger a warning, but backend
client-IP affinity is still unreliable behind NAT or shared egress IPs.
:::

When running multiple replicas, configure
[Redis session storage](./redis-session-storage.mdx) so that sessions are shared
across pods. If you omit `replicas` or `backendReplicas`, the operator defers
replica management to an HPA or other external controller.
Contributor


No mention of connection draining during scale-down

The doc covers scale-out but is silent on scale-in. For MCP servers with long-lived SSE or streaming connections, abrupt pod termination can drop active sessions. Is there a terminationGracePeriodSeconds recommendation, or does the operator inject any preStop hook? If not, it's worth noting so users can configure it themselves.

Contributor


Dug into the operator source. Here's the full picture:

What the operator does:

  • Both the proxy Deployment and vMCP Deployment hardcode terminationGracePeriodSeconds: 30 (mcpserver_controller.go:132, virtualmcpserver_deployment.go:61).
  • The proxy runner catches SIGTERM via signal.NotifyContext (main.go:39) and calls server.Shutdown(ctx) with a matching 30s shutdown timeout (transparent_proxy.go:142). If drain completes within 30s, connections close cleanly; if not, they are force-closed.
  • No preStop hook is injected by either controller.
  • The 30s grace period and 30s drain timeout are identical, leaving zero headroom. In practice, process startup overhead means SIGKILL can arrive before drain finishes for long-lived SSE streams.
  • terminationGracePeriodSeconds is not a CRD field — users can only change it via podTemplateSpec (which is supported but undocumented for this use case).
  • The backend StatefulSet does not have terminationGracePeriodSeconds set by the operator — it inherits the Kubernetes default of 30s.

Suggested addition at the end of the horizontal scaling section:

:::note[Connection draining on scale-down]

When a proxy runner pod is terminated (scale-in, rolling update, or node
eviction), Kubernetes sends SIGTERM and the proxy drains in-flight requests
for up to 30 seconds before force-closing connections. The grace period and
drain timeout are both 30 seconds with no headroom, so long-lived SSE or
streaming connections may be dropped if they exceed the drain window.

No preStop hook is injected by the operator. If your workload requires
additional time — for example, to let kube-proxy propagate endpoint removal
before the pod stops accepting traffic — override `terminationGracePeriodSeconds`
via `podTemplateSpec`:

```yaml
spec:
  podTemplateSpec:
    spec:
      terminationGracePeriodSeconds: 60
```

The same 30-second default applies to the backend StatefulSet and to
VirtualMCPServer pods.

:::


The `VirtualMCPServer` CRD includes a `sessionAffinity` field that controls how
the Kubernetes Service routes repeated client connections. By default, it uses
`ClientIP` affinity, which routes connections from the same client IP to the
Contributor


ClientIP affinity is unreliable behind NAT or load balancers

The doc presents ClientIP as the default sticky-session mechanism but doesn't mention its well-known limitation: when clients sit behind a NAT gateway, corporate proxy, or cloud load balancer, all traffic appears to originate from the same IP, routing everyone to the same pod and defeating horizontal scaling. For users deploying in typical cloud environments (EKS, GKE, AKS), this will silently underperform. Worth adding a caveat, e.g.:

Note: ClientIP affinity may be ineffective when clients share a NAT or egress IP. For more reliable session routing, consider stateless backends or a dedicated vMCP instance per team.

Contributor


The CRD confirms sessionAffinity accepts ClientIP or None, and the field comment (virtualmcpserver_types.go:41) already notes: "Set to None for stateless servers or when using an external load balancer with its own affinity." — worth surfacing in the doc.

Suggested replacement for the sessionAffinity paragraph:

Suggested change:

The `VirtualMCPServer` CRD includes a `sessionAffinity` field that controls how
the Kubernetes Service routes repeated client connections. By default, it uses
`ClientIP` affinity, which routes connections from the same client IP to the
same pod:

```yaml
spec:
  sessionAffinity: ClientIP # default
```

:::warning[ClientIP affinity is unreliable behind NAT or shared egress IPs]

ClientIP affinity relies on the source IP reaching kube-proxy. When clients
sit behind a NAT gateway, corporate proxy, or cloud load balancer (common in
EKS, GKE, and AKS), all traffic appears to originate from the same IP,
routing every client to the same pod and eliminating the benefit of horizontal
scaling. This fails silently: the deployment appears healthy but only one pod
handles all load.

For stateless backends, disable affinity so the Service load-balances freely:

```yaml
spec:
  sessionAffinity: None
```

For stateful backends where true per-session routing is required, ClientIP
affinity is a best-effort mechanism only. Prefer vertical scaling or a
dedicated vMCP instance per team instead.

:::

| --------------- | ----------------------------------------------------- | ------- |
| `collection` | Template expression that produces an array | — |
| `itemVar` | Variable name for the current item | item |
| `maxParallel` | Maximum concurrent iterations (max 50) | 10 |
Contributor


maxParallel fan-out is per-pod, not distributed across replicas

The maxParallel: 50 cap may give the impression that concurrency is spread across the vMCP deployment. In practice, all iterations of a forEach step are dispatched by the single pod handling that composite tool request — so 50 parallel backend calls will originate from one pod regardless of spec.replicas. This is worth noting so users size their backend MCP servers and pod resources against the per-pod fan-out, not the aggregate deployment capacity.

Contributor

@yrobla Mar 31, 2026


Confirmed by workflow_engine.go:701-741: all iterations run in an errgroup on the pod that received the composite tool request — no distribution across replicas.

The note fits right after the template access paragraph (line 402), before ### Error handling:

Suggested change:

`maxParallel` controls how many iterations run concurrently **on the pod that
received the composite tool request**. Iterations are not distributed across
vMCP replicas; all parallel backend calls originate from a single pod
regardless of `spec.replicas`.

When sizing your deployment, account for the per-pod fan-out: a `maxParallel: 50`
forEach step can open up to 50 simultaneous connections to backend MCP servers
from one pod. Ensure both the vMCP pod resources and the backend MCP servers
can handle that per-pod concurrency.
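The per-pod bound described above can be sketched generically with a semaphore. This is an illustrative model of the documented forEach semantics, not the operator's actual Go errgroup code; all names here are hypothetical:

```python
import asyncio

async def run_foreach(items, step, max_parallel=10, max_iterations=100):
    """Run `step` over `items` with bounded, single-process concurrency.

    Mirrors the documented semantics: the whole fan-out happens in the one
    process that received the request; `max_parallel` bounds concurrency and
    `max_iterations` truncates the collection.
    """
    items = list(items)[:max_iterations]    # hard cap on total iterations
    sem = asyncio.Semaphore(max_parallel)   # per-process concurrency bound

    async def run_one(item):
        async with sem:                     # at most max_parallel in flight
            return await step(item)

    # gather preserves input order in its result list
    return await asyncio.gather(*(run_one(i) for i in items))

async def demo():
    async def step(item):
        await asyncio.sleep(0)              # stand-in for a backend tool call
        return item * 2

    return await run_foreach(range(150), step, max_parallel=10, max_iterations=100)

results = asyncio.run(demo())
```

Because the semaphore lives in one process, raising `spec.replicas` changes nothing about this loop's concurrency, which is the reviewer's point.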

| `collection` | Template expression that produces an array | — |
| `itemVar` | Variable name for the current item | item |
| `maxParallel` | Maximum concurrent iterations (max 50) | 10 |
| `maxIterations` | Maximum total iterations (max 1000) | 100 |
Contributor


No guidance on timeout interaction with maxIterations and maxParallel

At maxIterations: 1000 and maxParallel: 10 (default), a forEach loop could run for 100 serial batches — if each backend call takes a few seconds, the total can easily exceed a workflow-level timeout. The doc covers per-step timeouts elsewhere but doesn't connect the two here. A brief callout would help users avoid silent truncation: e.g., workflow timeout should be at least ceil(maxIterations / maxParallel) × expected step duration.
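That sizing rule is simple arithmetic; a sketch (the five-second step duration is an assumed example, not a measured figure):

```python
import math

def min_workflow_timeout(max_iterations: int, max_parallel: int,
                         step_seconds: float) -> float:
    """Lower bound on workflow timeout: serial batches x per-step duration."""
    batches = math.ceil(max_iterations / max_parallel)
    return batches * step_seconds

# Default maxParallel=10, a loop driven to the 1000-iteration hard cap, and an
# assumed 5-second backend call: 100 serial batches x 5 s = 500 s minimum.
timeout = min_workflow_timeout(1000, 10, 5.0)
```

With the default `maxParallel: 10` driven to the 1000-iteration hard cap, a 5-second backend call already implies at least a 500-second workflow timeout.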
