27 changes: 16 additions & 11 deletions docs/toolhive/concepts/backend-auth.mdx
@@ -211,18 +211,23 @@ deployments using the ToolHive Operator.
- **Direct upstream redirect:** The embedded authorization server redirects
clients directly to the upstream provider for authentication (for example,
GitHub or Atlassian).
-- **Single upstream provider:** Currently supports one upstream identity
-  provider per configuration.
-
-:::info[Chained authentication not yet supported]
-
-The embedded authorization server redirects clients directly to the upstream
-provider. This means the upstream provider must be the service whose API the MCP
-server calls. Chained authentication—where a client authenticates with a
+- **Multiple upstream providers (VirtualMCPServer):** VirtualMCPServer supports
+  configuring multiple upstream identity providers with sequential
+  authentication. When multiple providers are configured, the authorization
+  server chains the authentication flow through each provider in sequence,
+  collecting tokens from all of them. This enables scenarios where backend tools
+  require tokens from different providers (such as a corporate IdP and GitHub).
+
+:::info[Chained authentication for MCPServer]
+
+MCPServer and MCPRemoteProxy support only one upstream provider. The embedded
+authorization server redirects clients directly to that provider, so the
+provider must be the service whose API the MCP server calls. If your MCPServer
+deployment requires chained authentication—where a client authenticates with a
 corporate IdP like Okta, which then federates to an external provider like
-GitHub—is not yet supported. If your deployment requires this pattern, consider
-using [token exchange](#same-idp-with-token-exchange) with a federated identity
-provider instead.
+GitHub—consider using [token exchange](#same-idp-with-token-exchange) with a
+federated identity provider instead, or use a VirtualMCPServer with multiple
+upstream providers.

:::

14 changes: 7 additions & 7 deletions docs/toolhive/guides-k8s/auth-k8s.mdx
@@ -464,13 +464,13 @@ kubectl apply -f embedded-auth-config.yaml

**Configuration reference:**

-| Field | Description |
-| ---------------------- | ---------------------------------------------------------------------------------------------------------------------- |
-| `issuer` | HTTPS URL identifying this authorization server. Appears in the `iss` claim of issued JWTs. |
-| `signingKeySecretRefs` | References to Secrets containing JWT signing keys. First key is active; additional keys support rotation. |
-| `hmacSecretRefs` | References to Secrets with symmetric keys for signing authorization codes and refresh tokens. |
-| `tokenLifespans` | Configurable durations for access tokens (default: 1h), refresh tokens (default: 168h), and auth codes (default: 10m). |
-| `upstreamProviders` | Configuration for the upstream identity provider. Currently supports one provider. |
+| Field | Description |
+| ---------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `issuer` | HTTPS URL identifying this authorization server. Appears in the `iss` claim of issued JWTs. |
+| `signingKeySecretRefs` | References to Secrets containing JWT signing keys. First key is active; additional keys support rotation. |
+| `hmacSecretRefs` | References to Secrets with symmetric keys for signing authorization codes and refresh tokens. |
+| `tokenLifespans` | Configurable durations for access tokens (default: 1h), refresh tokens (default: 168h), and auth codes (default: 10m). |
+| `upstreamProviders` | Configuration for upstream identity providers. MCPServer and MCPRemoteProxy support one provider; VirtualMCPServer supports multiple providers for sequential authentication. |
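
This diff does not show the multi-provider configuration shape itself. A hedged sketch of what a VirtualMCPServer `upstreamProviders` list might look like — every field name below is an assumption for illustration, not confirmed by this change; check the CRD reference for the real schema:

```yaml
# Hypothetical field names — verify against the VirtualMCPServer CRD reference.
upstreamProviders:
  - name: corporate-idp              # authenticated first in the sequential flow
    issuer: https://idp.example.com
    clientId: vmcp-client
    clientSecretRef:
      name: okta-oauth
      key: client-secret
  - name: github                     # authenticated second; its token is collected too
    issuer: https://github.com/login/oauth
    clientId: gh-client
    clientSecretRef:
      name: github-oauth
      key: client-secret
```

In the chained flow described above, the authorization server would collect a token from each entry in order, so the list order matters.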

**Step 5: Create the MCPServer resource**

7 changes: 6 additions & 1 deletion docs/toolhive/guides-k8s/redis-session-storage.mdx
@@ -2,7 +2,7 @@
title: Redis Sentinel session storage
description:
How to deploy Redis Sentinel and configure persistent session storage for the
-ToolHive embedded authorization server.
+ToolHive embedded authorization server and horizontal scaling.
---

Deploy Redis Sentinel and configure it as the session storage backend for the
@@ -12,6 +12,11 @@ re-authenticate. Redis Sentinel provides persistent storage with automatic
master discovery, ACL-based access control, and optional failover when replicas
are configured.

+Redis session storage is also required for horizontal scaling when running
+multiple [MCPServer](./run-mcp-k8s.mdx#horizontal-scaling) or
+[VirtualMCPServer](../guides-vmcp/scaling-and-performance.mdx#session-storage-for-multi-replica-deployments)
+replicas, so that sessions are shared across pods.

:::info[Prerequisites]

Before you begin, ensure you have:
37 changes: 37 additions & 0 deletions docs/toolhive/guides-k8s/run-mcp-k8s.mdx
@@ -439,6 +439,41 @@ For more details about a specific MCP server:

```bash
kubectl -n <NAMESPACE> describe mcpserver <NAME>
```

## Horizontal scaling

MCPServer creates two separate Deployments: one for the proxy runner and one for
the MCP server backend. You can scale each independently:

- `spec.replicas` controls the proxy runner pod count
- `spec.backendReplicas` controls the backend MCP server pod count

```yaml title="MCPServer resource"
spec:
  replicas: 2
  backendReplicas: 3
  sessionStorage:
    provider: redis
    # Example address — point this at your actual Redis Service (the namespace
    # and name may differ, or you may use an external Redis endpoint).
    address: redis-master.toolhive-system.svc.cluster.local:6379
    db: 0
    keyPrefix: mcp-sessions
    passwordRef:
      name: redis-secret
      key: password
```

**Contributor:** Hardcoded Redis address implies a specific namespace

The example uses redis-master.toolhive-system.svc.cluster.local as the address. Users deploying Redis in a different namespace (or using an external Redis) won't get any hint that this needs to change — they'll just have a silently misconfigured deployment. Worth adding a comment or callout that this is an example address and should match the actual Redis Service location.

When running multiple replicas, configure
[Redis session storage](./redis-session-storage.mdx) so that sessions are shared
across pods. If you omit `replicas` or `backendReplicas`, the operator defers
replica management to an HPA or other external controller.

**Contributor:** Missing guidance on mixing replicas and backendReplicas independently

The doc states each can be scaled independently, but doesn't say what the meaningful combinations are. For example: is scaling only the proxy useful if there's one backend? What happens if backendReplicas > replicas? Operators will naturally ask this when deciding how to size their deployments. A brief note on the intended use case for each (proxy handles MCP protocol / auth; backend handles tool execution) would help readers make informed decisions.

**Contributor:** After digging into the operator source, here's what each layer does and the meaningful combinations.

Architecture (from the code):

- Proxy runner (`spec.replicas`) → a Deployment; handles MCP protocol, authentication, and session management. Stateless w.r.t. tool execution.
- Backend (`spec.backendReplicas`) → a StatefulSet backed by a ClusterIP Service with ClientIP affinity (hardcoded 1800s timeout). A comment in client.go explicitly notes: "this provides proxy-runner-level stickiness (L4), not per-MCP-session stickiness (L7). True per-session routing would require Mcp-Session-Id-based routing at the proxy layer."
- The SessionStorageWarning condition only checks `spec.replicas`, not `spec.backendReplicas` — so scaling only the backend beyond 1 replica produces no warning, even though backend ClientIP affinity has the same NAT limitations.

Suggested addition after the bullet list at line 448:

The proxy runner handles authentication, MCP protocol framing, and session
management; it is stateless with respect to tool execution. The backend runs
the actual MCP server and executes tools.

Common configurations:

- **Scale only the proxy** (`replicas: N`, omit `backendReplicas`): useful when
  auth and connection overhead is the bottleneck with a single backend.
- **Scale only the backend** (omit `replicas`, `backendReplicas: M`): useful when
  tool execution is CPU/memory-bound and the proxy is not a bottleneck. The backend
  StatefulSet uses client-IP session affinity (30-minute timeout) to route repeated
  connections to the same pod — subject to the same NAT limitations as proxy-level
  affinity.
- **Scale both** (`replicas: N`, `backendReplicas: M`): full horizontal scale.
  Redis session storage is required when `replicas > 1`.

:::note
The `SessionStorageWarning` condition fires only when `spec.replicas > 1`. Scaling
only the backend (`backendReplicas > 1`) does not trigger a warning, but backend
client-IP affinity is still unreliable behind NAT or shared egress IPs.
:::
**Contributor:** No mention of connection draining during scale-down

The doc covers scale-out but is silent on scale-in. For MCP servers with long-lived SSE or streaming connections, abrupt pod termination can drop active sessions. Is there a terminationGracePeriodSeconds recommendation, or does the operator inject any preStop hook? If not, it's worth noting so users can configure it themselves.

**Contributor:** Dug into the operator source. Here's the full picture.

What the operator does:

- Both the proxy Deployment and vMCP Deployment hardcode `terminationGracePeriodSeconds: 30` (mcpserver_controller.go:132, virtualmcpserver_deployment.go:61).
- The proxy runner catches SIGTERM via signal.NotifyContext (main.go:39) and calls server.Shutdown(ctx) with a matching 30s shutdown timeout (transparent_proxy.go:142). If drain completes within 30s, connections close cleanly; if not, they are force-closed.
- No preStop hook is injected by either controller.
- The 30s grace period and 30s drain timeout are identical, leaving zero headroom. In practice, process startup overhead means SIGKILL can arrive before drain finishes for long-lived SSE streams.
- `terminationGracePeriodSeconds` is not a CRD field — users can only change it via `podTemplateSpec` (which is supported but undocumented for this use case).
- The backend StatefulSet does not have `terminationGracePeriodSeconds` set by the operator — it inherits the Kubernetes default of 30s.

Suggested addition at the end of the horizontal scaling section:

:::note[Connection draining on scale-down]

When a proxy runner pod is terminated (scale-in, rolling update, or node
eviction), Kubernetes sends SIGTERM and the proxy drains in-flight requests
for up to 30 seconds before force-closing connections. The grace period and
drain timeout are both 30 seconds with no headroom, so long-lived SSE or
streaming connections may be dropped if they exceed the drain window.

No preStop hook is injected by the operator. If your workload requires
additional time — for example, to let kube-proxy propagate endpoint removal
before the pod stops accepting traffic — override `terminationGracePeriodSeconds`
via `podTemplateSpec`:

```yaml
spec:
  podTemplateSpec:
    spec:
      terminationGracePeriodSeconds: 60
```

The same 30-second default applies to the backend StatefulSet and to
VirtualMCPServer pods.

:::


:::warning[Stdio transport limitation]

Backends using the `stdio` transport are limited to a single replica. The
operator rejects configurations with `backendReplicas` greater than 1 for stdio
backends.

:::

## Next steps

- [Connect clients to your MCP servers](./connect-clients.mdx) from outside the
@@ -455,6 +490,8 @@ kubectl -n <NAMESPACE> describe mcpserver <NAME>

- [Kubernetes CRD reference](../reference/crd-spec.md#apiv1alpha1mcpserver) -
Reference for the `MCPServer` Custom Resource Definition (CRD)
- [vMCP scaling and performance](../guides-vmcp/scaling-and-performance.mdx) -
Scale Virtual MCP Server deployments
- [Deploy the operator](./deploy-operator.mdx) - Install the ToolHive operator
- [Build MCP containers](../guides-cli/build-containers.mdx) - Create custom MCP
server container images
76 changes: 68 additions & 8 deletions docs/toolhive/guides-vmcp/composite-tools.mdx
@@ -19,6 +19,7 @@ backend MCP servers, handling dependencies and collecting results.
wait for their prerequisites
- **Template expansion**: Dynamic arguments using step outputs
- **Elicitation**: Request user input mid-workflow (approval gates, choices)
+- **Iteration**: Loop over collections with forEach steps
- **Error handling**: Configurable abort, continue, or retry behavior
- **Timeouts**: Workflow and per-step timeout configuration

@@ -290,7 +291,7 @@ spec:

### Steps

-Each step can be a tool call or an elicitation:
+Each step can be a tool call, an elicitation, or a forEach loop:

```yaml title="VirtualMCPServer resource"
spec:
@@ -344,6 +345,62 @@ spec:
timeout: '5m'
```

### forEach steps

Iterate over a collection from a previous step's output and execute a tool call
for each item:

```yaml title="VirtualMCPServer resource"
spec:
  config:
    compositeTools:
      - name: scan_repositories
        description: Check each repository for security advisories
        parameters:
          type: object
          properties:
            org:
              type: string
          required:
            - org
        steps:
          - id: list_repos
            tool: github_list_repos
            arguments:
              org: '{{.params.org}}'
          # highlight-start
          - id: check_advisories
            type: forEach
            collection: '{{json .steps.list_repos.output.repositories}}'
            itemVar: repo
            maxParallel: 5
            step:
              type: tool
              tool: github_list_security_advisories
              arguments:
                repo: '{{.forEach.repo.name}}'
            onError:
              action: continue
            dependsOn: [list_repos]
          # highlight-end
```

**forEach fields:**

| Field | Description | Default |
| --------------- | ----------------------------------------------------- | ------- |
| `collection` | Template expression that produces an array | — |
| `itemVar` | Variable name for the current item | item |
| `maxParallel` | Maximum concurrent iterations (max 50) | 10 |
| `maxIterations` | Maximum total iterations (max 1000) | 100 |
| `step` | Inner step definition (tool call to execute per item) | — |
| `onError` | Error handling: `abort` (stop) or `continue` (skip) | abort |

**Contributor:** maxParallel fan-out is per-pod, not distributed across replicas

The maxParallel: 50 cap may give the impression that concurrency is spread across the vMCP deployment. In practice, all iterations of a forEach step are dispatched by the single pod handling that composite tool request — so 50 parallel backend calls will originate from one pod regardless of spec.replicas. This is worth noting so users size their backend MCP servers and pod resources against the per-pod fan-out, not the aggregate deployment capacity.

**@yrobla (Mar 31, 2026):** Confirmed by workflow_engine.go:701-741: all iterations run in an errgroup on the pod that received the composite tool request — no distribution across replicas.

The note fits right after the template access paragraph (line 402), before ### Error handling:

`maxParallel` controls how many iterations run concurrently **on the pod that
received the composite tool request**. Iterations are not distributed across
vMCP replicas — all parallel backend calls originate from a single pod
regardless of `spec.replicas`.

When sizing your deployment, account for the per-pod fan-out: a `maxParallel: 50`
forEach step can open up to 50 simultaneous connections to backend MCP servers
from one pod. Ensure both the vMCP pod resources and the backend MCP servers
can handle that per-pod concurrency.

**Contributor:** No guidance on timeout interaction with maxIterations and maxParallel

At maxIterations: 1000 and maxParallel: 10 (default), a forEach loop could run for 100 serial batches — if each backend call takes a few seconds, the total can easily exceed a workflow-level timeout. The doc covers per-step timeouts elsewhere but doesn't connect the two here. A brief callout would help users avoid silent truncation: e.g., workflow timeout should be at least ceil(maxIterations / maxParallel) × expected step duration.

**Contributor:** I can take the scalability changes

Access the current item inside the inner step using
`{{.forEach.<itemVar>.<field>}}`. In the example above, `{{.forEach.repo.name}}`
accesses the `name` field of the current repository. You can also use
`{{.forEach.index}}` to access the zero-based iteration index.
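
The timeout interaction flagged in the review comments above can be made concrete. A hedged sizing sketch — the `timeout` placement mirrors the earlier step examples and the numbers are purely illustrative: iterations run in `ceil(maxIterations / maxParallel)` serial batches, so the timeout must cover `batches × expected call duration`:

```yaml
# Illustrative sizing: 200 iterations at maxParallel 10 → 20 serial batches.
# At roughly 5s per backend call, the loop alone needs ~100s of budget.
- id: check_advisories
  type: forEach
  collection: '{{json .steps.list_repos.output.repositories}}'
  maxParallel: 10
  maxIterations: 200
  step:
    type: tool
    tool: github_list_security_advisories
    arguments:
      repo: '{{.forEach.repo.name}}'
  timeout: '5m' # ≥ ceil(200 / 10) batches × expected call duration, plus headroom
```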

### Error handling

Configure behavior when steps fail:
@@ -507,13 +564,16 @@ without defaultResults defined

Access workflow context in arguments:

-| Template | Description |
-| --------------------------- | ------------------------------------------ |
-| `{{.params.name}}` | Input parameter |
-| `{{.steps.id.output}}` | Step output (map) |
-| `{{.steps.id.output.text}}` | Text content from step output |
-| `{{.steps.id.content}}` | Elicitation response content |
-| `{{.steps.id.action}}` | Elicitation action (accept/decline/cancel) |
+| Template | Description |
+| -------------------------------- | ------------------------------------------ |
+| `{{.params.name}}` | Input parameter |
+| `{{.steps.id.output}}` | Step output (map) |
+| `{{.steps.id.output.text}}` | Text content from step output |
+| `{{.steps.id.content}}` | Elicitation response content |
+| `{{.steps.id.action}}` | Elicitation action (accept/decline/cancel) |
+| `{{.forEach.<itemVar>}}` | Current forEach item |
+| `{{.forEach.<itemVar>.<field>}}` | Field on current forEach item |
+| `{{.forEach.index}}` | Zero-based iteration index |

### Template functions

63 changes: 51 additions & 12 deletions docs/toolhive/guides-vmcp/scaling-and-performance.mdx
@@ -4,7 +4,10 @@ description:
How to scale Virtual MCP Server deployments vertically and horizontally.
---

-This guide explains how to scale Virtual MCP Server (vMCP) deployments.
+This guide explains how to scale Virtual MCP Server (vMCP) deployments. For
+MCPServer scaling, see
+[Horizontal scaling](../guides-k8s/run-mcp-k8s.mdx#horizontal-scaling) in the
+Kubernetes operator guide.

## Vertical scaling

@@ -37,24 +40,60 @@ higher request volumes.

### How to scale horizontally

-The VirtualMCPServer CRD does not have a `replicas` field. The operator creates
-a Deployment named `vmcp-<NAME>` (where `<NAME>` is your VirtualMCPServer name)
-with 1 replica and preserves the replicas count, allowing you to manage scaling
-separately.
+Set the `replicas` field in your VirtualMCPServer spec to control the number of
+vMCP pods:
+
+```yaml title="VirtualMCPServer resource"
+spec:
+  replicas: 3
+```
+
+If you omit `replicas`, the operator defers replica management to an HPA or
+other external controller. You can also scale manually or with an HPA:

**Option 1: Manual scaling**

```bash
-kubectl scale deployment vmcp-<vmcp-name> -n <NAMESPACE> --replicas=3
+kubectl scale deployment vmcp-<VMCP_NAME> -n <NAMESPACE> --replicas=3
```

**Option 2: Autoscaling with HPA**

```bash
-kubectl autoscale deployment vmcp-<vmcp-name> -n <NAMESPACE> \
+kubectl autoscale deployment vmcp-<VMCP_NAME> -n <NAMESPACE> \
--min=2 --max=5 --cpu-percent=70
```
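
The same autoscaling policy can also be managed declaratively with a standard `autoscaling/v2` HorizontalPodAutoscaler manifest. A sketch — the resource name and namespace are placeholders; the target must be the operator-created Deployment `vmcp-<VMCP_NAME>`:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: vmcp-example-hpa        # placeholder name
  namespace: toolhive-system    # placeholder — use your namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: vmcp-example          # the operator-created Deployment, vmcp-<VMCP_NAME>
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Pair this with Redis session storage whenever `minReplicas` is greater than 1, as described in the next section.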

### Session storage for multi-replica deployments

When running multiple replicas, configure Redis session storage so that sessions
are shared across pods. Without session storage, a request routed to a different
replica than the one that established the session will fail.

```yaml title="VirtualMCPServer resource"
spec:
  replicas: 3
  sessionStorage:
    provider: redis
    address: redis-master.toolhive-system.svc.cluster.local:6379
    db: 0
    keyPrefix: vmcp-sessions
    passwordRef:
      name: redis-secret
      key: password
```

See [Redis Sentinel session storage](../guides-k8s/redis-session-storage.mdx)
for a complete Redis deployment guide.

:::warning

If you configure multiple replicas without session storage, the operator sets a
`SessionStorageMissingForReplicas` status condition on the resource. Ensure
Redis is available before scaling beyond a single replica.

:::

**Contributor:** Advisory-only: the operator does not block the scale-out

I checked the operator source (virtualmcpserver_controller.go:355 and the equivalent in mcpserver_controller.go:1962). Both validateSessionStorageForReplicas functions are explicitly annotated as // Advisory: — they only set the SessionStorageWarning status condition and reconciliation continues normally. The deployment is created/updated with the requested replica count regardless.

The current wording is accurate but incomplete — readers may assume the condition is a rejection. It should be explicit that the operator still applies the replica count:

Suggested change:

`SessionStorageMissingForReplicas` status condition on the resource but **still
applies the replica count**. Pods will start, but requests routed to a replica
that did not establish the session will fail. Ensure

One additional nit: the condition type shown in kubectl describe is SessionStorageWarning; SessionStorageMissingForReplicas is the reason. Users trying to watch or filter on this condition in scripts or alerts will need the type, not the reason — worth clarifying or using the type as the primary reference.

### When horizontal scaling is challenging

Horizontal scaling works well for **stateless backends** (fetch, search,
@@ -63,22 +102,22 @@ read-only operations) where sessions can be resumed on any instance.
However, **stateful backends** make horizontal scaling difficult:

- **Stateful backends** (Playwright browser sessions, database connections, file
-  system operations) require requests to be routed to the same vMCP instance
-  that established the session
+  system operations) require requests to be routed to the same instance that
+  established the session
- Session resumption may not work reliably for stateful backends

The `VirtualMCPServer` CRD includes a `sessionAffinity` field that controls how
the Kubernetes Service routes repeated client connections. By default, it uses
`ClientIP` affinity, which routes connections from the same client IP to the
-same pod. You can configure this using the `sessionAffinity` field:
+same pod:

```yaml
spec:
  sessionAffinity: ClientIP # default
```

**Contributor:** ClientIP affinity is unreliable behind NAT or load balancers

The doc presents ClientIP as the default sticky-session mechanism but doesn't mention its well-known limitation: when clients sit behind a NAT gateway, corporate proxy, or cloud load balancer, all traffic appears to originate from the same IP, routing everyone to the same pod and defeating horizontal scaling. For users deploying in typical cloud environments (EKS, GKE, AKS), this will silently underperform. Worth adding a caveat, e.g.:

Note: ClientIP affinity may be ineffective when clients share a NAT or egress IP. For more reliable session routing, consider stateless backends or a dedicated vMCP instance per team.

**Contributor:** The CRD confirms sessionAffinity accepts ClientIP or None, and the field comment (virtualmcpserver_types.go:41) already notes: "Set to None for stateless servers or when using an external load balancer with its own affinity." — worth surfacing in the doc.

Suggested replacement for the sessionAffinity paragraph:

The `VirtualMCPServer` CRD includes a `sessionAffinity` field that controls how
the Kubernetes Service routes repeated client connections. By default, it uses
`ClientIP` affinity, which routes connections from the same client IP to the
same pod:

```yaml
spec:
  sessionAffinity: ClientIP # default
```

:::warning[ClientIP affinity is unreliable behind NAT or shared egress IPs]

ClientIP affinity relies on the source IP reaching kube-proxy. When clients
sit behind a NAT gateway, corporate proxy, or cloud load balancer (common in
EKS, GKE, and AKS), all traffic appears to originate from the same IP —
routing every client to the same pod and eliminating the benefit of horizontal
scaling. This fails silently: the deployment appears healthy but only one pod
handles all load.

For stateless backends, disable affinity so the Service load-balances freely:

    spec:
      sessionAffinity: None

For stateful backends where true per-session routing is required, ClientIP
affinity is a best-effort mechanism only. Prefer vertical scaling or a
dedicated vMCP instance per team instead.

:::

-For stateful backends, vertical scaling or dedicated vMCP instances per team/use
-case are recommended instead of horizontal scaling.
+For stateful backends, vertical scaling or dedicated instances per team/use case
+are recommended instead of horizontal scaling.

## Next steps

36 changes: 36 additions & 0 deletions docs/toolhive/guides-vmcp/tool-aggregation.mdx
@@ -146,6 +146,42 @@
description: 'Create a new GitHub issue in the repository'
```

### Annotation overrides

Override MCP tool annotations to provide hints to LLM clients about tool
behavior. Annotations are optional—only set the fields you want to override:

```yaml title="VirtualMCPServer resource"
spec:
  config:
    aggregation:
      tools:
        - workload: github
          overrides:
            delete_repository:
              annotations:
                destructiveHint: true
                readOnlyHint: false
            list_issues:
              annotations:
                title: 'List GitHub Issues'
                readOnlyHint: true
                idempotentHint: true
```

**Available annotation fields:**

| Field | Type | Description |
| ----------------- | ------- | -------------------------------------------------- |
| `title` | string | Display title for the tool in MCP clients |
| `readOnlyHint` | boolean | Indicates the tool does not modify data |
| `destructiveHint` | boolean | Indicates the tool may delete or overwrite data |
| `idempotentHint` | boolean | Indicates repeated calls produce the same result |
| `openWorldHint` | boolean | Indicates the tool interacts with external systems |

Annotation overrides can be combined with name and description overrides on the
same tool.
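
For instance, a sketch combining all three override types on one tool — the `name` key for renaming is an assumption here, since this diff only shows `description` and `annotations` overrides:

```yaml
overrides:
  create_issue:
    name: gh_create_issue                    # rename (assumed key)
    description: 'Create a new GitHub issue' # description override
    annotations:
      destructiveHint: false
      openWorldHint: true
```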

:::info

You can also reference an `MCPToolConfig` resource using `toolConfigRef` instead