Skip to content

RFC-0057: MCPRemoteEndpoint CRD — Unified Remote MCP Server Connectivity#55

Open
JAORMX wants to merge 15 commits intomainfrom
jaosorior/mcpserverentry-direct-remote-backends
Open

RFC-0057: MCPRemoteEndpoint CRD — Unified Remote MCP Server Connectivity#55
JAORMX wants to merge 15 commits intomainfrom
jaosorior/mcpserverentry-direct-remote-backends

Conversation

@JAORMX
Copy link
Copy Markdown
Contributor

@JAORMX JAORMX commented Mar 12, 2026

Summary

Introduces MCPRemoteEndpoint, a new CRD that unifies remote MCP server connectivity under a single resource with two explicit modes:

  • type: proxy — Deploys a proxy pod with full auth middleware, authz policy, and audit logging. Functionally replaces MCPRemoteProxy.
  • type: direct — No pod deployed; VirtualMCPServer connects directly to the remote URL. Resolves forced-auth on public remotes (#3104) and eliminates unnecessary infrastructure for simple remote backends.

This supersedes THV-0055 (MCPServerEntry CRD).

Motivation

vMCP currently requires MCPRemoteProxy (which spawns thv-proxyrunner pods) to reach remote MCP servers. This creates three problems:

  1. Forced auth on public remotes: MCPRemoteProxy requires OIDC even when vMCP already handles client auth
  2. Dual boundary confusion: externalAuthConfigRef serves two conflicting purposes (vMCP-to-proxy AND proxy-to-remote)
  3. Resource waste: Every remote backend needs a Deployment + Service + Pod just to make an HTTP call

Design Highlights

  • Two explicit modes: type: proxy (full auth middleware, replaces MCPRemoteProxy) and type: direct (zero pods, zero services)
  • CEL validation: Mode-specific field constraints enforced at admission time (e.g., OIDC/telemetry config rejected for type: direct)
  • Validation-only controller for direct mode: No infrastructure created, no remote URL probing
  • Both discovery modes: Works with static (operator-generated ConfigMap) and dynamic (vMCP runtime discovery) from THV-0014
  • MCPRemoteProxy deprecated: Migrated to MCPRemoteEndpoint with type: proxy; MCPRemoteProxy receives no new features
  • Token exchange support: type: direct supports tokenExchange via externalAuthConfigRef for backends requiring OAuth credentials

Related RFCs

  • THV-0008 (Virtual MCP Server)
  • THV-0009 (Remote MCP Proxy)
  • THV-0010 (MCPGroup CRD)
  • THV-0014 (K8s-Aware vMCP)
  • THV-0023 (CRD v1alpha2 Optimization — config refs target MCPRemoteEndpoint)
  • THV-0026 (Header Passthrough)

Test plan

  • Review RFC structure and completeness
  • Validate CRD design against existing patterns
  • Verify mode-specific CEL validation rules
  • Verify security considerations (auth boundaries, token exchange restrictions)
  • Confirm MCPRemoteProxy migration path is documented

JAORMX and others added 4 commits March 12, 2026 09:03
Introduces a new MCPServerEntry CRD that lets VirtualMCPServer connect
directly to remote MCP servers without MCPRemoteProxy infrastructure,
resolving the forced-auth (#3104) and dual-boundary confusion (#4109)
issues.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RFC should focus on design intent, not implementation code.
Keep YAML/Mermaid examples, replace Go blocks with prose
describing controller behavior, discovery logic, and TLS
handling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implementation details like specific file paths belong in
the implementation, not the RFC design document.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@yrobla yrobla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a well-structured RFC with clear motivation, thorough security considerations, and good alternatives analysis. The naming convention (following Istio ServiceEntry) is a nice touch. A few issues worth addressing before implementation:

  • groupRef type inconsistency (string vs object pattern) needs resolution
  • caBundleRef resource type (Secret vs ConfigMap) is ambiguous across sections
  • SSRF mitigation has a gap: HTTPS-only validation doesn't block private IP ranges
  • Auth resolution timing in ListWorkloadsInGroup() description is unclear
  • Static mode CA bundle propagation is unspecified
  • toolConfigRef deferral creates an unacknowledged migration regression for MCPRemoteProxy users

- Clarify groupRef is plain string for consistency with MCPServer/MCPRemoteProxy
- Fix Alt 1 YAML example to use string form for groupRef
- Change caBundleRef to reference ConfigMap (CA certs are public data)
- Add SSRF rationale: CEL IP blocking omitted since internal servers are legitimate
- Clarify auth resolution loads config only, token exchange deferred to request time
- Specify CA bundle volume mount for static mode (PEM files, not env vars)
- Document toolConfigRef migration path via aggregation.tools[].workload

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yrobla
yrobla previously approved these changes Mar 12, 2026
Copy link
Copy Markdown
Contributor

@yrobla yrobla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks as a very good improvement to me

amirejaz
amirejaz previously approved these changes Mar 12, 2026
Copy link
Copy Markdown
Contributor

@amirejaz amirejaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Added one comment

metadata:
name: context7
spec:
remoteURL: https://mcp.context7.com/mcp
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: could we use a real public MCP server here, such as mcp-spec? context7 is currently being used as the example for both private and public MCP servers, which feels a bit confusing.

@amirejaz
Copy link
Copy Markdown
Contributor

Question: Would it make sense to rename MCPServerEntry to MCPRemoteServerEntry to improve clarity?

jerm-dro
jerm-dro previously approved these changes Mar 13, 2026
Copy link
Copy Markdown
Contributor

@jerm-dro jerm-dro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I appreciate the alternatives section.

@lorr1
Copy link
Copy Markdown

lorr1 commented Mar 17, 2026

We need this!! I love it.

@@ -0,0 +1,904 @@
# RFC-0055: MCPServerEntry CRD for Direct Remote MCP Server Backends
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(naming): if it's for remote MCPs only, why not "MCPRemoteServer"? just by the "MCPServerEntry" it wasn't clear to me it was for remote only

Copy link
Copy Markdown
Contributor

@ChrisJBurns ChrisJBurns left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed RFC — the three problems are real and worth solving, and the security thinking (no operator-side probing, HTTPS enforcement, credential separation) is solid throughout.

The core question this review raises is: does solving these three problems require a new CRD, or can they be addressed by extending what already exists?

After reading the RFC against the actual codebase, the case for a new CRD is weaker than it appears:

  • Problem #1 (forced OIDC on public remotes) can be fixed by making oidcConfig optional on MCPRemoteProxy — a one-field change (see comment 2).
  • Problem #2 (dual auth boundary confusion) turns out not to exist in the code — externalAuthConfigRef and oidcConfig already serve separate purposes in pkg/vmcp/workloads/k8s.go (see comment 1).
  • Problem #3 (pod waste) can be solved by adding a direct: true flag to MCPRemoteProxy that skips ensureDeployment() and ensureService() and uses remoteURL as the backend URL directly — touching ~4 files vs. the RFC's ~20 (see comments 3, 4).

The RFC does consider alternatives but doesn't seriously evaluate the most natural one: extending MCPRemoteProxy itself. Its alternatives section argues against extending MCPServer (fair) and against config-only approaches (fair), but never evaluates a lightweight mode on the resource that already exists for this exact purpose (comment 3).

Beyond the alternatives analysis, there are several concrete concerns: the operational cost of a new CRD is significantly underestimated (comment 4); the new CRD creates naming confusion alongside MCPServer and MCPRemoteProxy (comment 9); the MCPGroup status model would need restructuring for a third backend type (comment 5); the CA bundle volume-mounting introduces new operator complexity (comment 8); and the SSRF trade-off of moving outbound calls into the vMCP pod deserves explicit acknowledgement (comment 7).

Suggested path forward: Before investing in a new CRD, implement the simpler approach — make oidcConfig optional and add direct: true to MCPRemoteProxy. If real-world usage reveals that the RBAC separation story (platform teams vs. product teams managing backends independently) is a genuine need that can't be satisfied by MCPRemoteProxy's existing RBAC surface, revisit the CRD proposal with that as the primary motivation rather than the pod-waste argument.

If the team decides a new CRD is still the right call after considering the above, comments 5–12 cover the design-level issues that should be resolved before implementation begins.

🤖 Review assisted by Claude Code

@jhrozek
Copy link
Copy Markdown
Contributor

jhrozek commented Mar 17, 2026

+1 to what @ChrisJBurns said, I think we should consider reusing the existing CRDs and be more explicit in why it's not an option. There's also the cognitive load on the users, the documentation load etc..

@JAORMX
Copy link
Copy Markdown
Contributor Author

JAORMX commented Mar 18, 2026

+1 to what @ChrisJBurns said, I think we should consider reusing the existing CRDs and be more explicit in why it's not an option. There's also the cognitive load on the users, the documentation load etc..

I actually disagree with this. I think re-using existing CRDs would do us a diservice and further increase technical depth by using APIs that are not fit for purpose because they had very different intentions.

Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
@ChrisJBurns ChrisJBurns dismissed stale reviews from jerm-dro, amirejaz, and yrobla via c2965b7 March 18, 2026 17:09
ChrisJBurns and others added 4 commits March 18, 2026 18:03
Address all review feedback from specialized agent review:

Critical fixes:
- Add 3 CEL validation rules (proxy requires proxyConfig, proxyConfig
  requires oidcConfig, direct rejects proxyConfig)
- Add spec.type immutability guard to prevent orphaned resources
- Add Phase 0.5 for MCPRemoteProxy controller refactoring prerequisite
- Document StaticBackendConfig Type field gap and KnownFields(true) risk
- Document full RFC 8693 token exchange flow with subject token provenance
- Document vMCP-to-proxy pod token forwarding with MCP auth spec rationale
- Add session constraints for direct mode (single-replica requirement)

Significant fixes:
- Fix Phase 0 deprecation mechanism (Warning events, not deprecatedversion)
- Make MCPGroup status field changes additive (not breaking rename)
- Document name collision handling and WorkloadType enum extension
- Document field indexer registration requirement
- Strengthen SSRF mitigation (NetworkPolicy REQUIRED for IMDS/cluster API)
- Add credential blast radius to threat model
- Mark SSE as deprecated per MCP spec 2025-11-25
- Add MCP-Protocol-Version header injection requirement
- Add reconnection handling section for direct mode
- Expand MCPGroup controller changes to 9 explicit code changes
- Fix Open Question 4 contradiction with groupRef: Required

Documentation and polish:
- Add mode selection guide, CRD short names, printer columns
- Add allow-insecure annotation specification
- Fix THV-0055 broken link, add spec.port to migration table
- Add actionable deprecation timeline, standalone use explanation
- Add RFC naming convention note

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix CEL validation rule placement (struct-level, not field-level)
- Add oldSelf null guard to type immutability rule
- Correct auth flow: accurately describe dual-consumer externalAuthConfigRef
  in proxy mode and single-boundary direct mode
- Remove incorrect Redis session store claim; single-replica is the only
  supported constraint for type:direct
- Fix reconnection to require full MCP initialization handshake on HTTP 404,
  with Last-Event-ID resumption attempt before re-init
- Add server-initiated notifications section (persistent GET stream)
- Restrict embeddedAuthServer and awsSts for type:direct
- Enumerate all MCPGroup controller changes including field indexer and RBAC markers
- Remove false audience validation claim; add as Phase 2 requirement
- Fix broken anchor cross-references
- Document HTTPS enforcement as controller-side only
- Specify allow-insecure annotation value
- Add inline warning to addPlaintextHeaders YAML example
- Add CA bundle MITM trust store warning
- Add emergency credential rotation guidance
- Remove remoteendpoint short name; keep mcpre only

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Remove the HTTPS enforcement requirement and the
toolhive.stacklok.dev/allow-insecure annotation section.
remoteURL is now simply "URL of the remote MCP server".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
@ChrisJBurns ChrisJBurns changed the title RFC: MCPServerEntry CRD for direct remote MCP server backends RFC-0057: MCPRemoteEndpoint CRD — Unified Remote MCP Server Connectivity Mar 19, 2026
ChrisJBurns and others added 2 commits March 19, 2026 16:06
Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ChrisJBurns and others added 2 commits March 20, 2026 15:31
- Generalize mode selection guide to cover all auth types, not just token exchange
- Reword rule of thumb to use "fronted by vMCP" vs "standalone" framing
- Replace ASCII auth flow diagrams with mermaid sequence diagrams
- Clarify dual-consumer auth behavior per mode
- Remove implementation-level token lifetime detail
- Add groupRef rationale inline comment
- Replace name collision fallback with admission-time rejection
- Replace CEL oidcConfig rule with standard kubebuilder Required marker

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use SecretKeyRef env vars on the vMCP Deployment instead of inlining
secret values into the backend ConfigMap. Mirrors the existing pattern
from MCPRemoteProxy. ConfigMap stores only env var names, vMCP resolves
values at runtime via secrets.EnvironmentProvider.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants