RFC-0057: MCPRemoteEndpoint CRD — Unified Remote MCP Server Connectivity#55
RFC-0057: MCPRemoteEndpoint CRD — Unified Remote MCP Server Connectivity#55
Conversation
Introduces a new MCPServerEntry CRD that lets VirtualMCPServer connect directly to remote MCP servers without MCPRemoteProxy infrastructure, resolving the forced-auth (#3104) and dual-boundary confusion (#4109) issues. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RFC should focus on design intent, not implementation code. Keep YAML/Mermaid examples, replace Go blocks with prose describing controller behavior, discovery logic, and TLS handling. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implementation details like specific file paths belong in the implementation, not the RFC design document. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yrobla
left a comment
There was a problem hiding this comment.
This is a well-structured RFC with clear motivation, thorough security considerations, and good alternatives analysis. The naming convention (following Istio ServiceEntry) is a nice touch. A few issues worth addressing before implementation:
groupReftype inconsistency (string vs object pattern) needs resolutioncaBundleRefresource type (Secret vs ConfigMap) is ambiguous across sections- SSRF mitigation has a gap: HTTPS-only validation doesn't block private IP ranges
- Auth resolution timing in
ListWorkloadsInGroup()description is unclear - Static mode CA bundle propagation is unspecified
toolConfigRefdeferral creates an unacknowledged migration regression for MCPRemoteProxy users
- Clarify groupRef is plain string for consistency with MCPServer/MCPRemoteProxy - Fix Alt 1 YAML example to use string form for groupRef - Change caBundleRef to reference ConfigMap (CA certs are public data) - Add SSRF rationale: CEL IP blocking omitted since internal servers are legitimate - Clarify auth resolution loads config only, token exchange deferred to request time - Specify CA bundle volume mount for static mode (PEM files, not env vars) - Document toolConfigRef migration path via aggregation.tools[].workload Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yrobla
left a comment
There was a problem hiding this comment.
looks as a very good improvement to me
amirejaz
left a comment
There was a problem hiding this comment.
This looks good to me. Added one comment
| metadata: | ||
| name: context7 | ||
| spec: | ||
| remoteURL: https://mcp.context7.com/mcp |
There was a problem hiding this comment.
Nit: could we use a real public MCP server here, such as mcp-spec? context7 is currently being used as the example for both private and public MCP servers, which feels a bit confusing.
|
Question: Would it make sense to rename |
jerm-dro
left a comment
There was a problem hiding this comment.
Looks great! I appreciate the alternatives section.
|
We need this!! I love it. |
| @@ -0,0 +1,904 @@ | |||
| # RFC-0055: MCPServerEntry CRD for Direct Remote MCP Server Backends | |||
There was a problem hiding this comment.
(naming): if it's for remote MCPs only, why not "MCPRemoteServer"? just by the "MCPServerEntry" it wasn't clear to me it was for remote only
There was a problem hiding this comment.
Thanks for the detailed RFC — the three problems are real and worth solving, and the security thinking (no operator-side probing, HTTPS enforcement, credential separation) is solid throughout.
The core question this review raises is: does solving these three problems require a new CRD, or can they be addressed by extending what already exists?
After reading the RFC against the actual codebase, the case for a new CRD is weaker than it appears:
- Problem #1 (forced OIDC on public remotes) can be fixed by making
oidcConfigoptional onMCPRemoteProxy— a one-field change (see comment 2). - Problem #2 (dual auth boundary confusion) turns out not to exist in the code —
externalAuthConfigRefandoidcConfigalready serve separate purposes inpkg/vmcp/workloads/k8s.go(see comment 1). - Problem #3 (pod waste) can be solved by adding a
direct: trueflag toMCPRemoteProxythat skipsensureDeployment()andensureService()and usesremoteURLas the backend URL directly — touching ~4 files vs. the RFC's ~20 (see comments 3, 4).
The RFC does consider alternatives but doesn't seriously evaluate the most natural one: extending MCPRemoteProxy itself. Its alternatives section argues against extending MCPServer (fair) and against config-only approaches (fair), but never evaluates a lightweight mode on the resource that already exists for this exact purpose (comment 3).
Beyond the alternatives analysis, there are several concrete concerns: the operational cost of a new CRD is significantly underestimated (comment 4); the new CRD creates naming confusion alongside MCPServer and MCPRemoteProxy (comment 9); the MCPGroup status model would need restructuring for a third backend type (comment 5); the CA bundle volume-mounting introduces new operator complexity (comment 8); and the SSRF trade-off of moving outbound calls into the vMCP pod deserves explicit acknowledgement (comment 7).
Suggested path forward: Before investing in a new CRD, implement the simpler approach — make oidcConfig optional and add direct: true to MCPRemoteProxy. If real-world usage reveals that the RBAC separation story (platform teams vs. product teams managing backends independently) is a genuine need that can't be satisfied by MCPRemoteProxy's existing RBAC surface, revisit the CRD proposal with that as the primary motivation rather than the pod-waste argument.
If the team decides a new CRD is still the right call after considering the above, comments 5–12 cover the design-level issues that should be resolved before implementation begins.
🤖 Review assisted by Claude Code
|
+1 to what @ChrisJBurns said, I think we should consider reusing the existing CRDs and be more explicit in why it's not an option. There's also the cognitive load on the users, the documentation load etc.. |
I actually disagree with this. I think re-using existing CRDs would do us a diservice and further increase technical depth by using APIs that are not fit for purpose because they had very different intentions. |
Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
c2965b7
Address all review feedback from specialized agent review: Critical fixes: - Add 3 CEL validation rules (proxy requires proxyConfig, proxyConfig requires oidcConfig, direct rejects proxyConfig) - Add spec.type immutability guard to prevent orphaned resources - Add Phase 0.5 for MCPRemoteProxy controller refactoring prerequisite - Document StaticBackendConfig Type field gap and KnownFields(true) risk - Document full RFC 8693 token exchange flow with subject token provenance - Document vMCP-to-proxy pod token forwarding with MCP auth spec rationale - Add session constraints for direct mode (single-replica requirement) Significant fixes: - Fix Phase 0 deprecation mechanism (Warning events, not deprecatedversion) - Make MCPGroup status field changes additive (not breaking rename) - Document name collision handling and WorkloadType enum extension - Document field indexer registration requirement - Strengthen SSRF mitigation (NetworkPolicy REQUIRED for IMDS/cluster API) - Add credential blast radius to threat model - Mark SSE as deprecated per MCP spec 2025-11-25 - Add MCP-Protocol-Version header injection requirement - Add reconnection handling section for direct mode - Expand MCPGroup controller changes to 9 explicit code changes - Fix Open Question 4 contradiction with groupRef: Required Documentation and polish: - Add mode selection guide, CRD short names, printer columns - Add allow-insecure annotation specification - Fix THV-0055 broken link, add spec.port to migration table - Add actionable deprecation timeline, standalone use explanation - Add RFC naming convention note Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix CEL validation rule placement (struct-level, not field-level) - Add oldSelf null guard to type immutability rule - Correct auth flow: accurately describe dual-consumer externalAuthConfigRef in proxy mode and single-boundary direct mode - Remove incorrect Redis session store claim; single-replica is the only supported constraint for type:direct - Fix reconnection to require full MCP initialization handshake on HTTP 404, with Last-Event-ID resumption attempt before re-init - Add server-initiated notifications section (persistent GET stream) - Restrict embeddedAuthServer and awsSts for type:direct - Enumerate all MCPGroup controller changes including field indexer and RBAC markers - Remove false audience validation claim; add as Phase 2 requirement - Fix broken anchor cross-references - Document HTTPS enforcement as controller-side only - Specify allow-insecure annotation value - Add inline warning to addPlaintextHeaders YAML example - Add CA bundle MITM trust store warning - Add emergency credential rotation guidance - Remove remoteendpoint short name; keep mcpre only Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Remove the HTTPS enforcement requirement and the toolhive.stacklok.dev/allow-insecure annotation section. remoteURL is now simply "URL of the remote MCP server". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
Signed-off-by: Chris Burns <29541485+ChrisJBurns@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Generalize mode selection guide to cover all auth types, not just token exchange - Reword rule of thumb to use "fronted by vMCP" vs "standalone" framing - Replace ASCII auth flow diagrams with mermaid sequence diagrams - Clarify dual-consumer auth behavior per mode - Remove implementation-level token lifetime detail - Add groupRef rationale inline comment - Replace name collision fallback with admission-time rejection - Replace CEL oidcConfig rule with standard kubebuilder Required marker Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use SecretKeyRef env vars on the vMCP Deployment instead of inlining secret values into the backend ConfigMap. Mirrors the existing pattern from MCPRemoteProxy. ConfigMap stores only env var names, vMCP resolves values at runtime via secrets.EnvironmentProvider. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Introduces
MCPRemoteEndpoint, a new CRD that unifies remote MCP server connectivity under a single resource with two explicit modes:type: proxy— Deploys a proxy pod with full auth middleware, authz policy, and audit logging. Functionally replacesMCPRemoteProxy.type: direct— No pod deployed; VirtualMCPServer connects directly to the remote URL. Resolves forced-auth on public remotes (#3104) and eliminates unnecessary infrastructure for simple remote backends.This supersedes THV-0055 (MCPServerEntry CRD).
Motivation
vMCP currently requires MCPRemoteProxy (which spawns
thv-proxyrunnerpods) to reach remote MCP servers. This creates three problems:externalAuthConfigRefserves two conflicting purposes (vMCP-to-proxy AND proxy-to-remote)Design Highlights
type: proxy(full auth middleware, replaces MCPRemoteProxy) andtype: direct(zero pods, zero services)type: direct)MCPRemoteEndpointwithtype: proxy; MCPRemoteProxy receives no new featurestype: directsupportstokenExchangeviaexternalAuthConfigReffor backends requiring OAuth credentialsRelated RFCs
Test plan