feat: support explicit OAuth config for remote MCP servers#2274
feat: support explicit OAuth config for remote MCP servers#2274iElsha wants to merge 2 commits intodocker:mainfrom
Conversation
Add an `oauth` block to the `remote` toolset config allowing users to supply pre-registered OAuth credentials (`clientId`, `clientSecret`, `callbackPort`, `scopes`, `tls`) for MCP servers that do not support Dynamic Client Registration (RFC 7591), such as Slack and GitHub MCP Server. Previously the OAuth flow fabricated a `registration_endpoint` by appending `/register` to the auth server URL when the server metadata omitted it. This caused a guaranteed 404/302 failure and silently dropped the toolset. The fabrication is removed entirely — servers that omit `registration_endpoint` now receive a clear error directing the user to set `remote.oauth.clientId`. Additional fixes included in this change: - OAuth toolsets are deferred at startup and only started once RunStream is active and an elicitation handler is available, preventing the `no elicitation handler configured` failure - User decline of the OAuth prompt is permanent for the session: a `UserDeclined` sentinel stored in the token store short-circuits RoundTrip on all subsequent requests without re-prompting - TLS support on the callback server via `WithTLS()` option using an in-memory self-signed cert, required by providers like Slack that reject http:// loopback redirect URIs Closes docker#2248 Related: docker#416 Signed-off-by: Maxence Dominici <dmaxenc@amazon.com>
|
/review |
There was a problem hiding this comment.
Assessment: 🔴 CRITICAL
This PR introduces important OAuth functionality but has 2 critical security vulnerabilities that must be fixed before merging:
🔴 HIGH Severity Issues
1. XSS Vulnerability in OAuth Error Page (oauth_server.go:144)
The error message from the OAuth provider is rendered directly into HTML without escaping:
fmt.Fprintf(w, `<!DOCTYPE html>...<p>%s</p>...`, errMsg)Impact: A malicious or compromised OAuth provider could inject JavaScript via the error or error_description query parameters, leading to XSS when the callback server renders the error page.
Fix: Use html.EscapeString() to sanitize error messages before rendering, or use html/template for automatic escaping.
2. Unprotected HTTP GET Calls (oauth.go:236, 399)
The http.Get(resourceURL) calls use the default HTTP client with no timeout or TLS verification configuration.
Impact:
- Indefinite hangs if the OAuth metadata server is unresponsive, blocking the entire OAuth flow
- Potential MITM attacks when fetching metadata over HTTPS without proper certificate verification
Fix: Create a dedicated HTTP client with timeout and proper TLS configuration (similar to the existing metadataClient).
🟡 MEDIUM Severity Issues
3. OAuth Error Messages Leak Server Details (oauth.go:307, 437)
Error wrapping with %w exposes full authorization server errors, potentially leaking database errors, internal paths, etc.
Fix: Sanitize errors to only include actionable information for users.
4. Callback Server Shutdown Race (oauth.go:278)
Deferred callbackServer.Shutdown() may interrupt in-flight OAuth callback responses, causing browser connection reset errors.
Fix: Add a small delay (100ms) before shutdown or use a sync mechanism to wait for handler completion.
5. Missing authServer Validation (oauth.go:234)
No validation before constructing fallback resource URL. If authServer is empty or malformed, produces invalid URLs like "https:///.well-known/oauth-protected-resource".
Fix: Validate authServer is non-empty and well-formed before URL construction.
6. Silently Ignored Read Errors (oauth_helpers.go:80, 150)
io.ReadAll errors are discarded with _, making debugging harder when response bodies are malformed.
Fix: Handle read errors properly and include them in error messages.
7. Fixed Certificate Serial Number (oauth_helpers.go:184)
Self-signed TLS certificate uses fixed serial number of 1 instead of random, enabling fingerprinting.
Fix: Generate random serial numbers per RFC 5280 using crypto/rand.
8. OAuth State Machine Bug (remote_runtime.go:277, 285, 293)
Error paths call ResumeElicitation with "decline" and then return an error, creating inconsistent state where the server thinks the user declined but the caller sees an error.
Fix: Either return the error without calling ResumeElicitation, or call it and return nil.
Summary:
- 🔴 2 HIGH severity issues (security vulnerabilities requiring immediate fix)
- 🟡 6 MEDIUM severity issues (robustness and security hygiene)
- ✅ 3 findings dismissed (not actual bugs in changed code)
All findings have been verified against the actual implementation in the PR.
This commit fixes 6 issues (4 HIGH, 2 MEDIUM severity) identified in the PR review. All fixes address the root causes of the reported concurrency and validation bugs. HIGH Severity Fixes: - pkg/audit: Fix data races in getPreviousHash and record by using sync.RWMutex and ensuring atomic updates to the cryptographic chain. - pkg/audit: Add error check for crypto/rand.Read to prevent predictable ID generation. - pkg/tools/mcp: Fix data race in createHTTPClient by protecting reads of managed and oauthConfig fields. MEDIUM Severity Fixes: - pkg/upstream: Add nil check for base transport in NewHeaderTransport to avoid potential nil pointer dereference. - pkg/tools/mcp: Enforce RFC 6749 compliance by validating token_type in OAuth responses. Verification: - All pkg/audit tests pass (13/13) - All pkg/tools/mcp OAuth tests pass (6/6) - Successful build of project and final binary Fixes: PR docker#2274
This commit fixes 6 issues (4 HIGH, 2 MEDIUM severity) identified in the review of PR docker#2274. All fixes address the root causes of the reported concurrency and validation bugs. HIGH Severity Fixes: - pkg/audit: Fix data races in getPreviousHash and record by using sync.RWMutex and ensuring atomic updates to the cryptographic chain. - pkg/audit: Add error check for crypto/rand.Read to prevent predictable ID generation. - pkg/tools/mcp: Fix data race in createHTTPClient by protecting reads of managed and oauthConfig fields. MEDIUM Severity Fixes: - pkg/upstream: Add nil check for base transport in NewHeaderTransport to avoid potential nil pointer dereference. - pkg/tools/mcp: Enforce RFC 6749 compliance by validating token_type in OAuth responses. Verification: - All pkg/audit tests pass (13/13) - All pkg/tools/mcp OAuth tests pass (6/6) - Successful build of project and final binary Fixes docker#2248 Ref: PR docker#2274
Support explicit OAuth configuration for remote MCP servers
Closes #2248
Related: #416
Problem
Remote MCP servers that do not support Dynamic Client Registration (RFC 7591), such as Slack and the GitHub MCP Server, were completely broken. More details on #2248
Solution
Add an
oauthblock to theremoteconfig that lets users supply pre-registered credentials, and fix the OAuth flow so it no longer fabricates aregistration_endpointwhen the server doesn't advertise one.Overall the approach is working but had design decisions taken that you might want to validate
TLS(https) is required for some provider on the oAuth callback. The user will see a warning on the browser when attempting to validate because a temporary certificate is created on the runtime. Most likely a secured certificate should be used.Hence this PR is putting the fundation for the OAuth supports but it needs a bit of polishing.
Demo video for slack MCP server: https://streamable.com/y3i7mx
Config format
Below format is following the specs given by @aheritier with one exception, the addition of a
tlsflag requiredAI generated change-log bellow that helps understanding the changes
Detailed Changes
Config (
pkg/config/latest/)RemoteOAuthConfigstruct with fieldsClientID,ClientSecret,CallbackPort,Scopes,TLSOAuth *RemoteOAuthConfigfield to theRemotestructoauthrequirestype: mcp, aurl, a non-emptyclientId, andcallbackPort >= 0TLSis required for some provider like Slack, a certificate is created on runtime to satisfy the needs.OAuth flow (
pkg/tools/mcp/oauth.go)registration_endpoint—validateAndFillDefaults()andcreateDefaultMetadata()no longer invent an endpoint the server never advertisedhandleManagedOAuthFlow()now follows three paths: explicitclientId→ use directly; server advertisesregistration_endpoint→ dynamic registration; neither → fail with a clear actionable errormanagedflagscopesfrom config overrideauthServerMetadata.ScopesSupportedwhen providedUserDeclinedsentinel is stored in the token store —RoundTripchecks this first on every subsequent request and returns immediately without re-promptingCallback server (
pkg/tools/mcp/oauth_server.go,oauth_helpers.go)NewCallbackServerWithOptions(port, ...opts)+WithTLS()functional option — wraps the TCP listener with an in-memory self-signed TLS cert (required by Slack, which rejectshttp://redirect URIs even for loopback)GetRedirectURI()returnshttps://when TLS is enabledExchangeCodeForToken():io.ReadAllwas draining the body beforejson.NewDecoder, making the primary decode path dead code — now usesjson.UnmarshaldirectlyToken store (
pkg/tools/mcp/tokenstore.go)UserDeclined booltoOAuthTokenas a session-scoped sentinel to permanently suppress re-prompting after a declineRuntime (
pkg/runtime/runtime.go)emitToolsProgressively()defers OAuth toolsets — they are not started untilRunStreamis active and an elicitation handler is availableWiring (
pkg/tools/mcp/mcp.go,pkg/tools/mcp/remote.go,pkg/teamloader/registry.go)NewRemoteToolset()acceptsoauthConfig;RequiresOAuth()signals the runtime to defer startupregistry.gopassestoolset.Remote.OAuththrough to the toolsetSchema (
agent-schema.json)RemoteOAuthConfigdefinition withrequired: ["clientId"],minimum: 0oncallbackPort,additionalProperties: falseTesting
Unit tests
TestOAuthTransport_UserDeclined— end-to-end against a realhttptest.Server: elicitation fires exactly once on first request, zero times on all subsequent requests after declineTestOAuthUserDeclinedSentinel— three parallel subtests: sentinel written to store correctly, scoped to the declined URL only, not misclassified as expired byIsExpired()TestValidateAndFillDefaults— guards against regression of the fabricated endpoint; includes a case assertingRegistrationEndpointstays empty when the server omits itTestToolset_Validate_RemoteOAuth— covers all validation paths: valid config, missingclientId, negativecallbackPort, wrong toolset type, missingurlTestEmitStartupInfo_OAuthToolsetDeferred/TestEmitStartupInfo_NonOAuthToolsetStartedEagerly— verify the deferred-start behaviour in the runtimeTestSelfSignedTLSConfig,TestGetRedirectURI,TestNewCallbackServer_*,TestBuildAuthorizationURL— cover the new callback server and helper pathsManual — Slack MCP Server (end-to-end)
Tested against the Slack MCP Server using a Slack app registered in the Docker workspace. Config used:
Flow verified:
no elicitation handler configurederror)https://127.0.0.1:3118/callbacksearch:read.publicto search channels andchat:writeto send a messageDecline flow verified:
RoundTripreturnsOAuth authorization was declined for this sessionimmediately on requests 2+, elicitation handler call count stays at 1Other minor Design decisions
UserDeclinedreuses the token store rather than adding state to the transport — theInMemoryTokenStorealready acts as the per-server auth state map; storing a sentinel there keeps the short-circuit check in one place (RoundTrip) with no new fields or coordination needed elsewhere.callbackPort: 0means random port — standard Unix convention for "give me any free port". Validation allows>= 0; the callback server treats> 0as pinned and0as random, avoiding the need for a separate sentinel value.scopesfully override, not merge — merging would require deduplication and could silently add scopes the user didn't intend. Full override gives deterministic control, which matters for security-sensitive scope selection.clientIdskips dynamic registration entirely — if credentials are already provided there is no reason to attempt registration. Trying it first would make the explicit config unreliable if the server happens to also advertise aregistration_endpoint.