Skip to content

[rmcp-client] Retain OAuth credentials after persistence failure#30090

Closed
stevenlee-oai wants to merge 13 commits into
dev/stevenlee/mcp-oauth-stack-4-shared-storesfrom
dev/stevenlee/mcp-oauth-stack-5-cancellation-safe-refresh
Closed

[rmcp-client] Retain OAuth credentials after persistence failure#30090
stevenlee-oai wants to merge 13 commits into
dev/stevenlee/mcp-oauth-stack-4-shared-storesfrom
dev/stevenlee/mcp-oauth-stack-5-cancellation-safe-refresh

Conversation

@stevenlee-oai

@stevenlee-oai stevenlee-oai commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Codex Thread 019edd6d-6f14-74e2-853c-345d1803d4a6

Important

This PR belongs to the superseded MCP OAuth stack. Please review and merge the replacement stack beginning with openai/codex#30292. This PR remains available only as historical/reference context.

Replacement review order:

  1. openai/codex#30292 — shared File/Secrets store locking
  2. openai/codex#30293 — authoritative serialized refresh
  3. openai/codex#30294 — Codex-owned transport refresh and one-shot 401 recovery
  4. openai/codex#30295 — login/logout transaction serialization
  5. openai/codex#30296 — diagnostic-only Auto store drift reporting

This is part 5 of an eight-PR stack that prevents concurrent MCP OAuth refreshers from replaying a rotating refresh token or overwriting newer credentials.

Review order

  1. openai/codex#29017 — refresh transaction and lifecycle-local store pinning
  2. openai/codex#29020 — authoritative reread after lock acquisition
  3. openai/codex#29019 — serialize login and logout with refresh
  4. openai/codex#29021 — lock aggregate File and Secrets stores
  5. openai/codex#30090 — retain refreshed credentials after a persistence failure (this PR)
  6. openai/codex#30091 — add Codex-owned proactive and 401 recovery
  7. openai/codex#29018 — make Codex the exclusive refresh owner for all RMCP traffic
  8. openai/codex#30089 — end-to-end transport recovery coverage

Why this layer exists

After the provider successfully rotates A to B, B is the only potentially usable credential even if durable persistence fails. Reinstalling durable A on the next operation would discard B and replay an already-consumed refresh token.

What changes

  • Make refreshed B authoritative in the authorization manager and the persistor's in-memory state before attempting the durable write.
  • Record that B has not been persisted and retain the last credential snapshot known to be durable.
  • Keep healthy access-token requests on B for the current process.
  • On B's next refresh, reread under the transaction lock only to reconcile authority: unchanged durable A fails closed, while a serialized login, logout, or concurrent refresh that replaced A is adopted.
  • Add structured stage and timing logs for provider refresh and persistence failures.

Decisions reviewers should preserve across the stack

  • A successful provider response transfers in-process authority to B before persistence.
  • This stack intentionally does not retry persistence, in the foreground or background. The error is logged and returned so telemetry can tell us whether a retry policy is needed.
  • A healthy B may continue serving requests without polling durable storage on every operation. When B itself needs refresh, unchanged durable A returns the pending-persistence error; a credential that changed under the shared transaction lock is authoritative and may be adopted.
  • Provider timeout is different: no B was received, so the outcome remains unknown and a later serialized retry is allowed.
  • Auto remains lifecycle-local and never hot-switches after construction.
  • The original openai/codex#28647 and its branch are untouched.

Review focus

Review the authority transition around the fallible save, the retained last-known-durable snapshot, and the distinction between “provider response succeeded, persistence failed” and “provider outcome unknown.” No transport ownership changes occur in this layer.

Non-goals

  • Retrying failed persistence.
  • Quarantining credentials after provider timeout.
  • Durable Auto selection or backend migration.

Validation

  • just test -p codex-rmcp-client: 108 passed, 5 skipped.
  • Regression coverage proves an unchanged stale durable snapshot is never reinstalled after a failed save.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7fcc6efe0a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread codex-rs/rmcp-client/src/oauth.rs Outdated
Comment thread codex-rs/rmcp-client/src/oauth/persistor.rs Outdated
Comment thread codex-rs/rmcp-client/src/oauth/persistor.rs
@stevenlee-oai stevenlee-oai force-pushed the dev/stevenlee/mcp-oauth-stack-5-cancellation-safe-refresh branch from b04b694 to f800bd9 Compare June 25, 2026 18:45
@stevenlee-oai stevenlee-oai force-pushed the dev/stevenlee/mcp-oauth-stack-4-shared-stores branch 2 times, most recently from 606864f to 0b3dcef Compare June 25, 2026 21:11
@stevenlee-oai stevenlee-oai requested a review from a team as a code owner June 25, 2026 21:11
@stevenlee-oai

Copy link
Copy Markdown
Contributor Author

[from Codex]: Superseded by the compact five-PR stack. Persistence-failure authority is now folded into #29017 so the refresh transaction is reviewed as one coherent implementation. New order: #29017#29019#29021#29018#30089.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant