
refactor: range-based spec version tagging (introducedIn/removedIn)#265

Merged
pcarleton merged 4 commits into
mainfrom
paulc/introducedin-removedin
May 14, 2026

@pcarleton
Member

@pcarleton pcarleton commented May 8, 2026

Closes #256. Follows up on #255.

Problem

Scenarios were tagged with an explicit list of every spec version they apply to:

specVersions = ['2025-06-18', '2025-11-25']

Every new spec release means touching every carried-forward scenario, and there's no way to express removal — a SEP that tightens or removes a requirement can't mark an old scenario as "no longer valid in draft". Extension scenarios were tagged ['extension'], which doesn't fit a version list at all.

Design

Replace the flat specVersions list with a nested source union:

export const EXTENSION_IDS = [
  'io.modelcontextprotocol/oauth-client-credentials',
  'io.modelcontextprotocol/enterprise-managed-authorization'
] as const;
export type ExtensionId = (typeof EXTENSION_IDS)[number];

export type ScenarioSource =
  | { introducedIn: DatedSpecVersion | typeof DRAFT_PROTOCOL_VERSION;
      removedIn?: DatedSpecVersion | typeof DRAFT_PROTOCOL_VERSION }
  | { extensionId: ExtensionId };

export interface Scenario {
  ...
  source: ScenarioSource;
}

A scenario either sits on the spec timeline (introducedIn / optional removedIn) or names a protocol extension by its SEP-2133 identifier. The compiler enforces the XOR; no runtime invariant test needed. The union lives inside a property so Scenario stays a plain interface and class implements Scenario keeps working; TypeScript won't let a class implement a union type directly.
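A minimal sketch of that shape, assuming a simplified Scenario with only name and source (the real interface has more members, elided as ... above) and stand-in version literals. The 'DRAFT-2026-v1' value is borrowed from the draft literal mentioned later in this thread and may not match the current constant; the class names and the example/draft-only slug are hypothetical.

```typescript
// Stand-ins (assumptions): the real DatedSpecVersion list and
// DRAFT_PROTOCOL_VERSION value live in the package's types and may differ.
type DatedSpecVersion = '2025-03-26' | '2025-06-18' | '2025-11-25';
const DRAFT_PROTOCOL_VERSION = 'DRAFT-2026-v1' as const;
type SpecPoint = DatedSpecVersion | typeof DRAFT_PROTOCOL_VERSION;

const EXTENSION_IDS = [
  'io.modelcontextprotocol/oauth-client-credentials',
  'io.modelcontextprotocol/enterprise-managed-authorization'
] as const;
type ExtensionId = (typeof EXTENSION_IDS)[number];

type ScenarioSource =
  | { introducedIn: SpecPoint; removedIn?: SpecPoint }
  | { extensionId: ExtensionId };

// Simplified Scenario: the real interface has more members.
interface Scenario {
  name: string;
  source: ScenarioSource;
}

// Scenario is a plain object type, so `implements` is allowed;
// only the `source` property is a union.
class DraftOnlyScenario implements Scenario {
  name = 'example/draft-only'; // hypothetical slug
  source: ScenarioSource = { introducedIn: DRAFT_PROTOCOL_VERSION };
}

class ExampleExtensionScenario implements Scenario {
  name = 'auth/enterprise-managed-authorization';
  source: ScenarioSource = {
    extensionId: 'io.modelcontextprotocol/enterprise-managed-authorization'
  };
}

// The `in` check narrows s.source to exactly one arm of the union.
function describeSource(s: Scenario): string {
  return 'extensionId' in s.source
    ? `extension ${s.source.extensionId}`
    : `spec ${s.source.introducedIn}..${s.source.removedIn ?? 'open'}`;
}
```

One caveat worth checking against the real types: because the two arms share no discriminant, TypeScript's excess-property check for object literals only requires each key to belong to some arm, so a literal carrying both introducedIn and extensionId may still type-check. Adding extensionId?: never to the version arm and introducedIn?: never to the extension arm would make the exclusion strict. An object with neither key is already rejected.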

matchesSpecVersion becomes a range check gated on the union:

if ('extensionId' in source) return false;
return versionIndex(source.introducedIn) <= versionIndex(version)
    && (source.removedIn === undefined || versionIndex(version) < versionIndex(source.removedIn));

getScenarioSpecVersions() reconstructs the legacy ScenarioSpecTag[] for tier-check; ALL_SPEC_VERSIONS, resolveSpecVersion, and all --spec-version CLI semantics are unchanged.
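The range check and the legacy reconstruction can be sketched together as follows. This is an illustration under stated assumptions, not the package's code: the version list and draft literal are stand-ins, versionIndex just looks up a position in that list (draft last, as the first commit message describes), and the ScenarioSpecTag shape (a version or 'extension') is inferred from the old tagging scheme.

```typescript
// Stand-in ordering (assumption): oldest dated version first, draft last.
const DRAFT_PROTOCOL_VERSION = 'DRAFT-2026-v1' as const;
const DATED_SPEC_VERSIONS = ['2025-03-26', '2025-06-18', '2025-11-25'] as const;
type SpecPoint = (typeof DATED_SPEC_VERSIONS)[number] | typeof DRAFT_PROTOCOL_VERSION;
const ORDERED: readonly SpecPoint[] = [...DATED_SPEC_VERSIONS, DRAFT_PROTOCOL_VERSION];

type ScenarioSource =
  | { introducedIn: SpecPoint; removedIn?: SpecPoint }
  | { extensionId: string };

// Assumed legacy tag shape: a version, or 'extension' for extension scenarios.
type ScenarioSpecTag = SpecPoint | 'extension';

function versionIndex(v: SpecPoint): number {
  const i = ORDERED.indexOf(v);
  if (i < 0) throw new Error(`unknown spec version: ${v}`);
  return i;
}

// Half-open range: introducedIn is inclusive, removedIn is exclusive.
function matchesSpecVersion(source: ScenarioSource, version: SpecPoint): boolean {
  if ('extensionId' in source) return false; // extensions are off the timeline
  return versionIndex(source.introducedIn) <= versionIndex(version)
    && (source.removedIn === undefined || versionIndex(version) < versionIndex(source.removedIn));
}

// Rebuilds a flat tag list like the one tier-check consumed before the refactor.
function getScenarioSpecVersions(source: ScenarioSource): ScenarioSpecTag[] {
  if ('extensionId' in source) return ['extension'];
  return ORDERED.filter((v) => matchesSpecVersion(source, v));
}
```

Because removedIn is exclusive, { introducedIn: '2025-03-26', removedIn: '2025-06-18' } matches only 2025-03-26.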

Migration

Old specVersions → New source
['2025-03-26', '2025-06-18', '2025-11-25'] → { introducedIn: '2025-03-26' }
['2025-06-18', '2025-11-25'] → { introducedIn: '2025-06-18' }
['2025-11-25'] → { introducedIn: '2025-11-25' }
['2025-03-26'] → { introducedIn: '2025-03-26', removedIn: '2025-06-18' }
[DRAFT_PROTOCOL_VERSION] → { introducedIn: DRAFT_PROTOCOL_VERSION }
['extension'] → { extensionId: 'io.modelcontextprotocol/...' }
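For contiguous lists the migration follows a mechanical rule, sketched here with a hypothetical toSource helper (not part of the PR) and stand-in versions: the first element becomes introducedIn, and removedIn is the version right after the last element, unless that last element is the newest dated version or draft, in which case the range stays open so the scenario carries forward.

```typescript
// Stand-in versions (assumption); the real lists live in the package.
const DRAFT_PROTOCOL_VERSION = 'DRAFT-2026-v1' as const;
const DATED = ['2025-03-26', '2025-06-18', '2025-11-25'] as const;
const ORDERED = [...DATED, DRAFT_PROTOCOL_VERSION] as const;
type SpecPoint = (typeof ORDERED)[number];
type VersionRange = { introducedIn: SpecPoint; removedIn?: SpecPoint };

// Hypothetical migration helper: old contiguous specVersions list -> range.
function toSource(specVersions: readonly SpecPoint[]): VersionRange {
  const idx = specVersions.map((v) => ORDERED.indexOf(v)).sort((a, b) => a - b);
  if (idx.length === 0) throw new Error('empty specVersions');
  for (let k = 1; k < idx.length; k++) {
    if (idx[k] !== idx[k - 1] + 1) {
      throw new Error('non-contiguous list cannot be expressed as one range');
    }
  }
  const first = idx[0];
  const last = idx[idx.length - 1];
  const range: VersionRange = { introducedIn: ORDERED[first] };
  // Only a list that stops before the newest dated version gets a removedIn.
  if (last < DATED.length - 1) range.removedIn = ORDERED[last + 1];
  return range;
}
```

The ['extension'] row has no mechanical mapping: each extension scenario has to name its extensionId by hand.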

cross-app-access → enterprise-managed-authorization rename

Aligns with the canonical extension name from docs/extensions/client-matrix.mdx. Renamed: file, class EnterpriseManagedAuthorizationScenario, slug auth/enterprise-managed-authorization, context.ts schema literal, and the everything-client.ts runner.

Testing

$ npx vitest run
Test Files  9 passed (9)
     Tests  116 passed (116)

(118 → 116: the two "every Scenario has introducedIn" runtime tests are now compiler-enforced and removed.) Build, type-check, and lint clean. node dist/index.js list shows auth/enterprise-managed-authorization [extension].

… tagging

Replaces per-scenario `specVersions: SpecVersion[]` arrays with `introducedIn`
and optional `removedIn` range fields on all three scenario interfaces. Extension
scenarios get an orthogonal `extension?: boolean` flag instead of the former
`'extension'` tag in the versions list.

matchesSpecVersion() is rewritten as a range comparison
(introducedIn <= target && (!removedIn || target < removedIn)), with draft
sorting after all dated versions. getScenarioSpecVersions() reconstructs the
effective version list from the range for backward-compat with tier-check.
Closes #256.
@pkg-pr-new

pkg-pr-new Bot commented May 8, 2026


npx https://pkg.pr.new/@modelcontextprotocol/conformance@265

commit: f1e683a

@pcarleton pcarleton force-pushed the paulc/introducedin-removedin branch from ca22f78 to 89c1731 May 14, 2026 13:15
pcarleton added 2 commits May 14, 2026 14:20
The flat `introducedIn`/`removedIn`/`extension?` shape forced extension
scenarios to set a placeholder `introducedIn` that was never read. Nesting
the version-range fields inside a `source` property typed as a union lets
the compiler enforce introducedIn XOR extensionId — no runtime invariant
test needed.

  type ScenarioSource =
    | { introducedIn: SpecVersion; removedIn?: SpecVersion }
    | { extensionId: ExtensionId };

A class can't `implements` a union type directly, so the union lives inside
a property and `Scenario` stays a plain interface — `class implements
Scenario` keeps working everywhere. `EXTENSION_IDS` is a const array
mirroring the `DATED_SPEC_VERSIONS` pattern.

Also renames the cross-app-access scenario to enterprise-managed-auth to
match the extension's canonical name: file, class, slug, context schema
literal, and the everything-client runner.

EXTENSION_IDS now holds the wire identifiers from the spec's extension docs
(client-matrix.mdx, docs/extensions/auth/*.mdx) so extensionId values can be
cross-referenced against capabilities.extensions keys directly:

  io.modelcontextprotocol/oauth-client-credentials
  io.modelcontextprotocol/enterprise-managed-authorization

The previous commit used a shortened 'enterprise-managed-auth' which doesn't
match the canonical '-authorization'. Re-renames the scenario file/class/slug
and downstream references to align.
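Because extensionId now holds the wire identifier, checking whether a peer advertises an extension becomes a direct key lookup. A sketch, assuming capabilities.extensions is a plain object keyed by extension identifier; advertisesExtension and the Capabilities interface are hypothetical.

```typescript
const EXTENSION_IDS = [
  'io.modelcontextprotocol/oauth-client-credentials',
  'io.modelcontextprotocol/enterprise-managed-authorization'
] as const;
type ExtensionId = (typeof EXTENSION_IDS)[number];

// Assumed shape: capabilities.extensions is keyed by extension identifier.
interface Capabilities {
  extensions?: Record<string, unknown>;
}

// Hypothetical helper: the scenario's extensionId is used as the key as-is,
// with no translation table between scenario ids and wire ids.
function advertisesExtension(caps: Capabilities, id: ExtensionId): boolean {
  return caps.extensions !== undefined
    && Object.prototype.hasOwnProperty.call(caps.extensions, id);
}
```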
@pcarleton pcarleton force-pushed the paulc/introducedin-removedin branch from 89c1731 to bc1f6bc May 14, 2026 13:24
@pcarleton pcarleton merged commit acc70de into main May 14, 2026
8 checks passed
@pcarleton pcarleton deleted the paulc/introducedin-removedin branch May 14, 2026 15:12
tarekgh pushed a commit to mikekistler/conformance that referenced this pull request May 14, 2026
…edIn/removedIn)

The introducedIn/removedIn refactor (modelcontextprotocol#265) replaced the specVersions array
with a source property on all scenario interfaces. The SEP-2243 scenarios
still used the old specVersions field, causing matchesSpecVersion() to
receive undefined and crash in spec-version.test.ts.

Replace specVersions with source = { introducedIn: DRAFT_PROTOCOL_VERSION }
in BaseHttpScenario and the two server ClientScenario classes. Remove the
now-unused SpecVersion imports.
pcarleton added a commit that referenced this pull request May 15, 2026
* Add prepare script for git dependency installs

* Add conformance tests for SEP-2243 HTTP Standardization

* Add traceability file for SEP-2243

* Fix prettier formatting in http-standard-headers.ts

* Improve SEP-2243 conformance test coverage

Close gaps in HTTP header conformance scenarios:

Client standard headers (http-standard-headers.ts):
- Enforce all expected methods in checks, failing any that were not observed
- Track Mcp-Name header per-method separately from Mcp-Method
- Advertise resources and prompts capabilities so clients exercise those endpoints
- Add a prompt entry for prompts/get testing

Client custom headers (http-custom-headers.ts):
- Add Base64 encoding checks for non-ASCII, whitespace, and control-char values
- Add null/omitted parameter test (second tool call with null value)
- Add client-keeps-valid-tool check verifying clients still call valid tools
  after filtering out invalid ones

Server header validation (server/http-standard-headers.ts):
- Replace fetch-based sendRawRequest with http.request to preserve exact
  header casing on the wire (fetch/Headers lowercases names)
- Compute defaultArgs and defaultHeaders from tool schema so test requests
  satisfy all required parameters while varying only the param under test

Traceability (sep-2243.yaml):
- Add 7 new spec-to-check mappings (nonascii, whitespace, controlchar,
  keeps-valid-tool, literal-missing-base64-prefix/suffix, no-mirror-unannotated)
- Fix 2 incorrect existing mappings

* Fill SEP-2243 conformance test coverage gaps

Add missing test cases identified by comparing the branch against the
SEP-2243 spec's Conformance Test Cases section:

- Mcp-Method on notifications: add notifications/initialized to
  expectedMethods so clients are checked for header on notifications
- Boolean true: add debug=true param to complement verbose=false
- Leading-space-only and trailing-space-only: separate checks for
  Base64 encoding when only one edge has whitespace
- Internal spaces only: verify plain ASCII (no Base64) for values
  like 'us west 1' that have spaces only in the middle
- CRLF string: add line1\r\nline2 test (complements existing \n test)
- Leading tab: add \tindented test for tab-triggered Base64 encoding
- Server rejects invalid Mcp-Param chars: mark as excluded in YAML
  since HTTP itself prevents these characters in headers

Register all new checks in sep-2243.yaml.

* Remove case-insensitive =?base64? prefix test

The spec states header values are case-sensitive (RFC 9110). The
=?base64? prefix is part of the header value, not the header name,
so it should be matched case-sensitively. This was identified as a
bug in the conformance tests per feedback on SEP-2243.

Changes:
- Remove server-accepts-case-insensitive-base64 check from server
  validation scenario
- Change base64 prefix regex from case-insensitive (/i) to
  case-sensitive in client header validation
- Remove corresponding entry from sep-2243.yaml

* Remove nested x-mcp-header rejection test

The spec constrains x-mcp-header to primitive types but does not
restrict it to top-level properties only. A nested string property
with x-mcp-header is valid per the spec. This was confirmed by
mikekistler in SEP-2243 PR discussion.

Removes invalid_nested_header from the invalid tool definitions
list and its corresponding sep-2243.yaml entry.

* Use numeric comparison for number header values

Compare number header values numerically instead of as strings to
accommodate cross-SDK floating point representation differences
(e.g., '42' vs '42.0'). For integers, exact numeric match is
required. For decimals, a tolerance of 1e-9 is allowed.

See SEP-2243 discussion on number precision.
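A sketch of that comparison rule. numberHeaderMatches is a hypothetical name, and treating an empty or non-numeric header as a mismatch is an assumption.

```typescript
// Hypothetical helper: exact numeric match when the expected value is an
// integer, absolute tolerance of 1e-9 otherwise.
function numberHeaderMatches(headerValue: string, expected: number): boolean {
  const trimmed = headerValue.trim();
  if (trimmed === '') return false; // Number('') is 0, so reject explicitly
  const actual = Number(trimmed);
  if (Number.isNaN(actual)) return false;
  if (Number.isInteger(expected)) return actual === expected; // '42.0' -> 42
  return Math.abs(actual - expected) <= 1e-9;
}
```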

* client/http-standard-headers: SKIPPED (not FAILURE) for unexercised methods; make getChecks() idempotent

A client that never calls prompts/list isn't violating SEP-2243 — the spec
sentence is 'The client MUST include the standard MCP request headers on each
POST request', and a request that was never sent can't fail it. The Mcp-Method
requirement is already proven by initialize/tools/list etc., so there's nothing
unique to prompts/list. SKIPPED keeps the gap visible without a false red.

Separately: getChecks() was pushing into this.checks on every call. The runner
may call it more than once (e.g. progress + final report); that produced
duplicate rows. Build a fresh array per call instead.
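Both fixes in this commit fit one pattern: record observations as requests arrive, and derive check rows on demand. A hypothetical sketch; the class, method list, and check ids are illustrative, not the scenario's actual names.

```typescript
type CheckStatus = 'SUCCESS' | 'FAILURE' | 'SKIPPED';
interface ConformanceCheck {
  id: string;
  status: CheckStatus;
}

// Hypothetical sketch: record observations, derive rows on demand.
class StandardHeaderObserver {
  private readonly expectedMethods = ['initialize', 'tools/list', 'prompts/list'];
  // method -> true if every request for that method carried Mcp-Method
  private readonly allHadHeader = new Map<string, boolean>();

  observe(method: string, hasMcpMethodHeader: boolean): void {
    const soFar = this.allHadHeader.get(method) ?? true;
    this.allHadHeader.set(method, soFar && hasMcpMethodHeader);
  }

  // Returns a fresh array each call, so calling it twice never duplicates rows.
  getChecks(): ConformanceCheck[] {
    return this.expectedMethods.map((method): ConformanceCheck => {
      const seen = this.allHadHeader.get(method);
      // Never sent means nothing to violate: SKIPPED rather than FAILURE.
      const status: CheckStatus =
        seen === undefined ? 'SKIPPED' : seen ? 'SUCCESS' : 'FAILURE';
      return { id: `mcp-method-header-${method.replace(/\//g, '-')}`, status };
    });
  }
}
```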

* client/http-standard-headers: de-dup guard in checkMcpNameHeader

checkMcpMethodHeader already has 'if (this.methodHeaderChecks.has(method)) return'.
checkMcpNameHeader was missing the equivalent. The test server advertises two
tools and two resources; a client that calls both produces two
'client-mcp-name-header-tools-call' / '...-resources-read' rows.
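The guard amounts to a Set keyed by method. A hypothetical sketch using the check id format quoted above:

```typescript
// Hypothetical sketch: one row per method, however many tools or
// resources the client calls.
class McpNameHeaderChecks {
  private readonly checked = new Set<string>();
  readonly rows: string[] = [];

  checkMcpNameHeader(method: string): void {
    if (this.checked.has(method)) return; // same guard as checkMcpMethodHeader
    this.checked.add(method);
    this.rows.push(`client-mcp-name-header-${method.replace(/\//g, '-')}`);
  }
}
```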

* client/http-standard-headers: replace -> /\//g for slug generation

String.replace with a string arg only replaces the first occurrence.
'notifications/initialized' is fine (one slash) but a method like
'notifications/resources/updated' would yield 'notifications-resources/updated'
in the check id. (replaceAll isn't in this project's TS lib target.)
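The difference between the two forms:

```typescript
// A string pattern replaces only the first match; a /g regex replaces all.
const firstOnly = (method: string): string => method.replace('/', '-');
const allSlashes = (method: string): string => method.replace(/\//g, '-');

const sampleMethod = 'notifications/resources/updated';
console.log(firstOnly(sampleMethod)); // notifications-resources/updated
console.log(allSlashes(sampleMethod)); // notifications-resources-updated
```

split('/').join('-') is another option that needs neither replaceAll nor a regex.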

* client/http-custom-headers: base64 regex (.+) -> (.*) so empty string round-trips

The spec doesn't forbid base64-encoding values that don't strictly need it.
A client that always wraps would send '=?base64??=' for empty_val: ''. The
'.+' wouldn't match, so the harness would compare '=?base64??=' === ''
literally and FAIL a compliant client. '(.*)' lets the empty payload decode
to '' and match.
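A sketch of the decoding side, assuming Node's Buffer is available; decodeHeaderValue is a hypothetical helper name. The pattern has no i flag, in line with the case-sensitive prefix rule from the earlier commit, and uses (.*) so '=?base64??=' decodes to the empty string.

```typescript
// Anchored and case-sensitive (no /i flag); (.*) lets an empty payload match.
const BASE64_WRAPPER = /^=\?base64\?(.*)\?=$/;

// Hypothetical harness helper: decode a wrapped value, otherwise return it
// unchanged so it is compared literally.
function decodeHeaderValue(raw: string): string {
  const match = BASE64_WRAPPER.exec(raw);
  if (match === null) return raw;
  return Buffer.from(match[1] ?? '', 'base64').toString('utf8');
}
```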

* client/http-custom-headers: emit SUCCESS for no-mirror-unannotated

This check only pushed on FAILURE; a correct client produced no row at all,
so the pass was invisible in reports and couldn't be counted toward coverage.
Same-slug SUCCESS/FAILURE pair is the repo convention.

* client/http-custom-headers: drop redundant optional-present check

checkParamHeader('Verbose', ...) already pushes a FAILURE when the header is
missing. The explicit 'optional-present' block re-tests the same condition
under a second check id, doubling the row.

* server/http-standard-headers: defaultArgs must use real number/boolean, not strings

defaultArgs feeds the JSON-RPC body. With Record<string, string> + '0'/'false',
a server that validates inputSchema rejects on type mismatch — which is also a
400. Every header-rejection check then false-PASSes (server rejected for the
wrong reason), and server-accepts-valid-base64 false-FAILs a compliant server
because it never gets past schema validation.

Also handle 'integer' (JSON Schema distinguishes it from 'number').
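A sketch of schema-driven defaults that keep JSON types intact. The interfaces and the defaultArgs signature are hypothetical simplifications, and the 'x' placeholder for string parameters is an assumption.

```typescript
// Simplified JSON Schema shapes (assumption: only `type` matters here).
interface PropertySchema {
  type?: string;
}
interface ToolInputSchema {
  properties?: Record<string, PropertySchema>;
  required?: string[];
}

// Hypothetical: one correctly typed placeholder per required parameter.
function defaultArgs(schema: ToolInputSchema): Record<string, unknown> {
  const args: Record<string, unknown> = {};
  for (const param of schema.required ?? []) {
    switch (schema.properties?.[param]?.type) {
      case 'number':
      case 'integer': // JSON Schema treats integer as its own type
        args[param] = 0; // a real number, not the string '0'
        break;
      case 'boolean':
        args[param] = false; // a real boolean, not the string 'false'
        break;
      default:
        args[param] = 'x'; // placeholder string (assumption)
    }
  }
  return args;
}
```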

* server/http-standard-headers: createAcceptanceCheck must also assert no JSON-RPC error in body

A server can return HTTP 200 with {error: {code: -32001, ...}} in the body.
Status-only acceptance lets that through as a pass.
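A sketch of the stricter acceptance test, assuming a single JSON response body (an SSE-framed response would need to be parsed first); isAccepted is a hypothetical name.

```typescript
// Hypothetical acceptance predicate: HTTP 200 alone is not enough, because a
// server can answer 200 with a JSON-RPC error object in the body.
function isAccepted(httpStatus: number, body: string): boolean {
  if (httpStatus !== 200) return false;
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return false; // assumption: an unparseable body is not a clean acceptance
  }
  return !(typeof parsed === 'object' && parsed !== null && 'error' in parsed);
}
```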

* use DRAFT_PROTOCOL_VERSION constant instead of 'DRAFT-2026-v1' literal

The literal will rot when the draft cycle rolls over. types.ts already exports
DRAFT_PROTOCOL_VERSION for this.

* sep-2243.yaml: spec-first rewrite + move to src/seps/

Regenerated from the SEP-2243 spec diff (transports.mdx + tools.mdx) rather
than from the scenario implementations: 21 check rows + 4 excluded vs the
previous 46 + 2. Differences:

- one check id per spec sentence (test variants go in 'details', not new ids)
- 'text:' quotes the spec sentence verbatim instead of paraphrasing
- '-reject-status' (MUST 400) split from '-reject-error-code' (SHOULD -32001
  for standard headers, MUST for custom)
- rows with no spec backing dropped (whitespace, base64-padding/chars, prefix/
  suffix-literal, control-char-name)
- two more SHOULD excludes (log-warning, no-sensitive-params)
- check ids use sep-2243-<slug> convention
- check: key first, excludes grouped at bottom (matches sep-2164)

Moved to src/seps/ to match #272 layout (this branch predates it; will
reconcile cleanly on rebase).

* add negative test for HttpStandardHeadersScenario

Self-contained vitest that POSTs initialize with and without Mcp-Method,
asserts FAILURE/SUCCESS on the pinned check id, and proves getChecks() is
idempotent. No example client/server file needed for this one because the
scenario *is* the server — the test crafts the bad request directly.

This is the negative-test half of AGENTS.md §Testing your scenario; the
everything-client passing-example half is still TODO.

* server/http-standard-headers: severity fixes for whitespace + malformed-base64 tests

These five tests assert behavior the SEP-2243 text doesn't pin down. Kept
because they're useful consistency signals across SDKs, but adjusted so they
don't FAIL spec-compliant servers:

- server-accepts-whitespace-header-value: this IS a MUST, but per RFC 9110
  §5.5 (field parsing MUST exclude OWS), not SEP-2243. Kept as FAILURE,
  specReference now cites RFC 9110.

- server-rejects-invalid-base64-padding / -chars: downgraded to INFO.
  SEP-2243 says only 'MUST decode them accordingly' without picking RFC 4648
  strict vs lenient. Node's Buffer.from() accepts unpadded/dirty input and
  decodes to a matching value (server accepts); .NET's Convert.FromBase64String
  throws (server rejects). Either is currently compliant. WARNING would force
  Tier-1 SDKs into non-default strict decoding (#245); INFO records the
  behavior without prescribing it, so cross-SDK divergence is visible
  without tier impact.

- server-literal-missing-base64-prefix / -suffix: unchanged. The wrapper
  syntax is '=?base64?{x}?=' complete; treating partial wrappers as literal
  is the natural reading.

Plumbing: createRejectionCheck gains an optional failSeverity param;
testBase64Case gains a 'reject-info' mode. (This is also the hook for the
larger 400-MUST / -32001-SHOULD split that still needs doing.)

* server/http-standard-headers: split rejection check into status (MUST) + error-code (SHOULD/MUST)

SEP-2243 §Server Validation: 'servers MUST return HTTP status 400 ... and
SHOULD include a JSON-RPC error response using ... -32001'. So a server that
returns 400 with -32600 (or no error body) is compliant for standard headers.
The single conflated check FAILed it.

For custom headers (§Server Behavior for Custom Headers) -32001 IS a MUST,
so both halves stay FAILURE there.

createRejectionCheck → createRejectionChecks returning two rows: '<id>' for
the 400 status and '<id>-error-code' for -32001. testCase (standard) sets
errorCodeSeverity=WARNING; testBase64Case/testMissingCustomHeader (custom)
keep FAILURE; the two malformed-base64 INFO probes set both halves to INFO.
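A sketch of the two-row split. The function name, the '-error-code' suffix, and errorCodeSeverity come from the commit message; the parameter list and row shape are assumptions.

```typescript
type RowStatus = 'SUCCESS' | 'WARNING' | 'FAILURE';
interface CheckRow {
  id: string;
  status: RowStatus;
}

// Hypothetical shape: one row for the HTTP status (always a MUST) and one for
// the JSON-RPC error code, whose severity depends on the header kind
// (WARNING for standard headers' SHOULD, FAILURE for custom headers' MUST).
function createRejectionChecks(
  id: string,
  httpStatus: number,
  jsonRpcErrorCode: number | undefined,
  errorCodeSeverity: 'WARNING' | 'FAILURE'
): CheckRow[] {
  return [
    { id, status: httpStatus === 400 ? 'SUCCESS' : 'FAILURE' },
    { id: `${id}-error-code`, status: jsonRpcErrorCode === -32001 ? 'SUCCESS' : errorCodeSeverity },
  ];
}
```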

* rename check ids to sep-2243-* prefix; kebab-case throughout

Aligns with the sep-2243.yaml traceability ids and the repo convention
(sep-2164-*, sep-2207-*). Variant suffixes are kept (per-method, per-tool)
so per-case visibility isn't lost; the YAML's row id is a prefix of the
emitted ids, which is fine — extra ids not in the YAML are allowed.

Also fixes the mixed hyphen/underscore in reject-invalid-tool ids
(toolName contains underscores; these are now replaced with hyphens).

* client: extract BaseHttpScenario to client/http-base.ts; both client scenarios extend it

http-standard-headers.ts had its own copy of the start/stop/handleRequest
boilerplate (~100 lines) that http-custom-headers.ts already abstracted as
BaseHttpScenario. Moved the abstract class to a shared file; both scenario
files now import it. HttpStandardHeadersScenario implements handlePost()
instead of handleRequest(); its handleInitialize() uses the base
sendInitialize() with the resources/prompts capability flags it needs.

Net: 465 → 361 lines for http-standard-headers.ts; the third inline copy of
the same HTTP-server scaffold is gone.

* use req.setEncoding('utf8') instead of per-chunk Buffer.toString()

Per-chunk .toString() corrupts a multi-byte UTF-8 char that straddles a TCP
chunk boundary (e.g. 0xC3 | 0xBC decodes to two replacement chars instead of
'ü'). The custom-headers scenario sends '日本語'/'naïve' in the body, so this
is reachable in principle. setEncoding('utf8') makes 'data' emit strings with
boundary handling done by Node's StringDecoder.

Fixed in both places: BaseHttpScenario.handleRequest (request body) and
sendRawRequest (response body).
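The failure mode and the fix can be reproduced with Node's string_decoder module, which is the decoder setEncoding('utf8') relies on. Here 'ü' (0xC3 0xBC) is split across two chunks:

```typescript
import { StringDecoder } from 'node:string_decoder';

// 'ü' is U+00FC, encoded in UTF-8 as 0xC3 0xBC; split it across two chunks.
const chunks = [Buffer.from([0xc3]), Buffer.from([0xbc])];

// Per-chunk toString(): each half is invalid on its own and becomes U+FFFD.
const naive = chunks.map((c) => c.toString('utf8')).join('');

// StringDecoder holds back the incomplete lead byte until the next chunk arrives.
const decoder = new StringDecoder('utf8');
const decoded = chunks.map((c) => decoder.write(c)).join('') + decoder.end();

console.log(naive === '\uFFFD\uFFFD'); // true
console.log(decoded); // ü
```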

* fix: migrate SEP-2243 scenarios from specVersions to source (introducedIn/removedIn)

The introducedIn/removedIn refactor (#265) replaced the specVersions array
with a source property on all scenario interfaces. The SEP-2243 scenarios
still used the old specVersions field, causing matchesSpecVersion() to
receive undefined and crash in spec-version.test.ts.

Replace specVersions with source = { introducedIn: DRAFT_PROTOCOL_VERSION }
in BaseHttpScenario and the two server ClientScenario classes. Remove the
now-unused SpecVersion imports.

* server/http-standard-headers: malformed-base64 padding/chars back to FAILURE

Per discussion on #259: the SEP-2243 conformance-test-case table is the
approved source of truth and lists these as reject cases, so they're
FAILURE-level even though the spec body itself only says 'MUST decode them
accordingly'. SDKs whose stdlib base64 is lenient (Node, browsers) will need
to validate before decoding; if that's burdensome we'll revisit.

Removes the now-unused 'reject-info' mode and statusSeverity/INFO plumbing
introduced for these two probes.

---------

Co-authored-by: Tarek Mahmoud Sayed <tarekms@microsoft.com>
Co-authored-by: Tarek Mahmoud Sayed <tarekms@ntdev.microsoft.com>
Co-authored-by: Tarek Mahmoud Sayed <10833894+tarekgh@users.noreply.github.com>
Co-authored-by: Paul Carleton <paulc@anthropic.com>
@panyam panyam mentioned this pull request May 18, 2026
pcarleton added a commit that referenced this pull request May 18, 2026
Replaces `specVersions: ['draft']` with `source: { introducedIn: DRAFT_PROTOCOL_VERSION }`
in the 5 iss-parameter scenarios.

This commit typechecks once the stack is rebased onto main >= #265 (the
ScenarioSource migration). Adding it now so the rebase is mechanical.

Development

Successfully merging this pull request may close these issues.

Replace specVersions list with introducedIn/removedIn range tagging

2 participants