refactor: range-based spec version tagging (introducedIn/removedIn)#265
Merged
… tagging

Replaces per-scenario `specVersions: SpecVersion[]` arrays with `introducedIn` and optional `removedIn` range fields on all three scenario interfaces. Extension scenarios get an orthogonal `extension?: boolean` flag instead of the former `'extension'` tag in the versions list.

`matchesSpecVersion()` is rewritten as a range comparison (`introducedIn <= target && (!removedIn || target < removedIn)`), with draft sorting after all dated versions. `getScenarioSpecVersions()` reconstructs the effective version list from the range for backward compatibility with tier-check.

Closes #256.
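A minimal sketch of the range comparison described above, under stated assumptions: the version list, the `'draft'` placeholder, the `VersionRange` shape, and the `versionRank` helper are hypothetical stand-ins, not the repository's actual definitions.

```typescript
// Hypothetical version list for illustration; the repo defines its own.
const DATED_SPEC_VERSIONS = ['2025-03-26', '2025-06-18', '2025-11-25'] as const;
const DRAFT = 'draft'; // placeholder for the repo's draft version constant

type SpecVersion = (typeof DATED_SPEC_VERSIONS)[number] | typeof DRAFT;

// Ordering used for comparisons: draft sorts after every dated version.
const ORDER: readonly SpecVersion[] = [...DATED_SPEC_VERSIONS, DRAFT];

interface VersionRange {
  introducedIn: SpecVersion;
  removedIn?: SpecVersion;
}

// Hypothetical helper: position of a version on the timeline.
function versionRank(v: SpecVersion): number {
  return ORDER.indexOf(v);
}

// introducedIn <= target && (!removedIn || target < removedIn)
function matchesSpecVersion(range: VersionRange, target: SpecVersion): boolean {
  const t = versionRank(target);
  return (
    versionRank(range.introducedIn) <= t &&
    (range.removedIn === undefined || t < versionRank(range.removedIn))
  );
}
```

The upper bound is exclusive, so a scenario with `removedIn: '2025-06-18'` still applies to `'2025-03-26'` but not to `'2025-06-18'` itself.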
ca22f78 to 89c1731
The flat `introducedIn`/`removedIn`/`extension?` shape forced extension
scenarios to set a placeholder `introducedIn` that was never read. Nesting
the version-range fields inside a `source` property typed as a union lets
the compiler enforce introducedIn XOR extensionId — no runtime invariant
test needed.
type ScenarioSource =
| { introducedIn: SpecVersion; removedIn?: SpecVersion }
| { extensionId: ExtensionId };
A class can't `implements` a union type directly, so the union lives inside
a property and `Scenario` stays a plain interface — `class implements
Scenario` keeps working everywhere. `EXTENSION_IDS` is a const array
mirroring the `DATED_SPEC_VERSIONS` pattern.
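To illustrate the shape this commit message describes, here is a small sketch. The `Scenario` interface is reduced to two fields, the scenario classes and `describeSource` are hypothetical, and the `SpecVersion` members are assumed; only the `ScenarioSource` union is taken from the text above.

```typescript
// Assumed version names; the repo derives these from its own constants.
type SpecVersion = '2025-03-26' | '2025-06-18' | '2025-11-25' | 'draft';

// Const array in the style of DATED_SPEC_VERSIONS; entries are examples.
const EXTENSION_IDS = [
  'io.modelcontextprotocol/oauth-client-credentials',
  'io.modelcontextprotocol/enterprise-managed-authorization',
] as const;
type ExtensionId = (typeof EXTENSION_IDS)[number];

type ScenarioSource =
  | { introducedIn: SpecVersion; removedIn?: SpecVersion }
  | { extensionId: ExtensionId };

// The union sits inside a property, so Scenario is still an interface
// that classes can implement.
interface Scenario {
  name: string;
  source: ScenarioSource;
}

// Hypothetical scenarios, one per union member.
class ExampleTimelineScenario implements Scenario {
  name = 'example-timeline';
  source: ScenarioSource = { introducedIn: '2025-06-18' };
}

class ExampleExtensionScenario implements Scenario {
  name = 'example-extension';
  source: ScenarioSource = {
    extensionId: 'io.modelcontextprotocol/oauth-client-credentials',
  };
}

// Hypothetical helper: the `in` check narrows the union to one member.
function describeSource(scenario: Scenario): string {
  const src = scenario.source;
  if ('extensionId' in src) return `extension ${src.extensionId}`;
  return src.removedIn
    ? `${src.introducedIn}..${src.removedIn}`
    : `${src.introducedIn}+`;
}
```

Extension scenarios no longer need a placeholder `introducedIn`, because that field only exists on the timeline member of the union.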
Also renames the cross-app-access scenario to enterprise-managed-auth to
match the extension's canonical name: file, class, slug, context schema
literal, and the everything-client runner.
EXTENSION_IDS now holds the wire identifiers from the spec's extension docs (client-matrix.mdx, docs/extensions/auth/*.mdx) so extensionId values can be cross-referenced against capabilities.extensions keys directly:

  io.modelcontextprotocol/oauth-client-credentials
  io.modelcontextprotocol/enterprise-managed-authorization

The previous commit used a shortened 'enterprise-managed-auth', which doesn't match the canonical '-authorization'. Re-renames the scenario file/class/slug and downstream references to align.
89c1731 to bc1f6bc
tarekgh pushed a commit to mikekistler/conformance that referenced this pull request on May 14, 2026
…edIn/removedIn)

The introducedIn/removedIn refactor (modelcontextprotocol#265) replaced the specVersions array with a source property on all scenario interfaces. The SEP-2243 scenarios still used the old specVersions field, causing matchesSpecVersion() to receive undefined and crash in spec-version.test.ts.

Replace specVersions with source = { introducedIn: DRAFT_PROTOCOL_VERSION } in BaseHttpScenario and the two server ClientScenario classes. Remove the now-unused SpecVersion imports.
pcarleton added a commit that referenced this pull request on May 15, 2026
* Add prepare script for git dependency installs
* Add conformance tests for SEP-2243 HTTP Standardization
* Add traceability file for SEP-2243
* Fix prettier formatting in http-standard-headers.ts
* Improve SEP-2243 conformance test coverage
Close gaps in HTTP header conformance scenarios:
Client standard headers (http-standard-headers.ts):
- Enforce all expected methods in checks, failing any that were not observed
- Track Mcp-Name header per-method separately from Mcp-Method
- Advertise resources and prompts capabilities so clients exercise those endpoints
- Add a prompt entry for prompts/get testing
Client custom headers (http-custom-headers.ts):
- Add Base64 encoding checks for non-ASCII, whitespace, and control-char values
- Add null/omitted parameter test (second tool call with null value)
- Add client-keeps-valid-tool check verifying clients still call valid tools
after filtering out invalid ones
Server header validation (server/http-standard-headers.ts):
- Replace fetch-based sendRawRequest with http.request to preserve exact
header casing on the wire (fetch/Headers lowercases names)
- Compute defaultArgs and defaultHeaders from tool schema so test requests
satisfy all required parameters while varying only the param under test
Traceability (sep-2243.yaml):
- Add 7 new spec-to-check mappings (nonascii, whitespace, controlchar,
keeps-valid-tool, literal-missing-base64-prefix/suffix, no-mirror-unannotated)
- Fix 2 incorrect existing mappings
* Fill SEP-2243 conformance test coverage gaps
Add missing test cases identified by comparing the branch against the
SEP-2243 spec's Conformance Test Cases section:
- Mcp-Method on notifications: add notifications/initialized to
expectedMethods so clients are checked for header on notifications
- Boolean true: add debug=true param to complement verbose=false
- Leading-space-only and trailing-space-only: separate checks for
Base64 encoding when only one edge has whitespace
- Internal spaces only: verify plain ASCII (no Base64) for values
like 'us west 1' that have spaces only in the middle
- CRLF string: add line1\r\nline2 test (complements existing \n test)
- Leading tab: add \tindented test for tab-triggered Base64 encoding
- Server rejects invalid Mcp-Param chars: mark as excluded in YAML
since HTTP itself prevents these characters in headers
Register all new checks in sep-2243.yaml.
* Remove case-insensitive =?base64? prefix test
The spec states header values are case-sensitive (RFC 9110). The
=?base64? prefix is part of the header value, not the header name,
so it should be matched case-sensitively. This was identified as a
bug in the conformance tests per feedback on SEP-2243.
Changes:
- Remove server-accepts-case-insensitive-base64 check from server
validation scenario
- Change base64 prefix regex from case-insensitive (/i) to
case-sensitive in client header validation
- Remove corresponding entry from sep-2243.yaml
* Remove nested x-mcp-header rejection test
The spec constrains x-mcp-header to primitive types but does not
restrict it to top-level properties only. A nested string property
with x-mcp-header is valid per the spec. This was confirmed by
mikekistler in SEP-2243 PR discussion.
Removes invalid_nested_header from the invalid tool definitions
list and its corresponding sep-2243.yaml entry.
* Use numeric comparison for number header values
Compare number header values numerically instead of as strings to
accommodate cross-SDK floating point representation differences
(e.g., '42' vs '42.0'). For integers, exact numeric match is
required. For decimals, a tolerance of 1e-9 is allowed.
See SEP-2243 discussion on number precision.
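A sketch of the comparison rule this commit describes, not the harness's actual code. The function name and signature are hypothetical, and treating the 1e-9 tolerance as an absolute difference is an assumption.

```typescript
// Hypothetical helper: compare a header value against the expected number.
// Assumption: the 1e-9 tolerance is an absolute difference.
function numberHeaderMatches(
  headerValue: string,
  expected: number,
  tolerance = 1e-9
): boolean {
  // Number('') is 0, so reject blank values explicitly.
  if (headerValue.trim() === '') return false;
  const actual = Number(headerValue);
  if (!Number.isFinite(actual)) return false;
  // Integers must match exactly; '42' and '42.0' both parse to 42.
  if (Number.isInteger(expected)) return actual === expected;
  // Decimals may differ by a small representation error.
  return Math.abs(actual - expected) <= tolerance;
}
```

With this shape, `'42'` and `'42.0'` both satisfy an expected value of `42`, while a string comparison would reject one of them.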
* client/http-standard-headers: SKIPPED (not FAILURE) for unexercised methods; make getChecks() idempotent
A client that never calls prompts/list isn't violating SEP-2243 — the spec
sentence is 'The client MUST include the standard MCP request headers on each
POST request', and a request that was never sent can't fail it. The Mcp-Method
requirement is already proven by initialize/tools/list etc., so there's nothing
unique to prompts/list. SKIPPED keeps the gap visible without a false red.
Separately: getChecks() was pushing into this.checks on every call. The runner
may call it more than once (e.g. progress + final report); that produced
duplicate rows. Build a fresh array per call instead.
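The two changes in this commit could look roughly like the sketch below. The class, method names, check id format, and expected-method list are hypothetical; only the SKIPPED status and the fresh-array-per-call idea come from the message above.

```typescript
type CheckStatus = 'SUCCESS' | 'FAILURE' | 'SKIPPED';

interface Check {
  id: string;
  status: CheckStatus;
}

// Hypothetical scenario, reduced to the state needed for the example.
class ExampleHeaderScenario {
  private readonly expectedMethods = ['initialize', 'prompts/list'];
  private readonly observed = new Map<string, boolean>();

  // Record whether the first request for a method carried the header.
  recordRequest(method: string, hasMcpMethodHeader: boolean): void {
    if (this.observed.has(method)) return;
    this.observed.set(method, hasMcpMethodHeader);
  }

  // Builds a new array on every call, so repeated calls cannot
  // accumulate duplicate rows.
  getChecks(): Check[] {
    return this.expectedMethods.map((method) => {
      const ok = this.observed.get(method);
      const status: CheckStatus =
        ok === undefined ? 'SKIPPED' : ok ? 'SUCCESS' : 'FAILURE';
      return { id: `example-mcp-method-${method.replace(/\//g, '-')}`, status };
    });
  }
}
```

A method the client never called is reported as SKIPPED, which keeps the gap visible without counting it as a spec violation.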
* client/http-standard-headers: de-dup guard in checkMcpNameHeader
checkMcpMethodHeader already has 'if (this.methodHeaderChecks.has(method)) return'.
checkMcpNameHeader was missing the equivalent. The test server advertises two
tools and two resources; a client that calls both produces two
'client-mcp-name-header-tools-call' / '...-resources-read' rows.
* client/http-standard-headers: replace -> /\//g for slug generation
String.replace with a string arg only replaces the first occurrence.
'notifications/initialized' is fine (one slash) but a method like
'notifications/resources/updated' would yield 'notifications-resources/updated'
in the check id. (replaceAll isn't in this project's TS lib target.)
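The difference is easy to reproduce in isolation. In this sketch, `methodToSlug` is a hypothetical name for whatever helper the scenario uses.

```typescript
// Hypothetical helper: turn an MCP method name into a check-id slug.
// The global regex replaces every slash, not just the first one.
function methodToSlug(method: string): string {
  return method.replace(/\//g, '-');
}

// String argument: only the first '/' is replaced.
const firstOnly = 'notifications/resources/updated'.replace('/', '-');

// Global regex: all occurrences are replaced.
const allSlashes = methodToSlug('notifications/resources/updated');
```

Here `firstOnly` is `'notifications-resources/updated'` and `allSlashes` is `'notifications-resources-updated'`.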
* client/http-custom-headers: base64 regex (.+) -> (.*) so empty string round-trips
The spec doesn't forbid base64-encoding values that don't strictly need it.
A client that always wraps would send '=?base64??=' for empty_val: ''. The
'.+' wouldn't match, so the harness would compare '=?base64??=' === ''
literally and FAIL a compliant client. '(.*)' lets the empty payload decode
to '' and match.
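A small sketch of the decoding step shows why the quantifier matters. The regex constants and `decodeHeaderValue` are hypothetical names; the wrapper syntax `=?base64?{x}?=` is the one quoted elsewhere in this commit series.

```typescript
// Old pattern: requires at least one payload character.
const STRICT_WRAPPER = /^=\?base64\?(.+)\?=$/;
// New pattern: also accepts an empty payload.
const LENIENT_WRAPPER = /^=\?base64\?(.*)\?=$/;

// Hypothetical helper: unwrap and decode a header value if it uses the
// base64 wrapper, otherwise return it unchanged.
function decodeHeaderValue(raw: string): string {
  const match = LENIENT_WRAPPER.exec(raw);
  if (!match) return raw;
  return Buffer.from(match[1] ?? '', 'base64').toString('utf8');
}

// '=?base64??=' is the wrapped form of the empty string.
const strictMatchesEmpty = STRICT_WRAPPER.test('=?base64??=');
const decodedEmpty = decodeHeaderValue('=?base64??=');
```

With the strict pattern the wrapped empty value is treated as a literal string and compared against `''`, which fails; the lenient pattern decodes it to `''`.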
* client/http-custom-headers: emit SUCCESS for no-mirror-unannotated
This check only pushed on FAILURE; a correct client produced no row at all,
so the pass was invisible in reports and couldn't be counted toward coverage.
Same-slug SUCCESS/FAILURE pair is the repo convention.
* client/http-custom-headers: drop redundant optional-present check
checkParamHeader('Verbose', ...) already pushes a FAILURE when the header is
missing. The explicit 'optional-present' block re-tests the same condition
under a second check id, doubling the row.
* server/http-standard-headers: defaultArgs must use real number/boolean, not strings
defaultArgs feeds the JSON-RPC body. With Record<string, string> + '0'/'false',
a server that validates inputSchema rejects on type mismatch — which is also a
400. Every header-rejection check then false-PASSes (server rejected for the
wrong reason), and server-accepts-valid-base64 false-FAILs a compliant server
because it never gets past schema validation.
Also handle 'integer' (JSON Schema distinguishes it from 'number').
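One way to derive correctly typed defaults from a tool's input schema is sketched below. The schema shape is simplified, and the function names and the placeholder string value are hypothetical.

```typescript
type PrimitiveType = 'string' | 'number' | 'integer' | 'boolean';
type ArgValue = string | number | boolean;

// Simplified schema shape; real tool schemas carry more fields.
interface InputSchema {
  properties: Record<string, { type: PrimitiveType }>;
  required?: string[];
}

// Hypothetical helper: a default whose JSON type matches the schema type.
function defaultValueFor(type: PrimitiveType): ArgValue {
  if (type === 'number' || type === 'integer') return 0;
  if (type === 'boolean') return false;
  return 'default'; // placeholder string value
}

// Fill every required parameter so the request body passes schema
// validation and only the header under test varies.
function buildDefaultArgs(schema: InputSchema): Record<string, ArgValue> {
  const args: Record<string, ArgValue> = {};
  for (const name of schema.required ?? []) {
    const prop = schema.properties[name];
    if (prop) args[name] = defaultValueFor(prop.type);
  }
  return args;
}
```

Serialized to JSON, the numeric and boolean defaults appear as `0` and `false` rather than `"0"` and `"false"`, so a validating server does not reject the body before it looks at the headers.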
* server/http-standard-headers: createAcceptanceCheck must also assert no JSON-RPC error in body
A server can return HTTP 200 with {error: {code: -32001, ...}} in the body.
Status-only acceptance lets that through as a pass.
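A sketch of the stricter acceptance rule, assuming the response body is a single JSON document rather than an event stream. `evaluateAcceptance` and `AcceptanceResult` are hypothetical names, not the harness's `createAcceptanceCheck`.

```typescript
interface AcceptanceResult {
  accepted: boolean;
  reason?: string;
}

// Hypothetical helper: a request counts as accepted only if the HTTP
// status is 2xx AND the body carries no JSON-RPC error member.
function evaluateAcceptance(httpStatus: number, bodyText: string): AcceptanceResult {
  if (httpStatus < 200 || httpStatus >= 300) {
    return { accepted: false, reason: `HTTP ${httpStatus}` };
  }
  let body: unknown;
  try {
    body = JSON.parse(bodyText);
  } catch {
    return { accepted: false, reason: 'response body is not JSON' };
  }
  if (typeof body === 'object' && body !== null && 'error' in body) {
    return { accepted: false, reason: 'JSON-RPC error in body' };
  }
  return { accepted: true };
}
```

A `200` response whose body is `{"jsonrpc":"2.0","id":1,"error":{...}}` is then reported as not accepted, where a status-only check would have passed it.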
* use DRAFT_PROTOCOL_VERSION constant instead of 'DRAFT-2026-v1' literal
The literal will rot when the draft cycle rolls over. types.ts already exports
DRAFT_PROTOCOL_VERSION for this.
* sep-2243.yaml: spec-first rewrite + move to src/seps/
Regenerated from the SEP-2243 spec diff (transports.mdx + tools.mdx) rather
than from the scenario implementations: 21 check rows + 4 excluded vs the
previous 46 + 2. Differences:
- one check id per spec sentence (test variants go in 'details', not new ids)
- 'text:' quotes the spec sentence verbatim instead of paraphrasing
- '-reject-status' (MUST 400) split from '-reject-error-code' (SHOULD -32001
for standard headers, MUST for custom)
- rows with no spec backing dropped (whitespace, base64-padding/chars, prefix/
suffix-literal, control-char-name)
- two more SHOULD excludes (log-warning, no-sensitive-params)
- check ids use sep-2243-<slug> convention
- check: key first, excludes grouped at bottom (matches sep-2164)
Moved to src/seps/ to match #272 layout (this branch predates it; will
reconcile cleanly on rebase).
* add negative test for HttpStandardHeadersScenario
Self-contained vitest that POSTs initialize with and without Mcp-Method,
asserts FAILURE/SUCCESS on the pinned check id, and proves getChecks() is
idempotent. No example client/server file needed for this one because the
scenario *is* the server — the test crafts the bad request directly.
This is the negative-test half of AGENTS.md §Testing your scenario; the
everything-client passing-example half is still TODO.
* server/http-standard-headers: severity fixes for whitespace + malformed-base64 tests
These five tests assert behavior the SEP-2243 text doesn't pin down. Kept
because they're useful consistency signals across SDKs, but adjusted so they
don't FAIL spec-compliant servers:
- server-accepts-whitespace-header-value: this IS a MUST, but per RFC 9110
§5.5 (field parsing MUST exclude OWS), not SEP-2243. Kept as FAILURE,
specReference now cites RFC 9110.
- server-rejects-invalid-base64-padding / -chars: downgraded to INFO.
SEP-2243 says only 'MUST decode them accordingly' without picking RFC 4648
strict vs lenient. Node's Buffer.from() accepts unpadded/dirty input and
decodes to a matching value (server accepts); .NET's Convert.FromBase64String
throws (server rejects). Either is currently compliant. WARNING would force
Tier-1 SDKs into non-default strict decoding (#245); INFO records the
behavior without prescribing it, so cross-SDK divergence is visible
without tier impact.
- server-literal-missing-base64-prefix / -suffix: unchanged. The wrapper
syntax is '=?base64?{x}?=' complete; treating partial wrappers as literal
is the natural reading.
Plumbing: createRejectionCheck gains an optional failSeverity param;
testBase64Case gains a 'reject-info' mode. (This is also the hook for the
larger 400-MUST / -32001-SHOULD split that still needs doing.)
* server/http-standard-headers: split rejection check into status (MUST) + error-code (SHOULD/MUST)
SEP-2243 §Server Validation: 'servers MUST return HTTP status 400 ... and
SHOULD include a JSON-RPC error response using ... -32001'. So a server that
returns 400 with -32600 (or no error body) is compliant for standard headers.
The single conflated check FAILed it.
For custom headers (§Server Behavior for Custom Headers) -32001 IS a MUST,
so both halves stay FAILURE there.
createRejectionCheck → createRejectionChecks returning two rows: '<id>' for
the 400 status and '<id>-error-code' for -32001. testCase (standard) sets
errorCodeSeverity=WARNING; testBase64Case/testMissingCustomHeader (custom)
keep FAILURE; the two malformed-base64 INFO probes set both halves to INFO.
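The split can be pictured with the sketch below. The signature and the `Check` shape are hypothetical; the two row ids (`<id>` and `<id>-error-code`) and the severity choice follow the message above.

```typescript
type Status = 'SUCCESS' | 'FAILURE' | 'WARNING';

interface Check {
  id: string;
  status: Status;
}

// Hypothetical signature: one row for the HTTP status (MUST 400) and one
// for the JSON-RPC error code, whose severity depends on the header kind.
function createRejectionChecks(
  id: string,
  httpStatus: number,
  errorCode: number | undefined,
  errorCodeSeverity: 'FAILURE' | 'WARNING'
): Check[] {
  return [
    { id, status: httpStatus === 400 ? 'SUCCESS' : 'FAILURE' },
    {
      id: `${id}-error-code`,
      status: errorCode === -32001 ? 'SUCCESS' : errorCodeSeverity,
    },
  ];
}

// Standard header case: 400 with -32600 passes the MUST and only warns
// on the SHOULD.
const standardHeaderRows = createRejectionChecks('example-reject', 400, -32600, 'WARNING');
```

For custom headers the caller would pass `'FAILURE'` as the severity, so a wrong error code fails both the status-compliant and the code-compliant halves independently.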
* rename check ids to sep-2243-* prefix; kebab-case throughout
Aligns with the sep-2243.yaml traceability ids and the repo convention
(sep-2164-*, sep-2207-*). Variant suffixes are kept (per-method, per-tool)
so per-case visibility isn't lost; the YAML's row id is a prefix of the
emitted ids, which is fine — extra ids not in the YAML are allowed.
Also fixes the mixed hyphen/underscore in reject-invalid-tool ids
(toolName has underscores; replace to hyphens).
* client: extract BaseHttpScenario to client/http-base.ts; both client scenarios extend it
http-standard-headers.ts had its own copy of the start/stop/handleRequest
boilerplate (~100 lines) that http-custom-headers.ts already abstracted as
BaseHttpScenario. Moved the abstract class to a shared file; both scenario
files now import it. HttpStandardHeadersScenario implements handlePost()
instead of handleRequest(); its handleInitialize() uses the base
sendInitialize() with the resources/prompts capability flags it needs.
Net: 465 → 361 lines for http-standard-headers.ts; the third inline copy of
the same HTTP-server scaffold is gone.
* use req.setEncoding('utf8') instead of per-chunk Buffer.toString()
Per-chunk .toString() corrupts a multi-byte UTF-8 char that straddles a TCP
chunk boundary (e.g. 0xC3 | 0xBC decodes to two replacement chars instead of
'ü'). The custom-headers scenario sends '日本語'/'naïve' in the body, so this
is reachable in principle. setEncoding('utf8') makes 'data' emit strings with
boundary handling done by Node's StringDecoder.
Fixed in both places: BaseHttpScenario.handleRequest (request body) and
sendRawRequest (response body).
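The corruption can be reproduced without a socket by splitting the two bytes of 'ü' by hand. This sketch uses Node's `StringDecoder` directly, which is what `setEncoding('utf8')` relies on; the chunk split is artificial and the variable names are illustrative.

```typescript
import { StringDecoder } from 'node:string_decoder';

// 'ü' is two bytes in UTF-8: 0xC3 0xBC. Split them across two chunks,
// as a TCP boundary might.
const bytes = Buffer.from('ü', 'utf8');
const chunks = [bytes.subarray(0, 1), bytes.subarray(1)];

// Per-chunk decoding: each half is invalid on its own and becomes U+FFFD.
const perChunk = chunks.map((chunk) => chunk.toString('utf8')).join('');

// Streaming decoding: the decoder buffers the incomplete sequence until
// the remaining byte arrives.
const decoder = new StringDecoder('utf8');
const streamed = chunks.map((chunk) => decoder.write(chunk)).join('') + decoder.end();
```

`perChunk` ends up as two replacement characters, while `streamed` is the original 'ü'.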
* fix: migrate SEP-2243 scenarios from specVersions to source (introducedIn/removedIn)
The introducedIn/removedIn refactor (#265) replaced the specVersions array
with a source property on all scenario interfaces. The SEP-2243 scenarios
still used the old specVersions field, causing matchesSpecVersion() to
receive undefined and crash in spec-version.test.ts.
Replace specVersions with source = { introducedIn: DRAFT_PROTOCOL_VERSION }
in BaseHttpScenario and the two server ClientScenario classes. Remove the
now-unused SpecVersion imports.
* server/http-standard-headers: malformed-base64 padding/chars back to FAILURE
Per discussion on #259: the SEP-2243 conformance-test-case table is the
approved source of truth and lists these as reject cases, so they're
FAILURE-level even though the spec body itself only says 'MUST decode them
accordingly'. SDKs whose stdlib base64 is lenient (Node, browsers) will need
to validate before decoding; if that's burdensome we'll revisit.
Removes the now-unused 'reject-info' mode and statusSeverity/INFO plumbing
introduced for these two probes.
---------
Co-authored-by: Tarek Mahmoud Sayed <tarekms@microsoft.com>
Co-authored-by: Tarek Mahmoud Sayed <tarekms@ntdev.microsoft.com>
Co-authored-by: Tarek Mahmoud Sayed <10833894+tarekgh@users.noreply.github.com>
Co-authored-by: Paul Carleton <paulc@anthropic.com>
This was referenced May 15, 2026
pcarleton added a commit that referenced this pull request on May 18, 2026
Replaces `specVersions: ['draft']` with `source: { introducedIn: DRAFT_PROTOCOL_VERSION }`
in the 5 iss-parameter scenarios.
This commit typechecks once the stack is rebased onto main >= #265 (the
ScenarioSource migration). Adding it now so the rebase is mechanical.
Closes #256. Follows up on #255.

Problem

Scenarios were tagged with an explicit list of every spec version they apply to.

Every new spec release means touching every carried-forward scenario, and there's no way to express removal: a SEP that tightens or removes a requirement can't mark an old scenario as "no longer valid in draft". Extension scenarios were tagged `['extension']`, which doesn't fit a version list at all.

Design

Replace the flat `specVersions` list with a nested `source` union.

A scenario either sits on the spec timeline (`introducedIn` / optional `removedIn`) or names a protocol extension by its SEP-2133 identifier. The compiler enforces the XOR; no runtime invariant test needed. The union lives inside a property so `Scenario` stays a plain interface and `class implements Scenario` keeps working, since TypeScript won't let a class `implements` a union directly.

`matchesSpecVersion` becomes a range check gated on the union.

`getScenarioSpecVersions()` reconstructs the legacy `ScenarioSpecTag[]` for tier-check; `ALL_SPEC_VERSIONS`, `resolveSpecVersion`, and all `--spec-version` CLI semantics are unchanged.

Migration

| `specVersions` | `source` |
| --- | --- |
| `['2025-03-26', '2025-06-18', '2025-11-25']` | `{ introducedIn: '2025-03-26' }` |
| `['2025-06-18', '2025-11-25']` | `{ introducedIn: '2025-06-18' }` |
| `['2025-11-25']` | `{ introducedIn: '2025-11-25' }` |
| `['2025-03-26']` | `{ introducedIn: '2025-03-26', removedIn: '2025-06-18' }` |
| `[DRAFT_PROTOCOL_VERSION]` | `{ introducedIn: DRAFT_PROTOCOL_VERSION }` |
| `['extension']` | `{ extensionId: 'io.modelcontextprotocol/...' }` |

cross-app-access → enterprise-managed-authorization rename

Aligns with the canonical extension name from `docs/extensions/client-matrix.mdx`. Renamed: file, class `EnterpriseManagedAuthorizationScenario`, slug `auth/enterprise-managed-authorization`, `context.ts` schema literal, and the `everything-client.ts` runner.

Testing

(118 → 116: the two "every Scenario has introducedIn" runtime tests are now compiler-enforced and removed.) Build, type-check, and lint clean.

`node dist/index.js list` shows `auth/enterprise-managed-authorization [extension]`.
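The migration table can also be read in reverse, which is roughly what `getScenarioSpecVersions()` has to do for tier-check. The following is a sketch under stated assumptions: the contents of `ALL_SPEC_VERSIONS`, the `'draft'` placeholder, typing `extensionId` as a plain string, and including the draft in open-ended ranges are all illustrative choices, not the repository's implementation.

```typescript
// Assumed contents; the repo builds this list from its own constants.
const ALL_SPEC_VERSIONS = ['2025-03-26', '2025-06-18', '2025-11-25', 'draft'] as const;
type SpecVersion = (typeof ALL_SPEC_VERSIONS)[number];

type ScenarioSource =
  | { introducedIn: SpecVersion; removedIn?: SpecVersion }
  | { extensionId: string }; // simplified; the PR uses an ExtensionId type

type ScenarioSpecTag = SpecVersion | 'extension';

// Expand a source back into the legacy tag list.
function getScenarioSpecVersions(source: ScenarioSource): ScenarioSpecTag[] {
  // Extension scenarios are off the timeline and keep the legacy tag.
  if ('extensionId' in source) return ['extension'];
  const start = ALL_SPEC_VERSIONS.indexOf(source.introducedIn);
  // removedIn is exclusive; without it the range runs to the end.
  const end =
    source.removedIn === undefined
      ? ALL_SPEC_VERSIONS.length
      : ALL_SPEC_VERSIONS.indexOf(source.removedIn);
  return ALL_SPEC_VERSIONS.slice(start, end);
}
```

Under these assumptions, `{ introducedIn: '2025-03-26', removedIn: '2025-06-18' }` expands to `['2025-03-26']`, matching the fourth row of the table.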