Add receiver-side resource limits for untrusted peers (#184)#185
Add receiver-side resource limits for untrusted peers (#184)#185ndisidore wants to merge 1 commit into
Conversation
🦋 Changeset detectedLatest commit: ebee288 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
|
All contributors have signed the CLA ✍️ ✅ |
commit: |
|
I have read the CLA Document and I hereby sign the CLA |
| throw new TypeError( | ||
| `Deserialized bigint exceeds maximum length of ${maxBigIntDigits} digits.`); | ||
| } | ||
| // Reject anything BigInt() would parse expensively or unexpectedly before handing it |
There was a problem hiding this comment.
To be discussed:
I think this was a bug/unintentional behavior before.
protocol.md:181: "A bigint value, represented as a decimal string."
It is technically possible now to handcraft ["bigint","0xff"] which unconditionally calls BigInt(value[1]) meaning this gets parsed as 255n.
But this means there is a disconnect in the protocol writeup and the actual implementation
There was a problem hiding this comment.
Hmm, I'm tempted to allow, and maybe even require, the hex format, because it'll obviously be much more efficient to parse. Parsing hex is strictly O(n) so you might even argue we don't need a limit then (though we might as well keep it anyway).
It's nice that switching to generating hex is backwards-compatible...
There was a problem hiding this comment.
Done. Changed to now always emit hex, but it can still consume binary (to allow backwards compat)
There was a problem hiding this comment.
am I understanding correctly? in the case of a newer server talking to an older client:
- before this PR:
serialize(-255n)produces["bigint","-255"]. Old receivers callBigInt("-255")--> works fine. - after this PR:
serialize(-255n)produces["bigint","-0xff"]. Old receivers callBigInt("-0xff")--> throws SyntaxError, aborting the session.
so technically this is a major break? (well, semver 0.x bump, in our pre-1.0 case).
| // Bound the up-front allocation of a single message before parsing it. This is a local, | ||
| // receiver-side guard against an untrusted peer sending an enormous message; a throw here | ||
| // propagates out of readLoop and aborts the session. | ||
| if (msgText.length > this.limits.maxMessageSize) { |
There was a problem hiding this comment.
Feels like it would benefit the underlying transport to know about message size limits so that it could avoid even reading more bytes than the limit into memory.
Could be passed into receive() but some transports would probably want to know in the constructor?
Maybe that just means it is up to the app to pass in when they are constructing such a transport.
I wonder if any WebSocket implementations support specifying limits. You'd think they would since otherwise sending enormous WebSocket frames seems like an easy way to exhaust anyone's memory...
There was a problem hiding this comment.
So "plumb the limit into transports" decomposes into three quite different problems:
- Plumbing: how does the limit even reach a transport? Transports are frequently built by the app and handed to new
RpcSession(transport, …), sooptions.limitscan't reach a pre-built transport. Only thenewWebSocketRpcSession/newHttpBatchRpcResponsehelpers (which construct the transport internally) could thread it. the raw constructor path can't. That's an interface-design decision with back-compat implications - The socket-based transports (the untrusted-facing ones) can't be enforced by capnweb at all: the platform reads and buffers the frame before capnweb's listener runs. The only real lever is the underlying socket's native limit: Node ws's maxPayload, Bun's maxPayloadLength, or the runtime's own cap (Cloudflare Workers, for instance, already imposes a built-in WebSocket message-size limit). All of those are app-configured when the socket is created (you mention landing on this as well)
- HTTP batch is the one place capnweb genuinely controls the read, so it's the only transport where the real value add is, and even there it means replacing the clean
.text()with a manualReadableStreamread that counts bytes and aborts past the cap (plus disambiguating batch size vs per-message size, since a batch is newline-joined messages in one body).
My read is this ultimatley boils down to touching the RpcTransport interface, all 5 transports, the session-helper entry points, and docs/tests, all while still being unable to protect the most exposed case (a WebSocket server) from capnweb's own code, because that protection physically lives in the socket the app constructs.
I'm going to suggest we leave this to a followup pr
There was a problem hiding this comment.
Fair summary. I was thinking something along the lines of passing the limits into the transport constructors. Of course, if the app constructs its own transport, it is its own job to pass limits to that.
But I suppose as you point out the standard WebSocket API doesn't actually give us any ability to configure max incoming message size... we'd have to use non-standard settings if anything... bleh. Maybe leave it up to the app.
| throw new TypeError( | ||
| `Deserialized bigint exceeds maximum length of ${maxBigIntDigits} digits.`); | ||
| } | ||
| // Reject anything BigInt() would parse expensively or unexpectedly before handing it |
There was a problem hiding this comment.
Hmm, I'm tempted to allow, and maybe even require, the hex format, because it'll obviously be much more efficient to parse. Parsing hex is strictly O(n) so you might even argue we don't need a limit then (though we might as well keep it anyway).
It's nice that switching to generating hex is backwards-compatible...
Deserializing messages from an untrusted peer could exhaust CPU, memory, or the stack: - A long [bigint,...] digit string fed straight to BigInt() blocks the event loop, because BigInt()'s decimal parse is superlinear. - A deeply nested message could overflow the stack during evaluation. - An arbitrarily large message was JSON-parsed with no up-front size bound. Add three local, receiver-side guards in the deserialization path: - Cap bigint digit length and reject non-numeric strings before BigInt(). - Bound nesting depth in Evaluator.evaluateImpl, mirroring the existing send-side depth limit in Devaluator. - Reject oversized messages in the session read loop before JSON.parse. Limits have safe exported defaults (DEFAULT_LIMITS) and are overridable per session via a new RpcLimits-typed 'limits' field on RpcSessionOptions, surfaced to the Evaluator through the Importer interface so the standalone deserialize() path stays protected by the defaults. Limits are purely local (the protocol has no negotiation step); exceeding one throws, which aborts the session via the existing abort path.
a6ac783 to
ebee288
Compare
| "capnweb": minor | ||
| --- | ||
|
|
||
| Add receiver-side resource limits to guard against untrusted-peer resource exhaustion (#184). Deserialization now caps the length of `bigint` values, accepts both emitted hex strings and legacy decimal strings, bounds message nesting depth across nested call arguments, and rejects oversized incoming messages before parsing. The limits are local, receiver-side decisions with safe defaults (`DEFAULT_LIMITS`), and can be overridden per session via the new `limits` field on `RpcSessionOptions`. Exceeding a limit aborts the session, reusing the existing abort path. |
There was a problem hiding this comment.
Hmm, I think the changeset description is really meant to be a one-liner, it shows up in a bullet list in the release notes. I actually don't know how multiple paragraphs will render.
| function nestedCallArgs(depth: number): string { | ||
| let s = "1"; | ||
| for (let i = 0; i < depth; i++) { | ||
| s = `["pipeline",0,["echo"],[${s}]]`; | ||
| } | ||
| return s; | ||
| } |
There was a problem hiding this comment.
| function nestedCallArgs(depth: number): string { | |
| let s = "1"; | |
| for (let i = 0; i < depth; i++) { | |
| s = `["pipeline",0,["echo"],[${s}]]`; | |
| } | |
| return s; | |
| } | |
| function nestedCallArgs(depth: number): string { | |
| return `["pipeline",0,["echo"],[`.repeat(depth) + "1" + "]]".repeat(depth); | |
| } |
same as the above - I think we can avoid the loop assignment with this
| function nestedEscapedArrays(depth: number): string { | ||
| let s = "1"; | ||
| for (let i = 0; i < depth; i++) { | ||
| s = `[[${s}]]`; | ||
| } | ||
| return s; | ||
| } |
There was a problem hiding this comment.
| function nestedEscapedArrays(depth: number): string { | |
| let s = "1"; | |
| for (let i = 0; i < depth; i++) { | |
| s = `[[${s}]]`; | |
| } | |
| return s; | |
| } | |
| function nestedEscapedArrays(depth: number): string { | |
| return `${"[[".repeat(depth)}1${"]]".repeat(depth)}`; | |
| } |
maybe I'm misunderstanding the algorithm, but I'm pretty sure it can be this? that way we can avoid needing to do the assignment in a loop. not like it's a big deal or anything
| // buffered the complete string; true pre-read enforcement belongs in the transport/socket. | ||
| // This backstop still prevents oversized messages from reaching JSON.parse and downstream | ||
| // deserialization work. A throw here propagates out of readLoop and aborts the session. | ||
| if (msgText.length > this.limits.maxMessageSize) { |
There was a problem hiding this comment.
do we need to be thinking in terms of bytes not characters? so like
// 50 emoji = 100 UTF-16 code units (.length), but 200 bytes in UTF-8
// PASSES the check despite being 2x the "limit" in actual memory
const msg = "😀".repeat(50);
console.log({
chars: msg.length, // 100
bytes: new TextEncoder().encode(msg).byteLength, // 200
});
// Conversely, 100 ASCII chars = 100 code units AND 100 bytes
const msg2 = "a".repeat(100);
console.log({
chars: msg2.length, // 100
bytes: new TextEncoder().encode(msg2).byteLength, // 100
})| let aborted = await pushFrame(overLimit, { maxBigIntDigits: 4 }).waitForAbort(); | ||
| expect(aborted).toMatch(/bigint exceeds maximum length/); | ||
|
|
||
| // The same 5-digit bigint is well under the default limit, so it does NOT abort by default. |
There was a problem hiding this comment.
but it's not a 5-digit, it's a 3-digit hex number with a 0x prefix (not part of the number). In general, I feel like it'd be pretty reasonable to not consider the 0x in counting digits.
There was a problem hiding this comment.
and in reading the section about handling negative makes me wonder if the negative sign would also wrongly be counted as a digit, too.
| return BigInt(digits); | ||
| } else if (/^-?0x[0-9a-fA-F]+$/.test(digits)) { | ||
| let isNegative = digits.startsWith("-"); | ||
| let magnitude = BigInt(isNegative ? digits.slice(1) : digits); |
There was a problem hiding this comment.
wow, I didn't know that BigInt("-0xf") throws. huh!
Summary
Deserializing untrusted input could exhaust a peer's resources. Most concretely (#184), a
["bigint", "<digits>"]wire value was handed straight toBigInt()with no length check, andBigInt()decimal parsing is superlinear — so a multi-megabyte digit string blocks the event loop. While fixing that, the receive path is also bounded against the related single-message exhaustion vectors the issue calls out ("review that we're properly limiting message sizes overall").The guards are local, receiver-side decisions that fit capnweb's existing design: no new dependencies, no protocol changes, and reuse of the existing abort path on violation.
What this changes
BigInt().evaluateImplnow tracks recursion depth and rejects messages nested beyond the limit (mirroring the existing send-side depth cap), so a tiny deeply-nested message can't overflow the stack.JSON.parse, bounding the up-front allocation.Configuration
Limits are always applied from the exported
DEFAULT_LIMITS, and can be overridden per session via a new optionallimitsfield onRpcSessionOptions:Defaults:
maxBigIntDigits: 16384— 2^14 digits ≈ ~54,000 bits, far beyond any practical cryptographic integer (an RSA-16384 modulus is ~4,933 decimal digits), so no legitimate value is rejected — yet orders of magnitude below the millions of digits needed forBigInt()'s superlinear parse to block meaningfully.maxDepth: 64— matches the existing send-side depth limit, so capnweb never rejects a message it would itself send.maxMessageSize: 32 MiB— generous for legitimate batched and base64/blob payloads while still bounding a single allocation.Because the protocol has no negotiation step and a limit is a purely local receiver-side decision, the defaults are deliberately generous: a sender cannot discover the receiver's limit, so the defaults must never trip legitimate traffic. They are tunable per session for endpoints with unusual needs.
Behavior on violation
Exceeding any limit throws, which flows through the existing readLoop →
abort()path: the session sends an["abort", ...]frame and tears down. The thrown errors name the limit and its value, so a cooperative peer sees a descriptive reason on the aborted session — riding the abort frame that is already sent, with no protocol change.Alternatives considered
deserialize()path, but leaves no escape hatch for applications that legitimately exchange very large payloads. Kept the hardcoded defaults as the floor, but made them overridable.getLimits()RPC method — purely a cooperative-peer ergonomic that needs no library change (an application can already expose such a method); it is not a substitute for local enforcement, so it is left to applications that want it.Deferred (potential follow-ups)
These are larger resource-exhaustion concerns from the issue discussion that need rate/quota or backpressure design rather than a per-message guard, and are intentionally out of scope here:
push/stream/pipemessages (the "many messages → OOM" case) — needs per-session quotas and backpressure.remapinstruction/capture fan-out producing multiplicative work.Note: nested call-arguments are deserialized as a fresh payload with a fresh depth budget, so a message that nests deeply through call arguments can still recurse beyond
maxDepth. This fails safe — it is bounded bymaxMessageSizeand produces a clean session abort (a caughtRangeError), not a process crash (verified) — and belongs to the same follow-up bucket.Testing
__tests__/limits.test.ts(14 tests): just-under / over boundaries for each limit, non-numeric bigint rejection, per-session override enforcement, the standalonedeserialize()protected by defaults, and backwards-compatibility — sessions constructed with no options, and with empty options, round-trip normal bigints / nesting / calls unchanged.