Version Packages #1822
Merged
This PR was opened by the Changesets release GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine, whenever you add more changesets to main, this PR will be updated.
Releases
agents@0.17.1
Patch Changes
#1826
1bbd9bc Thanks @threepointone! - Add a tight, OOM-specific retry budget to chat recovery so a memory-limit crash loop seals fast and attributably (#1825).
When a recovery turn hits a Durable Object memory-limit reset (the isolate exceeded its 128 MB limit), recovery now classifies it as a distinct, deterministic failure rather than a deploy-style transient. A memory reset re-OOMs on re-run (the turn's working set, not the platform, is the cause), so it must NOT be deferred and retried forever like a code-update/connection-lost transient. Each such crash bumps a durable per-incident oomAttempts counter; recovery retries a small number of times (new chatRecovery.maxOomRetries, default 3), in case the OOM was a transient spike, then seals with reason="out_of_memory". This is far tighter than the generic maxRecoveryWork backstop because an OOM is attributable and each re-run re-runs the model.
This complements the finite maxRecoveryWork default: the OOM budget is the fast path for memory resets that surface as catchable errors thrown from recovery bookkeeping (e.g. storage/SQL rejections after the reset), while maxRecoveryWork remains a backstop for the hard-kill case where no in-isolate code runs to record the OOM.
Adds an alarm-boundary circuit breaker (agents) as the universal backstop for the case the in-DO budgets can't catch (#1825): a memory-limit reset that bypasses them entirely, either thrown before the budget code runs (e.g. boot-time state hydration OOMs) or whose own small writes also OOM under memory pressure. Left unhandled, such an error propagates out of alarm() and the platform auto-retries the alarm forever, re-running the doomed, billable turn each cycle. Agent.alarm() now intercepts ONLY Durable Object memory-limit resets at the outermost frame, where the heavy turn has unwound and GC has reclaimed its footprint, so the seal/purge writes can land where mid-turn ones OOMed. A durable strike counter tolerates a few resets (new static options.maxAlarmMemoryLimitStrikes, default 3), backing off the looping rows so the retry is not a hot loop, then seals the recovery (out_of_memory) and surgically purges only the looping schedule rows, leaving unrelated scheduled tasks intact. A new alarm:memory_limit_reset observability event is emitted. Everything except memory-limit resets re-throws exactly as before.
Also broadens and exports the isDurableObjectMemoryLimitReset(error) predicate from agents (a sibling to isDurableObjectCodeUpdateReset / isPlatformTransientError): it now matches the shared "exceeded its memory limit" fragment so truncated/reworded surfacings (observed in real #1825 logs) still classify.
#1826
1bbd9bc Thanks @threepointone! - Fix neverending chat-recovery retries when a Durable Object isolate runs out of memory mid-turn (#1825).
chatRecovery.maxRecoveryWork now defaults to a generous finite backstop (1000) instead of Infinity. An isolate that exceeds its memory limit and is reset mid-stream has usually already streamed a little content, which bumps the durable progress counter. On the next wake, recovery reads that as forward progress and resets both progress-keyed bounds, the attempt cap (maxAttempts) and the no-progress window (noProgressTimeoutMs), and because each crash lands inside the alarm-debounce window the attempt counter is pinned too. With the work budget disabled (Infinity), no instrument could ever seal the turn, so recovery re-ran the turn (and its LLM calls) forever. The work meter is the one signal that keeps climbing across such a loop, so a finite default seals a runaway with reason="work_budget_exceeded" instead of looping.
Work only accrues from the first interruption until the turn completes, so a normal interrupted turn never approaches the cap. A very long agentic turn that legitimately produces a large amount of content under heavy interruption can raise maxRecoveryWork (or set it to Infinity to restore the previous fully-unbounded behavior, ideally paired with a shouldKeepRecovering predicate that bounds the runaway via real token/cost accounting).
@cloudflare/ai-chat@0.9.1
Patch Changes
#1826
1bbd9bc Thanks @threepointone! - Same two entries as listed under agents@0.17.1 above: the OOM-specific chat-recovery retry budget with the alarm-boundary circuit breaker, and the finite maxRecoveryWork default (#1825).
@cloudflare/think@0.11.1
Patch Changes
#1826
1bbd9bc Thanks @threepointone! - Same two entries as listed under agents@0.17.1 above: the OOM-specific chat-recovery retry budget with the alarm-boundary circuit breaker, and the finite maxRecoveryWork default (#1825).
#1821
de6a695 Thanks @threepointone! - Add an opt-in, read-only HTTP fetch capability for Think agents via the new @cloudflare/think/tools/fetch export and a fetchTools property on Think. createFetchTools() generates a generic, allowlisted fetch_url tool plus one fetch_<name> tool per named service-binding/Fetcher target. It is GET-only with Workers-grounded SSRF defenses (private/loopback/link-local/*.internal blocking, URL normalization, credential rejection), separate download/model/workspace size limits (maxBytes, maxModelChars, response: "workspace" spill), an allowlist-aware redirect policy with cross-origin header stripping, a model header allowlist, and a tool:fetch observability event. Disabled by default.
#1823
b58b5a3 Thanks @threepointone! - Improve Think's tool-call lifecycle hooks (follow-ups from #1343):
beforeToolCall: Tools whose execute is an async generator (async function* execute(...)) now stream their preliminary tool-results to the model even though Think wraps execute to consult beforeToolCall first. Non-streaming tools keep a scalar wrapper, so they never emit a synthetic preliminary chunk. The non-canonical async () => makeIterator() form (a Promise<AsyncIterable>) still collapses to its last yielded value, matching the raw AI SDK.
When a TOOLS generic is passed, narrowing on ctx.toolName now narrows ctx.input on beforeToolCall and (new) ctx.output on afterToolCall's success branch to that tool's inferred output type. Dynamic tools stay unknown. Behavior with the default ToolSet is unchanged.
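The narrowing on ctx.toolName described above can be illustrated with a small, self-contained TypeScript sketch. It is not Think's actual API: the Tools map, the BeforeCtx and AfterCtx types, the success field, and both functions are hypothetical names chosen for illustration. It only shows the general technique, a union discriminated by toolName, under which checking the tool name narrows the input (before the call) or the output (on the success branch after the call).

```typescript
// Hypothetical tool map; names and shapes are illustrative only.
type Tools = {
  get_weather: { input: { city: string }; output: { tempC: number } };
  add: { input: { a: number; b: number }; output: number };
};

// One union member per tool, discriminated by `toolName`.
type BeforeCtx<T> = {
  [K in keyof T]: T[K] extends { input: infer I }
    ? { toolName: K; input: I }
    : never;
}[keyof T];

// Success members carry the tool's output type; failure members carry an error.
// The `success` flag is an assumption made for this sketch.
type AfterCtx<T> = {
  [K in keyof T]: T[K] extends { output: infer O }
    ? | { toolName: K; success: true; output: O }
      | { toolName: K; success: false; error: unknown }
    : never;
}[keyof T];

function describeCall(ctx: BeforeCtx<Tools>): string {
  if (ctx.toolName === "get_weather") {
    // ctx.input is narrowed to { city: string } here.
    return `weather for ${ctx.input.city}`;
  }
  // Only "add" remains, so ctx.input is { a: number; b: number }.
  return `sum of ${ctx.input.a} and ${ctx.input.b}`;
}

function describeResult(ctx: AfterCtx<Tools>): string {
  if (!ctx.success) return `${ctx.toolName} failed`;
  if (ctx.toolName === "add") {
    // On the success branch, ctx.output is narrowed to number.
    return `add returned ${ctx.output.toFixed(1)}`;
  }
  // Remaining success member: ctx.output is { tempC: number }.
  return `temperature ${ctx.output.tempC}C`;
}
```

For example, describeCall({ toolName: "add", input: { a: 1, b: 2 } }) returns "sum of 1 and 2". A tool whose name is not known at compile time has no member in such a union, which is consistent with dynamic tools staying unknown.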