fix(context): prevent context overflow by clamping max_tokens before API call #116
Merged
yishuiliunian merged 1 commit into main on Apr 16, 2026
Conversation
…API call

The budget system capped output_reserve at 16K for compaction thresholds but sent the full max_output_tokens (64K) to the API, creating a ~37K token gap where input could grow beyond the API's hard constraint (input + max_tokens <= context_window).

Four-layer defense:

- Add max_output_tokens to ContextBudget with clamp_output_tokens() for pre-flight validation against the real API constraint
- Estimate input tokens before each LLM call and dynamically reduce max_tokens when headroom is insufficient
- Fix ContextOverflow detection to match the "exceed context limit" pattern from the Anthropic API (previously unrecognized, classified as a generic 400)
- Add runtime recovery: catch ContextOverflow, emergency compact, retry once instead of permanently blocking the agent
Summary
Prevents the `input_tokens + max_tokens > context_window` API rejection that permanently blocks the agent.

Changes
Layer 1 — Budget awareness (`crates/loopal-context/src/budget.rs`)

- Add `max_output_tokens` field to `ContextBudget` (uncapped, for API constraint validation)
- Add `clamp_output_tokens(estimated_input)` method for pre-flight max_tokens clamping
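The Layer 1 clamp can be sketched as follows. The struct is reduced to the two fields the constraint needs; the `ContextBudget` and `clamp_output_tokens` names come from this PR's description, while the field set and exact signature are illustrative assumptions, not the crate's real definitions.

```rust
/// Minimal sketch of the budget-side clamp (illustrative fields only).
pub struct ContextBudget {
    /// Total model context window shared by input and output tokens.
    pub context_window: u32,
    /// Requested output budget (uncapped, e.g. 64K); used only to
    /// validate against the real API constraint.
    pub max_output_tokens: u32,
}

impl ContextBudget {
    /// Largest max_tokens value the API will accept for this input size,
    /// enforcing `estimated_input + max_tokens <= context_window`.
    pub fn clamp_output_tokens(&self, estimated_input: u32) -> u32 {
        // saturating_sub avoids underflow when input already exceeds the window.
        let headroom = self.context_window.saturating_sub(estimated_input);
        self.max_output_tokens.min(headroom)
    }
}
```

With a 200K window and a 64K output request, a 190K-token input clamps the output budget down to 10K instead of letting the API reject the call.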
Layer 2 — Pre-flight validation (`crates/loopal-runtime/src/agent_loop/llm_params.rs`)

- Estimate input tokens and clamp `max_tokens` via `clamp_output_tokens()` when headroom is tight
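A self-contained sketch of the pre-flight step. The ~4-characters-per-token heuristic and both function names (`estimate_tokens`, `preflight_max_tokens`) are assumptions for illustration; the real crate may use a proper tokenizer and different names.

```rust
/// Rough input-size estimate (assumed heuristic: ~4 chars per token,
/// rounded up). Real tokenizers vary by model.
fn estimate_tokens(text: &str) -> u32 {
    (text.len() as u32 + 3) / 4
}

/// Before each LLM call, shrink the requested max_tokens so that
/// input + output still fits inside the context window.
fn preflight_max_tokens(prompt: &str, context_window: u32, requested_max: u32) -> u32 {
    let headroom = context_window.saturating_sub(estimate_tokens(prompt));
    requested_max.min(headroom)
}
```

The point of doing this per call is that the input grows as the conversation does, so a `max_tokens` that was valid on turn one can violate the constraint on turn fifty.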
Layer 3 — Error detection (`crates/loopal-provider/src/anthropic/mod.rs`, `crates/loopal-error/src/helpers.rs`)

- Add the "exceed context limit" pattern to ContextOverflow detection (previously classified as generic 400)
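The detection fix amounts to a substring match on the provider's error body. A hedged sketch: the "exceed context limit" pattern is the one this PR adds, while the function name and the case-insensitive comparison are defensive assumptions, since provider error strings are not a stable API.

```rust
/// Classify an API error message as a context overflow (illustrative
/// helper; the real classifier lives in loopal-error's helpers).
fn is_context_overflow(api_error_message: &str) -> bool {
    // Case-insensitive match so minor phrasing changes don't silently
    // demote the error back to a generic 400.
    api_error_message
        .to_ascii_lowercase()
        .contains("exceed context limit")
}
```

Without this pattern the error fell through to the generic 400 path, which is why the agent had no way to recognize, let alone recover from, an overflow.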
Layer 4 — Runtime recovery (`crates/loopal-runtime/src/agent_loop/run.rs`)

- Catch ContextOverflow → emergency compact → retry once (prevents permanent agent block)
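The recovery control flow can be sketched with closures standing in for the real agent-loop pieces. `LlmError`, `call_llm`, and `emergency_compact` are hypothetical stand-ins; only the flow itself (catch overflow, compact, retry exactly once) comes from the PR.

```rust
/// Illustrative error type; the real one lives in loopal-error.
#[derive(Debug, PartialEq)]
enum LlmError {
    ContextOverflow,
    Other(String),
}

/// Run one LLM call with single-retry overflow recovery. `call_llm` and
/// `emergency_compact` are placeholders for the runtime's real operations.
fn call_with_recovery<F, C>(mut call_llm: F, mut emergency_compact: C) -> Result<String, LlmError>
where
    F: FnMut() -> Result<String, LlmError>,
    C: FnMut(),
{
    match call_llm() {
        // Previously this error permanently blocked the agent; now we
        // compact the conversation and retry exactly once. A second
        // overflow propagates to the caller rather than looping.
        Err(LlmError::ContextOverflow) => {
            emergency_compact();
            call_llm()
        }
        other => other,
    }
}
```

Retrying exactly once is the key design choice: if compaction cannot free enough room, the error surfaces instead of spinning in a compact-and-retry loop.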
Test plan

- `bazel build //... --config=clippy` — zero warnings
- `bazel build //... --config=rustfmt` — passes