fix(context): prevent context overflow by clamping max_tokens before API call#116

Merged
yishuiliunian merged 1 commit into main from fix/context-overflow-max-tokens-clamp
Apr 16, 2026
Conversation

@yishuiliunian
Contributor

Summary

  • Fix the API rejection triggered when input_tokens + max_tokens > context_window, which permanently blocked the agent
  • Root cause: budget system capped output_reserve at 16K for compaction thresholds but sent 64K max_tokens to API, creating a ~37K token gap
  • Implements a four-layer defense: budget awareness, pre-flight clamping, overflow detection, and runtime recovery

Changes

Layer 1 — Budget awareness (crates/loopal-context/src/budget.rs)

  • Add max_output_tokens field to ContextBudget (uncapped, for API constraint validation)
  • Add clamp_output_tokens(estimated_input) method for pre-flight max_tokens clamping
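The Layer 1 change can be sketched roughly as follows. This is a minimal illustration, not the crate's actual code: the field and method names follow the PR description, but the simplified struct and the token values are assumptions.

```rust
// Simplified sketch of the Layer 1 budget change (illustrative, not the
// real ContextBudget from crates/loopal-context).
struct ContextBudget {
    context_window: usize,    // hard API limit on input + output tokens
    max_output_tokens: usize, // uncapped output ceiling sent as max_tokens
}

impl ContextBudget {
    /// Clamp max_tokens so that estimated_input + max_tokens can never
    /// exceed the context window.
    fn clamp_output_tokens(&self, estimated_input: usize) -> usize {
        let headroom = self.context_window.saturating_sub(estimated_input);
        self.max_output_tokens.min(headroom)
    }
}
```

For example, with a 200K window and a 64K output ceiling, an input estimate of 180K tokens would clamp max_tokens down to 20K instead of sending 64K and being rejected.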

Layer 2 — Pre-flight validation (crates/loopal-runtime/src/agent_loop/llm_params.rs)

  • Estimate input tokens (system + tools + messages) before each LLM call
  • Dynamically reduce max_tokens via clamp_output_tokens() when headroom is tight
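A hedged sketch of the Layer 2 pre-flight step: the ~4-characters-per-token heuristic and these free functions are assumptions for illustration, not the actual API of `llm_params.rs`.

```rust
// Illustrative pre-flight estimate: sum system prompt, tool definitions,
// and message history, then clamp the requested output budget.
fn estimate_tokens(text: &str) -> usize {
    text.chars().count() / 4 // rough heuristic: ~4 chars per token
}

fn preflight_max_tokens(
    system: &str,
    tools: &str,
    messages: &[&str],
    context_window: usize,
    max_output_tokens: usize,
) -> usize {
    let input = estimate_tokens(system)
        + estimate_tokens(tools)
        + messages.iter().map(|m| estimate_tokens(m)).sum::<usize>();
    // Same constraint as Layer 1: input + max_tokens <= context_window.
    max_output_tokens.min(context_window.saturating_sub(input))
}
```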

Layer 3 — Error detection (crates/loopal-provider/src/anthropic/mod.rs, crates/loopal-error/src/helpers.rs)

  • Add "exceed context limit" pattern to ContextOverflow detection (previously classified as a generic 400)
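The Layer 3 detection fix amounts to substring matching on the provider's error message. "exceed context limit" is the pattern this PR adds; the other patterns shown here are illustrative assumptions, not necessarily the crate's full list.

```rust
// Sketch of overflow classification on a provider error message.
fn is_context_overflow(error_message: &str) -> bool {
    let m = error_message.to_lowercase();
    m.contains("exceed context limit") // wording added by this PR
        || m.contains("context_length_exceeded")
        || m.contains("prompt is too long")
}
```

Without the added pattern, such a response would fall through to the generic 400 handler and never trigger the recovery path.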

Layer 4 — Runtime recovery (crates/loopal-runtime/src/agent_loop/run.rs)

  • Catch ContextOverflow → emergency compact → retry once (prevents permanent agent block)
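The Layer 4 recovery path can be sketched as a retry-once wrapper. The error type and the closure-based call/compact hooks below are hypothetical stand-ins for the real agent loop in `run.rs`.

```rust
// Illustrative retry-once recovery: on ContextOverflow, compact and retry;
// any second failure (including another overflow) propagates to the caller.
#[derive(Debug, PartialEq)]
enum LlmError {
    ContextOverflow,
    Other(String),
}

fn call_with_recovery<F, C>(mut call: F, mut compact: C) -> Result<String, LlmError>
where
    F: FnMut() -> Result<String, LlmError>,
    C: FnMut(),
{
    match call() {
        Err(LlmError::ContextOverflow) => {
            compact(); // emergency compaction frees input-token budget
            call()     // retry exactly once instead of blocking forever
        }
        other => other,
    }
}
```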

Test plan

  • bazel build //... --config=clippy — zero warnings
  • bazel build //... --config=rustfmt — passes
  • loopal-context tests (budget clamp, degradation, store) — all pass
  • loopal-error tests (overflow detection patterns) — all pass
  • loopal-provider tests — all pass
  • loopal-runtime tests (241 tests including LLM params) — all pass
  • loopal-tui tests (compact edge cases) — all pass
  • CI passes

fix(context): prevent context overflow by clamping max_tokens before API call

The budget system capped output_reserve at 16K for compaction thresholds
but sent the full max_output_tokens (64K) to the API, creating a ~37K
token gap where input could grow beyond the API's hard constraint
(input + max_tokens <= context_window).

Four-layer defense:
- Add max_output_tokens to ContextBudget with clamp_output_tokens() for
  pre-flight validation against the real API constraint
- Estimate input tokens before each LLM call and dynamically reduce
  max_tokens when headroom is insufficient
- Fix ContextOverflow detection to match "exceed context limit" pattern
  from Anthropic API (previously unrecognized, classified as generic 400)
- Add runtime recovery: catch ContextOverflow, emergency compact, retry
  once instead of permanently blocking the agent
@yishuiliunian yishuiliunian merged commit 2809d65 into main Apr 16, 2026
4 checks passed
@yishuiliunian yishuiliunian deleted the fix/context-overflow-max-tokens-clamp branch April 16, 2026 08:08
