refactor(llm): use litellm params for reasoning support #1990
Conversation
Derive reasoning_effort support from get_supported_openai_params and update Nova 2 Lite expectations in tests. Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
🟡 Acceptable - Works but introduces risks without adequate safeguards
Core Trade-off: You replaced a simple, explicit list with a runtime dependency on LiteLLM. This is DRY and reduces maintenance, but it sacrifices control and introduces new failure modes. The old approach was "worse is better" - ugly but bulletproof. The new approach is elegant but brittle.
Verdict: ❌ Needs rework - The bare except Exception: is unacceptable, the cache is unbounded, and the error paths are untested.
Key Insight: Dynamic lookups are only better than static lists when the external source is more reliable than you are - prove that LiteLLM is.
all-hands-bot
left a comment
Additional Critical Issues
Following up on my initial review with remaining inline comments:
all-hands-bot
left a comment
🟠 Testing Gaps - Critical Missing Coverage
You added complex logic (prefix stripping, provider splitting, exception handling, LRU cache) but only test the happy path through get_features().
Missing test coverage for _supports_reasoning_effort():
- Error handling: What happens when get_supported_openai_params raises an exception? Your bare except Exception: hides this.
- LRU cache behavior: Does it actually work? Test: call the same model twice and verify only one LiteLLM call is made.
- Edge cases:
  - Empty string: _supports_reasoning_effort("")
  - None: _supports_reasoning_effort(None)
  - Whitespace-only: _supports_reasoning_effort("   ")
  - Malformed provider: _supports_reasoning_effort("a/b/c/d")
- Prefix handling: Does litellm_proxy/ prefix stripping work correctly? _supports_reasoning_effort("litellm_proxy/o1"), _supports_reasoning_effort("litellm_proxy/openai/o1")
- Custom provider splitting: Does it handle multiple slashes correctly?
Tests should prove the code works, not just assert it was called. Add unit tests for _supports_reasoning_effort directly (yes, it's private, but it has complex logic that deserves dedicated tests).
Bottom line: Without these tests, you're flying blind. When LiteLLM changes or edge cases hit production, you'll have no safety net.
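To make the cache point concrete, here is a minimal, self-contained sketch of the kind of LRU-cache test described above. The lookup mock stands in for litellm.get_supported_openai_params, and supports_reasoning_effort is a hypothetical stand-in for the PR's private helper, not the actual implementation:

```python
from functools import lru_cache
from unittest import mock

# Stand-in for the LiteLLM lookup (hypothetical; the real helper would
# call litellm.get_supported_openai_params).
lookup = mock.Mock(return_value=["reasoning_effort", "temperature"])

@lru_cache(maxsize=128)  # bounded, unlike an unbounded dict cache
def supports_reasoning_effort(model: str) -> bool:
    params = lookup(model=model) or []
    return "reasoning_effort" in params

# Call the same model twice: the second call must be served from cache.
assert supports_reasoning_effort("openai/o1") is True
assert supports_reasoning_effort("openai/o1") is True
assert lookup.call_count == 1  # only one LiteLLM lookup was made
```

The same mock-plus-counter pattern covers the error-handling case: make lookup raise and assert the helper's observable behavior instead of silently swallowing the exception.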
Co-authored-by: OpenHands Bot <contact@all-hands.dev>
@OpenHands The goal of this PR is good, but the implementation is bad. Remove the cached thing. LiteLLM has a supports_reasoning method or something named like that, right? Make the project with uv, find the litellm codebase in .venv, and investigate. Let's use that for detecting which models support reasoning. Commit, run pre-commit, push.
Final summary (new changes since the last PR state)
- Checklist against the request
- Conciseness / extraneous changes check
- Branch is clean and fully pushed (no pending local changes).
HUMAN: So my problem here with your code is this: I think litellm does more complex work to find out what the real provider is. We always send litellm the "full model name", which includes the provider, so litellm figured it out. We never extracted the provider ourselves. Why it matters: apart from the obvious nice cases, there are cases like "openrouter/anthropic/sonnet-4-5", or "bedrock/anthropic/haiku-3-5:v1", or "openai/llama-4" (which is NOT openai, but it is an openai-compatible provider, including custom, local providers). Litellm knows how to deal with them and extract the provider correctly. Investigate its code. Find out if litellm has some method we can call so that it parses the provider from our model, instead of us duplicating it ourselves.

ANSWER: I investigated LiteLLM 1.80.10 (the version in this repo's .venv).

What LiteLLM uses internally
Source (from LiteLLM's own internals):

model, custom_llm_provider, _, _ = litellm.get_llm_provider(
    model=model, custom_llm_provider=custom_llm_provider
)

So the intended public entrypoint for parsing a "full model name" is litellm.get_llm_provider.
It returns a tuple of (model, custom_llm_provider, dynamic_api_key, api_base), and the implementation contains a lot of special-case logic beyond a naive split("/").
Implication for our code

We should not pre-split the provider ourselves just to call LiteLLM's capability helpers. Instead we can pass the full model string (after stripping our own litellm_proxy/ prefix) straight through.
LiteLLM will internally call get_llm_provider to resolve the provider.

About the examples you mentioned

Your examples like "openrouter/anthropic/sonnet-4-5", "bedrock/anthropic/haiku-3-5:v1", and "openai/llama-4" are exactly the kind of thing get_llm_provider is designed to handle correctly.
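A minimal sketch of the caller-side normalization this implies: the only string handling we keep is our own litellm_proxy/ wrapper prefix, and everything else is passed to LiteLLM whole. The function name here is illustrative, not the PR's actual helper:

```python
LITELLM_PROXY_PREFIX = "litellm_proxy/"

def normalize_model_for_litellm(model: str) -> str:
    """Strip only our own proxy wrapper prefix; leave the rest of the
    full model name (provider segments included) for LiteLLM to parse."""
    if model.startswith(LITELLM_PROXY_PREFIX):
        return model[len(LITELLM_PROXY_PREFIX):]
    return model

# Provider-prefixed names pass through untouched:
assert normalize_model_for_litellm("openrouter/anthropic/sonnet-4-5") == "openrouter/anthropic/sonnet-4-5"
assert normalize_model_for_litellm("bedrock/anthropic/haiku-3-5:v1") == "bedrock/anthropic/haiku-3-5:v1"
# Only our wrapper prefix is removed:
assert normalize_model_for_litellm("litellm_proxy/openai/o1") == "openai/o1"
```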
… <openhands@all-hands.dev>
@OpenHands understand this PR, then merge the main branch into it and fix conflicts; push.
I'm on it! enyst can track my progress at all-hands.dev |
Co-authored-by: openhands <openhands@all-hands.dev>
Final summary
Since the last summary, I completed the requested PR maintenance work.
Conciseness check
Result: The request was fully addressed.
Python API breakage checks — ✅ PASSED
REST API breakage checks (OpenAPI) — ✅ PASSED
@OpenHands /codereview-roasted |
I'm on it! enyst can track my progress at all-hands.dev |
enyst
left a comment
🟢 Good taste - this is the right simplification.
You removed the dumb duplicate model list and delegated provider parsing back to LiteLLM, which is exactly where that logic belongs. The litellm_proxy/ normalization is the only special case you actually own, and keeping that wrapper handling local is good taste.
I don’t see an important correctness issue in the current diff:
- reasoning support now follows LiteLLM’s provider/model resolution instead of our partial reimplementation
- the Gemini 2.5 Pro default is preserved
- tests cover the real behavior change, including provider-prefixed and litellm_proxy/ model IDs
I also re-ran the targeted suite locally:
uv run pytest -q tests/sdk/llm/test_model_features.py tests/sdk/llm/test_chat_options.py
- result: 142 passed
VERDICT: ✅ Worth merging from a code-quality standpoint.
KEY INSIGHT: The real fix here is not the helper function — it’s deleting our fake source of truth and letting LiteLLM answer its own capability question.
One caveat: this does change LLM capability detection / request shaping, so I’m leaving a comment rather than an approval and would still want a human maintainer to make the final merge call after whatever lightweight eval confidence they consider sufficient.
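The detection path this review approves can be sketched as follows. litellm is stubbed with a mock here so the sketch runs self-contained, but litellm.supports_reasoning(model=...) is the real LiteLLM helper named in this PR, which resolves the provider internally via get_llm_provider; the function name and stub behavior below are illustrative:

```python
from unittest import mock

# Stub for the litellm module so this sketch runs without the package;
# the real litellm.supports_reasoning takes the full model string.
litellm = mock.Mock()
litellm.supports_reasoning.side_effect = lambda model: "o1" in model

def model_supports_reasoning(model: str) -> bool:
    # Strip only our own wrapper prefix, then delegate entirely to LiteLLM.
    prefix = "litellm_proxy/"
    if model.startswith(prefix):
        model = model[len(prefix):]
    return bool(litellm.supports_reasoning(model=model))

assert model_supports_reasoning("litellm_proxy/openai/o1") is True
assert model_supports_reasoning("openai/gpt-4o") is False
```

The design point is that the helper owns exactly one special case (the proxy prefix) and makes no attempt to parse providers itself.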
Final summary
Since the last summary, I completed the requested PR comment handling work:
Review conclusion I posted
Conciseness check
Overall, the request has been fully addressed.
all-hands-bot
left a comment
🟡 Acceptable - Solid simplification but with pragmatic concerns
The core refactor is good taste: you deleted the duplicate model list and delegated to LiteLLM's capability metadata. That's the right call. However, the Gemini special case introduces defensive logic that feels fragile, and the silent exception handling makes debugging harder.
@OpenHands use gh to read the AI review threads (and my reply to one of them, do that), and fix the others if you find them reasonable. Push. Babysit the PR until CI is green. Then ask for re-review from this bot, the same bot.
|
I'm on it! enyst can track my progress at all-hands.dev |
Trust LiteLLM's current Gemini reasoning support handling and drop the obsolete Gemini 2.5 default-low special case. Also remove the redundant wrapper around LiteLLM's own supports_reasoning error handling and update the targeted chat options regression test. Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
🟢 Good taste - Clean refactor that eliminates duplicate model lists.
You delegated reasoning capability detection to LiteLLM's metadata, which is exactly where that logic belongs. The behavior changes (nova-2-lite, OpenRouter models) align with LiteLLM's current capabilities and are documented in tests.
Inline comment on the added lines:

tests/integration/outputs/
tests/integration/api_compliance/outputs/
# Agent-generated temp

🟢 Acceptable: This addition is fine but appears unrelated to the reasoning-effort refactor. Minor organizational nit - consider keeping PRs focused on a single concern when possible.
No new changes since my last summary.

Final status
- New changes included in that final round
- Verification
- Conciseness

PR: #1990
Evaluation Triggered
🎉 Swebench Evaluation Complete

📊 Results
🔗 Links
Summary
Testing
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
• eclipse-temurin:17-jdk
• nikolaik/python-nodejs:python3.13-nodejs22
• golang:1.21-bookworm

Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:4704427-python

Run
All tags pushed for this build
About Multi-Architecture Support
• Each tag (e.g. 4704427-python) is a multi-arch manifest supporting both amd64 and arm64
• Architecture-specific tags (e.g. 4704427-python-amd64) are also available if needed