
refactor(llm): use litellm params for reasoning support#1990

Merged
enyst merged 17 commits into main from cleanup-reasoning-models on Mar 19, 2026

Conversation

@enyst
Collaborator

@enyst enyst commented Feb 10, 2026

Summary

  • derive reasoning_effort support from LiteLLM and remove the hardcoded model list
  • normalize litellm_proxy-prefixed models before checking supported params
  • test expectations match the old hardcoded reasoning-model list
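
A minimal sketch of this approach, assuming hypothetical helper names (normalize_model, supports_reasoning_effort) rather than the SDK's actual identifiers:

```python
def normalize_model(model: str) -> str:
    """Strip the SDK-owned litellm_proxy/ wrapper prefix so LiteLLM
    sees the real provider/model string."""
    prefix = "litellm_proxy/"
    if model.startswith(prefix):
        model = model[len(prefix):]
    return model

def supports_reasoning_effort(model: str) -> bool:
    """Check whether LiteLLM reports reasoning_effort among the
    model's supported OpenAI-style parameters."""
    from litellm import get_supported_openai_params  # lazy import
    params = get_supported_openai_params(model=normalize_model(model)) or []
    return "reasoning_effort" in params

# The normalization step on its own:
print(normalize_model("litellm_proxy/openai/o1"))  # openai/o1
```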

Testing

  • uv run pre-commit run --files openhands-sdk/openhands/sdk/llm/utils/model_features.py
  • uv run pre-commit run --files tests/sdk/llm/test_model_features.py

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.13-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:4704427-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-4704427-python \
  ghcr.io/openhands/agent-server:4704427-python

All tags pushed for this build

ghcr.io/openhands/agent-server:4704427-golang-amd64
ghcr.io/openhands/agent-server:4704427-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:4704427-golang-arm64
ghcr.io/openhands/agent-server:4704427-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:4704427-java-amd64
ghcr.io/openhands/agent-server:4704427-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:4704427-java-arm64
ghcr.io/openhands/agent-server:4704427-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:4704427-python-amd64
ghcr.io/openhands/agent-server:4704427-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-amd64
ghcr.io/openhands/agent-server:4704427-python-arm64
ghcr.io/openhands/agent-server:4704427-nikolaik_s_python-nodejs_tag_python3.13-nodejs22-arm64
ghcr.io/openhands/agent-server:4704427-golang
ghcr.io/openhands/agent-server:4704427-java
ghcr.io/openhands/agent-server:4704427-python

About Multi-Architecture Support

  • Each variant tag (e.g., 4704427-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 4704427-python-amd64) are also available if needed

Derive reasoning_effort support from get_supported_openai_params and
update Nova 2 Lite expectations in tests.

Co-authored-by: openhands <openhands@all-hands.dev>
@all-hands-bot (Collaborator) left a comment

🟡 Acceptable - Works but introduces risks without adequate safeguards

Core Trade-off: You replaced a simple, explicit list with a runtime dependency on LiteLLM. This is DRY and reduces maintenance, but sacrifices control and introduces failure modes. The old approach was "worse is better" - ugly but bulletproof. The new approach is elegant but brittle.

Verdict: Needs rework. The bare except Exception: is unacceptable, the cache is unbounded, and error paths are untested.

Key Insight: Dynamic lookups are only better than static lists when the external source is more reliable than you are—prove that LiteLLM is.


all-hands-bot

This comment was marked as outdated.

@all-hands-bot (Collaborator) left a comment

Test comment for line 83

@all-hands-bot (Collaborator) left a comment

Test comment for line 4

@all-hands-bot (Collaborator) left a comment

Test comment for test file line 35

@all-hands-bot (Collaborator) left a comment

Additional Critical Issues

Following up on my initial review with remaining inline comments:

all-hands-bot

This comment was marked as outdated.

@all-hands-bot (Collaborator) left a comment

🟠 Testing Gaps - Critical Missing Coverage

You added complex logic (prefix stripping, provider splitting, exception handling, LRU cache) but only test the happy path through get_features().

Missing test coverage for _supports_reasoning_effort():

  1. Error handling: What happens when get_supported_openai_params raises an exception? Your bare except Exception: hides this.

  2. LRU cache behavior: Does it actually work? Test: Call same model twice, verify only one LiteLLM call made.

  3. Edge cases:

    • Empty string: _supports_reasoning_effort("")
    • None: _supports_reasoning_effort(None)
    • Whitespace-only: _supports_reasoning_effort(" ")
    • Malformed provider: _supports_reasoning_effort("a/b/c/d")
  4. Prefix handling: Does litellm_proxy/ prefix stripping work correctly?

    • _supports_reasoning_effort("litellm_proxy/o1")
    • _supports_reasoning_effort("litellm_proxy/openai/o1")
  5. Custom provider splitting: Does it handle multiple slashes correctly?

Tests should prove the code works, not just assert it was called. Add unit tests for _supports_reasoning_effort directly (yes, it's private, but it has complex logic that deserves dedicated tests).

Bottom line: Without these tests, you're flying blind. When LiteLLM changes or edge cases hit production, you'll have no safety net.
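
Point 2 above (LRU cache behavior) is straightforward to prove with a counting stub; a minimal sketch, with fake_lookup standing in for the real LiteLLM call (names are illustrative, not the SDK's):

```python
from functools import lru_cache

calls = {"n": 0}

def fake_lookup(model: str) -> bool:
    # Stand-in for the LiteLLM capability lookup; counts invocations.
    calls["n"] += 1
    return model.startswith("o1")

@lru_cache(maxsize=None)
def supports(model: str) -> bool:
    return fake_lookup(model)

assert supports("o1-mini") is True
assert supports("o1-mini") is True  # second call served from cache
assert calls["n"] == 1              # underlying lookup ran exactly once
```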

Co-authored-by: OpenHands Bot <contact@all-hands.dev>
@enyst
Collaborator Author

enyst commented Feb 11, 2026

@OpenHands The goal of this PR is good, but the implementation is bad. Remove the cached thing.

LiteLLM has a supports_reasoning method or something named like that, right? Make the project with uv, find litellm codebase in .venv, investigate.

Let's use that for detecting which models support reasoning. Commit, run pre-commit, push.

@openhands-ai

openhands-ai bot commented Feb 11, 2026

Final summary (new changes since the last PR state)

Checklist against the request

  • Removed the cached thing: deleted the @lru_cache-based reasoning-support detection.
  • Investigated LiteLLM in the uv-managed .venv: confirmed litellm.utils.supports_reasoning exists (and inspected its signature/source).
  • Used LiteLLM’s built-in method for detection: _supports_reasoning_effort() now delegates to supports_reasoning(...) after normalizing the model string (including litellm_proxy/ prefix handling).
  • Committed the change: refactor(llm): use litellm supports_reasoning (commit c6847155).
  • Ran pre-commit: uv run pre-commit run --all-files passes.
  • Pushed to origin/cleanup-reasoning-models (PR updated).
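
The delegation described above can be sketched as follows (the helper name is illustrative; only the litellm_proxy/ stripping is logic the SDK owns):

```python
def supports_reasoning_effort(model: str) -> bool:
    """Sketch: strip the SDK-owned litellm_proxy/ wrapper prefix,
    then let LiteLLM answer its own capability question."""
    model = model.removeprefix("litellm_proxy/")  # Python 3.9+
    from litellm.utils import supports_reasoning  # lazy import
    return supports_reasoning(model=model)

# The prefix handling on its own:
print("litellm_proxy/openai/o1".removeprefix("litellm_proxy/"))  # openai/o1
```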

Conciseness / extraneous changes check

  • The diff is limited to one file: openhands-sdk/openhands/sdk/llm/utils/model_features.py.
  • No unrelated refactors/tests/docs were changed.

Branch is clean and fully pushed (no pending local changes).

View full conversation

@enyst enyst marked this pull request as draft February 11, 2026 21:42
Collaborator Author

enyst commented Feb 12, 2026

HUMAN:

So my problem here with your code is this: I think litellm does more complex code to find out what the real provider is. We always send to litellm "full model name" which includes provider, so litellm figured it out. We never did extract provider ourselves.

Why it matters: apart from the obvious nice cases, there are cases like "openrouter/anthropic/sonnet-4-5", or "bedrock/anthropic/haiku-3-5:v1", or "openai/llama-4" (which is NOT openai, but it is openai-compatible provider, including custom, local, provider). Litellm knows how to deal with them and extract provider correctly.

Investigate its code. Find out if litellm has some method we can call so that it parses provider from our model, instead of duplicating it ourselves.

ANSWER:

I investigated LiteLLM 1.80.10 (the version in this repo's uv .venv). There is a built-in provider+model parser, and supports_reasoning() already uses it.

What LiteLLM uses internally

litellm.utils.supports_reasoning() is implemented as:

  • supports_reasoning() delegates to _supports_factory(model, custom_llm_provider, key="supports_reasoning")
  • _supports_factory() calls litellm.get_llm_provider(model=..., custom_llm_provider=...) first.

Source (from .venv/site-packages/litellm/utils.py):

model, custom_llm_provider, _, _ = litellm.get_llm_provider(
    model=model, custom_llm_provider=custom_llm_provider
)

So the intended public entrypoint for parsing a "full model name" is:

  • litellm.get_llm_provider(...)

It returns a tuple:

  • (model, custom_llm_provider, dynamic_api_key, api_base)

and the implementation contains a lot of special-case logic beyond a naive split("/", 1):

  • supports provider prefixes in the model string
  • handles OpenAI-compatible endpoints via api_base
  • has provider-specific routing rules (azure non-openai models, anthropic_text vs anthropic, cohere_chat vs cohere, etc.)

Implication for our code

We should not pre-split the provider ourselves just to call supports_reasoning().

Instead we can pass the full model string (after stripping our own litellm_proxy/ prefix if needed), and call:

  • supports_reasoning(model=full_model_string, custom_llm_provider=None)

LiteLLM will internally call get_llm_provider() and do the complex parsing.

About the examples you mentioned

Your examples like:

  • openrouter/anthropic/sonnet-4-5
  • bedrock/anthropic/haiku-3-5:v1
  • openai/llama-4

are exactly the kind of thing get_llm_provider() is meant to normalize (and this is the same function supports_reasoning() relies on). So the safest approach is to delegate parsing to LiteLLM by passing the full model string.
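
A quick illustration of why a hand-rolled split is not provider resolution (naive_provider is hypothetical; real code should delegate to litellm.get_llm_provider):

```python
def naive_provider(model: str) -> str:
    """What a hand-rolled split would claim the provider is."""
    return model.split("/", 1)[0]

# The first segment is often a routing/wrapper layer, not the vendor:
assert naive_provider("openrouter/anthropic/sonnet-4-5") == "openrouter"
assert naive_provider("bedrock/anthropic/haiku-3-5:v1") == "bedrock"
# "openai/llama-4" is an OpenAI-compatible endpoint serving a
# non-OpenAI model; a naive split cannot distinguish that case.
assert naive_provider("openai/llama-4") == "openai"

def resolved_provider(model: str) -> str:
    """Delegate parsing to LiteLLM's public entrypoint instead."""
    from litellm import get_llm_provider  # lazy import
    model, provider, _api_key, _api_base = get_llm_provider(model=model)
    return provider
```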

(As a side note: when I ran supports_reasoning() on those exact strings locally, it returned False for them in 1.80.10, but that's about capability metadata, not parsing — parsing is still happening via get_llm_provider().)

@OpenHands OpenHands deleted a comment from openhands-ai bot Feb 12, 2026
@OpenHands OpenHands deleted a comment from openhands-ai bot Feb 12, 2026
@OpenHands OpenHands deleted a comment from openhands-ai bot Feb 12, 2026
@github-actions
Contributor

github-actions bot commented Feb 14, 2026

Coverage

Coverage Report

File                                                     Stmts  Miss  Cover  Missing
openhands-sdk/openhands/sdk/llm/options/chat_options.py     40     3    92%  74–76
openhands-sdk/openhands/sdk/llm/utils/model_features.py     48     1    97%  34
TOTAL                                                    20344  5917    70%

@enyst enyst added behavior-initiative This is related to the system prompt sections and LLM steering. and removed behavior-initiative This is related to the system prompt sections and LLM steering. labels Feb 14, 2026
@OpenHands OpenHands deleted a comment from all-hands-bot Mar 17, 2026
@OpenHands OpenHands deleted a comment from all-hands-bot Mar 17, 2026
@enyst
Collaborator Author

enyst commented Mar 17, 2026

@OpenHands understand this pr, then merge main branch into it and fix conflicts; push.

@openhands-ai

openhands-ai bot commented Mar 17, 2026

I'm on it! enyst can track my progress at all-hands.dev

Co-authored-by: openhands <openhands@all-hands.dev>
@openhands-ai

openhands-ai bot commented Mar 17, 2026

Final summary

Since the last summary, I completed the requested PR maintenance work:

  • Understood the PR context by reviewing PR refactor(llm): use litellm params for reasoning support #1990 metadata and the branch diff against main.
  • Merged main into cleanup-reasoning-models.
  • Resolved merge conflicts in:
    • .gitignore
    • openhands-sdk/openhands/sdk/llm/utils/model_features.py
  • Preserved the PR’s intended behavior:
    • kept the LiteLLM-based supports_reasoning(...) reasoning detection
    • kept the PR’s related LLM option/test changes
    • kept both .agent_tmp/ and tests/integration/api_compliance/outputs/ in .gitignore
  • Validated the result with:
    • make build
    • targeted uv run pre-commit run --files ...
    • targeted uv run pytest tests/sdk/llm/test_model_features.py tests/sdk/llm/test_chat_options.py
  • Committed and pushed the merge resolution to the PR branch:
    • commit: 49ec97d1
    • branch: cleanup-reasoning-models

Conciseness check

  • The new changes were concise and limited to merge/conflict resolution plus validation.
  • I did not introduce unrelated edits beyond what was necessary to complete the merge cleanly.

Result

The request was fully addressed: main has been merged into the PR branch, conflicts were fixed, validation passed, and the updated branch was pushed to GitHub.

@github-actions
Contributor

github-actions bot commented Mar 17, 2026

Python API breakage checks — ✅ PASSED

Result: PASSED

Action log

@github-actions
Contributor

github-actions bot commented Mar 17, 2026

REST API breakage checks (OpenAPI) — ✅ PASSED

Result: PASSED

Action log

@enyst
Collaborator Author

enyst commented Mar 19, 2026

@OpenHands /codereview-roasted

@openhands-ai

openhands-ai bot commented Mar 19, 2026

I'm on it! enyst can track my progress at all-hands.dev

@enyst (Collaborator Author) left a comment

🟢 Good taste - this is the right simplification.

You removed the dumb duplicate model list and delegated provider parsing back to LiteLLM, which is exactly where that logic belongs. The litellm_proxy/ normalization is the only special case you actually own, and keeping that wrapper handling local is good taste.

I don’t see an important correctness issue in the current diff:

  • reasoning support now follows LiteLLM’s provider/model resolution instead of our partial reimplementation
  • the Gemini 2.5 Pro default is preserved
  • tests cover the real behavior change, including provider-prefixed and litellm_proxy/ model IDs

I also re-ran the targeted suite locally:

  • uv run pytest -q tests/sdk/llm/test_model_features.py tests/sdk/llm/test_chat_options.py
  • result: 142 passed

VERDICT: ✅ Worth merging from a code-quality standpoint.

KEY INSIGHT: The real fix here is not the helper function — it’s deleting our fake source of truth and letting LiteLLM answer its own capability question.

One caveat: this does change LLM capability detection / request shaping, so I’m leaving a comment rather than an approval and would still want a human maintainer to make the final merge call after whatever lightweight eval confidence they consider sufficient.

@openhands-ai

openhands-ai bot commented Mar 19, 2026

Final summary

Since the last summary, I completed the requested PR comment handling work:

  • Reviewed PR refactor(llm): use litellm params for reasoning support #1990 in context: checked the current branch, PR metadata, changed files, prior comments, and relevant AGENTS.md guidance.
  • Validated the current state locally by re-running the targeted tests:
    • uv run pytest -q tests/sdk/llm/test_model_features.py tests/sdk/llm/test_chat_options.py
    • Result: 142 passed
  • Processed the /codereview-roasted comment and posted a GitHub comment review on PR refactor(llm): use litellm params for reasoning support #1990.
  • Followed the repo review policy: I left a comment, not an approval, because this PR changes LLM capability detection / request shaping and should get a human maintainer’s final merge decision.

Review conclusion I posted

  • I did not find an important correctness issue in the current diff.
  • I noted that the PR is a good simplification:
    • it removes the duplicate local reasoning-support list,
    • delegates provider/model parsing back to LiteLLM,
    • keeps the litellm_proxy/ handling as the only local special case,
    • and preserves the Gemini 2.5 Pro default behavior.

Conciseness check

  • No code changes were made in this step.
  • The work was limited to review + validation + posting the PR response.
  • There are no extraneous repository changes to revert.

Overall, the request has been fully addressed.

@enyst enyst marked this pull request as ready for review March 19, 2026 19:23
@all-hands-bot (Collaborator) left a comment

🟡 Acceptable - Solid simplification but with pragmatic concerns

The core refactor is good taste: you deleted the duplicate model list and delegated to LiteLLM's capability metadata. That's the right call. However, the Gemini special case introduces defensive logic that feels fragile, and the silent exception handling makes debugging harder.

@enyst
Collaborator Author

enyst commented Mar 19, 2026

@OpenHands use gh to read the AI review threads (and my reply to one of them, do that), and fix the others if you find them reasonable. Push. Babysit PR until CI is green. Then ask for re-review from this bot, the same bot.

@openhands-ai

openhands-ai bot commented Mar 19, 2026

I'm on it! enyst can track my progress at all-hands.dev

Trust LiteLLM's current Gemini reasoning support handling and drop the obsolete Gemini 2.5 default-low special case. Also remove the redundant wrapper around LiteLLM's own supports_reasoning error handling and update the targeted chat options regression test.

Co-authored-by: openhands <openhands@all-hands.dev>
@enyst enyst requested a review from all-hands-bot March 19, 2026 19:50
@all-hands-bot (Collaborator) left a comment

🟢 Good taste - Clean refactor that eliminates duplicate model lists.

You delegated reasoning capability detection to LiteLLM's metadata, which is exactly where that logic belongs. The behavior changes (nova-2-lite, OpenRouter models) align with LiteLLM's current capabilities and are documented in tests.

tests/integration/outputs/
tests/integration/api_compliance/outputs/

# Agent-generated temp
A Collaborator left an inline comment

🟢 Acceptable: This addition is fine but appears unrelated to the reasoning effort refactor. Minor organizational nit - consider keeping PRs focused on a single concern when possible.

@openhands-ai

openhands-ai bot commented Mar 19, 2026

No new changes since my last summary.

Final status

  • Read the AI review threads with gh, including enyst’s reply
  • Fixed the remaining reasonable issues
  • Pushed the changes to cleanup-reasoning-models
  • Replied to and resolved the addressed review threads
  • Babysat the PR until CI was fully green
  • Requested re-review from the same bot (all-hands-bot)

New changes included in that final round

  • Removed the stale Gemini 2.5 reasoning_effort="low" workaround from chat_options.py
  • Removed the redundant outer except Exception around LiteLLM supports_reasoning() in model_features.py
  • Updated the Gemini chat-options test to match the new behavior
  • Commit pushed: 485f29e7 (fix(llm): remove stale gemini reasoning workaround)

Verification

  • Targeted pre-commit checks passed on all edited files
  • Targeted pytest for test_model_features.py and test_chat_options.py passed
  • PR checks are green
  • The OpenHands re-review workflow completed successfully after requesting all-hands-bot

Conciseness

  • Changes are concise and directly related to the review feedback
  • No extraneous file changes remain
  • Working tree is clean

PR: #1990

@enyst enyst added the run-eval-1 Runs evaluation on 1 SWE-bench instance label Mar 19, 2026
@github-actions
Contributor

Evaluation Triggered

@all-hands-bot
Collaborator

🎉 Swebench Evaluation Complete

Evaluation: 23314477099-claude-son
Model: litellm_proxy/claude-sonnet-4-5-20250929
Dataset: benchmark (test)
Commit: 485f29e748253eac098d60574ffab0be0173db8d
Timestamp: 26-03-19-20-24
Triggered by: @enyst

📊 Results

  • Total instances: 500
  • Submitted instances: 1
  • Resolved instances: 1
  • Unresolved instances: 0
  • Empty patch instances: 0
  • Error instances: 0
  • Eval limit: 1
  • Success rate: 1/1 (100.0%)

🔗 Links

Full Archive

@enyst enyst merged commit 26db836 into main Mar 19, 2026
53 of 54 checks passed
@enyst enyst deleted the cleanup-reasoning-models branch March 19, 2026 20:37

Labels

run-eval-1 Runs evaluation on 1 SWE-bench instance
