
fix: suppress response_model during native tool loop for non-OpenAI providers#5767

Open
HrushiYadav wants to merge 3 commits into crewAIInc:main from HrushiYadav:fix/native-tools-suppress-response-model

Conversation


@HrushiYadav commented May 11, 2026

Fixes #5472.

Since v1.9.0, response_model is passed alongside tools on every iteration of the native tool loop in both CrewAgentExecutor and the new AgentExecutor. Providers like Gemini and Anthropic treat response_format as higher priority than tools, so the LLM skips tool calls entirely and returns structured output on the first call.

Changed both _invoke_loop_native_tools (CrewAgentExecutor) and call_llm_native_tools (AgentExecutor) to pass response_model=None during tool iterations, then make one final extraction call without tools once the loop produces a text answer. The new AgentExecutor already uses this pattern correctly in its Plan-and-Execute synthesis step - this brings the native tools path in line with it. Also applied the same fix to the async path in CrewAgentExecutor.
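For illustration, here is a minimal sketch of the two-phase pattern (not the actual executor code; the real loops go through get_llm_response/aget_llm_response with callbacks, printer, and executor context, and call_llm, is_tool_call, and execute_tool here are hypothetical placeholders):

from typing import Any, Callable

def native_tool_loop(
    messages: list[dict],
    tools: list[dict] | None,
    response_model: type | None,
    call_llm: Callable[..., Any],
    is_tool_call: Callable[[Any], bool],
    execute_tool: Callable[[Any], dict],
) -> Any:
    # Phase 1: tool iterations. response_model is suppressed so providers
    # that rank response_format above tools (e.g. Gemini, Anthropic) still
    # emit tool calls instead of returning structured output immediately.
    while True:
        answer = call_llm(messages=messages, tools=tools, response_model=None)
        if is_tool_call(answer):
            messages.append(execute_tool(answer))
            continue
        break  # the loop produced a plain text answer

    # Phase 2: one final call without tools applies the structured schema.
    if response_model is not None:
        answer = call_llm(messages=messages, tools=None, response_model=response_model)
    return answer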

Summary by CodeRabbit

  • Bug Fixes
    • Improved tool invocation reliability when structured output is enabled. The system now correctly prioritizes tool calls over structured formatting and applies the schema after tool execution completes for more predictable responses.



@coderabbitai (bot) commented May 11, 2026

📝 Walkthrough

The PR defers application of response_model until after native tool loops: calls while tools are active pass response_model=None; once the tool loop yields final text and response_model is set, a final LLM call (no tools) applies the schema and the executor constructs the AgentFinish.

Changes

Layer: Tool Loop Response Model Suppression
File(s): lib/crewai/src/crewai/agents/crew_agent_executor.py, lib/crewai/src/crewai/experimental/agent_executor.py
Summary: Comments added explaining the suppression; LLM calls during native-tools loops now pass response_model=None instead of self.response_model (sync + async, core and experimental).

Layer: Post-Tool Schema Application
File(s): lib/crewai/src/crewai/agents/crew_agent_executor.py, lib/crewai/src/crewai/experimental/agent_executor.py
Summary: When the agent finishes with a string and response_model is configured, perform a final LLM call with no tools to apply the schema; if extraction yields a BaseModel, serialize it to JSON text and produce AgentFinish, otherwise use the raw string. Callback/history updates are applied accordingly (sync + async).

Sequence Diagram(s)

sequenceDiagram
  participant AgentExecutor
  participant LLM
  participant Tools

  AgentExecutor->>LLM: get_llm_response(messages, tools=Tools, response_model=None)
  alt LLM returns tool call
    LLM->>AgentExecutor: tool_call
    AgentExecutor->>Tools: run tool_call
    Tools-->>AgentExecutor: tool_result
    AgentExecutor->>LLM: get_llm_response(updated_messages, tools=Tools, response_model=None)
  else LLM returns final string
    LLM-->>AgentExecutor: final_text
  end
  alt response_model is configured
    AgentExecutor->>LLM: get_llm_response(messages_with_final_text, tools=None, response_model=Model)
    LLM-->>AgentExecutor: BaseModel or string
    AgentExecutor-->>AgentExecutor: wrap into AgentFinish (JSON text if BaseModel else raw string)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

size/M

Suggested reviewers

  • greysonlalonde

Poem

🐰 I hopped where tools must run, not stall,
schemas waited till the final call.
Loops danced free with tools in flight,
then structured words made the answer right. ✨

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the primary change: suppressing response_model during native tool loops for non-OpenAI providers to fix tool skipping.
  • Linked Issues Check: ✅ Passed. The PR fully addresses the linked issue #5472 by implementing the recommended solution: suppressing response_model during tool-calling iterations and applying it post-loop via a final extraction call.
  • Out of Scope Changes Check: ✅ Passed. All code changes are scoped to fixing the response_model handling in native tool loops for both sync and async paths, directly addressing the linked issue objective.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.


@coderabbitai (bot) left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/crewai/src/crewai/agents/crew_agent_executor.py (1)

475-520: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Async native-tools flow still has the original regression

The sync path now suppresses response_model during tool iterations, but CrewAgentExecutor._ainvoke_loop_native_tools still sends response_model=self.response_model (Line 1327). Async agents can still skip tools on non-OpenAI providers, so this fix is only partial.

Suggested parity fix for async native-tools loop
@@ async def _ainvoke_loop_native_tools(self) -> AgentFinish:
-                answer = await aget_llm_response(
+                answer = await aget_llm_response(
                     llm=cast("BaseLLM", self.llm),
                     messages=self.messages,
                     callbacks=self.callbacks,
                     printer=PRINTER,
                     tools=openai_tools,
                     available_functions=None,
                     from_task=self.task,
                     from_agent=self.agent,
-                    response_model=self.response_model,
+                    response_model=None,
                     executor_context=self,
                     verbose=self.agent.verbose,
                 )
@@
                 if isinstance(answer, str):
+                    if self.response_model is not None:
+                        enforce_rpm_limit(self.request_within_rpm_limit)
+                        answer = await aget_llm_response(
+                            llm=cast("BaseLLM", self.llm),
+                            messages=self.messages,
+                            callbacks=self.callbacks,
+                            printer=PRINTER,
+                            from_task=self.task,
+                            from_agent=self.agent,
+                            response_model=self.response_model,
+                            executor_context=self,
+                            verbose=self.agent.verbose,
+                        )
+
+                    if isinstance(answer, BaseModel):
+                        output_json = answer.model_dump_json()
+                        formatted_answer = AgentFinish(
+                            thought="",
+                            output=answer,
+                            text=output_json,
+                        )
+                        await self._ainvoke_step_callback(formatted_answer)
+                        self._append_message(output_json)
+                        self._show_logs(formatted_answer)
+                        return formatted_answer
+
+                    answer_str = answer if isinstance(answer, str) else str(answer)
                     formatted_answer = AgentFinish(
                         thought="",
-                        output=answer,
-                        text=answer,
+                        output=answer_str,
+                        text=answer_str,
                     )
                     await self._ainvoke_step_callback(formatted_answer)
-                    self._append_message(answer)
+                    self._append_message(answer_str)
                     self._show_logs(formatted_answer)
                     return formatted_answer
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/agents/crew_agent_executor.py` around lines 475 - 520,
The async native-tools loop still passes self.response_model into
get_llm_response, so change _ainvoke_loop_native_tools to mirror the sync path:
when invoking get_llm_response inside the tool-iteration loop, set
response_model=None (use get_llm_response(..., response_model=None, ...)),
detect tool-call lists with _is_tool_call_list and handle them via
_handle_native_tool_calls as before, and only after the tool loop completes make
one final async get_llm_response call with response_model=self.response_model to
apply the schema; ensure you update the calls to get_llm_response and preserve
existing arguments (callbacks, printer, executor_context, verbose) and flow
control.
🧹 Nitpick comments (1)
lib/crewai/src/crewai/agents/crew_agent_executor.py (1)

509-520: ⚡ Quick win

Rate-limit the post-loop extraction call too

Line 510 adds a second LLM call in the same iteration, but there is no second enforce_rpm_limit(...) before it. That can bypass request throttling accounting.

Suggested guard before the extraction call
                     if self.response_model is not None:
+                        enforce_rpm_limit(self.request_within_rpm_limit)
                         answer = get_llm_response(
                             llm=cast("BaseLLM", self.llm),
                             messages=self.messages,
                             callbacks=self.callbacks,
                             printer=PRINTER,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/agents/crew_agent_executor.py` around lines 509 - 520,
The post-loop extraction call using get_llm_response (when self.response_model
is not None) is missing the same rate-limit guard used earlier; before invoking
get_llm_response in crew_agent_executor.py add a call to enforce_rpm_limit(...)
with the same parameters used for the main LLM call so this second request is
counted and throttled too (keep the call in the same conditional branch that
checks self.response_model and use the same executor/context values such as
self.llm, self.task, and self.agent to mirror the existing rate-limit
invocation).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 567bc033-6475-4457-a308-ed19ced60766

📥 Commits

Reviewing files that changed from the base of the PR and between e4a91cd and c3ac7b9.

📒 Files selected for processing (1)
  • lib/crewai/src/crewai/agents/crew_agent_executor.py

@HrushiYadav (Author) commented

Good catch - applied the same fix to _ainvoke_loop_native_tools (async path) in 73a6a23. Both sync and async loops now suppress response_model during tool iterations and apply it in a final extraction call once the tool loop completes.

fix: suppress response_model during native tool loop for non-OpenAI providers

Since v1.9.0, response_model is passed alongside tools on every iteration
of _invoke_loop_native_tools. Providers such as Gemini and Anthropic treat
response_format (structured output) as higher priority than tools, so the
LLM returns a structured response immediately without making any tool calls.

Fix: pass response_model=None while tools are active. Once the tool loop
completes and the LLM returns a text answer, make one final call without
tools to apply the structured schema if response_model is set.

This restores pre-v1.9.0 behavior where structured output was applied only
in post-processing after tool calls were exhausted.

Fixes crewAIInc#5472
_ainvoke_loop_native_tools had the same regression as the sync path:
response_model was passed alongside tools on every async iteration,
causing non-OpenAI providers to skip tool calls.

Apply the same two-phase fix: pass response_model=None during tool
iterations, then make one final async call without tools once the loop
produces a text answer.
@HrushiYadav force-pushed the fix/native-tools-suppress-response-model branch from 73a6a23 to a0cd32b on May 14, 2026 05:37
…utor

Same regression as CrewAgentExecutor: call_llm_native_tools passed
response_model alongside tools on every iteration. Providers like Gemini
and Anthropic skip tool calls when response_format is set, so tools were
silently never called.

Applied the same two-phase fix: pass response_model=None during tool
iterations, apply it in a final extraction call once the loop produces a
text answer. Consistent with how the Plan-and-Execute path already handles
it in the synthesis step.
@HrushiYadav (Author) commented

Also extended the fix to AgentExecutor (experimental) in 6e0cdc9 - call_llm_native_tools had the same issue. The new executor already applies response_model only at the synthesis step in its Plan-and-Execute path, so this brings the native tools path in line with that existing pattern.
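As a hypothetical regression test (not part of this PR), one could pair the native_tool_loop sketch from the description above with a fake LLM that records its calls, asserting that response_model is None on every call that carries tools and is applied only on the final extraction call:

from pydantic import BaseModel

class Report(BaseModel):
    summary: str

class FakeLLM:
    """Records every call so the test can inspect what was passed."""
    def __init__(self) -> None:
        self.calls: list[dict] = []

    def call(self, messages, tools=None, response_model=None):
        self.calls.append({"tools": tools, "response_model": response_model})
        if tools and len(self.calls) == 1:
            return {"tool_call": "search"}  # first turn: request a tool
        if response_model is not None:
            return Report(summary="done")   # final extraction call
        return "final text"                 # tool loop ends with plain text

def test_response_model_suppressed_during_tool_loop():
    llm = FakeLLM()
    answer = native_tool_loop(
        messages=[], tools=[{"name": "search"}], response_model=Report,
        call_llm=llm.call,
        is_tool_call=lambda a: isinstance(a, dict) and "tool_call" in a,
        execute_tool=lambda a: {"role": "tool", "content": "result"},
    )
    # Every tool-bearing call must have suppressed the schema.
    assert all(c["response_model"] is None for c in llm.calls if c["tools"])
    # The extraction call runs without tools and yields the structured model.
    assert llm.calls[-1]["tools"] is None
    assert isinstance(answer, Report)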


@coderabbitai (bot) left a comment


🧹 Nitpick comments (1)
lib/crewai/src/crewai/experimental/agent_executor.py (1)

1362-1370: 💤 Low value

Consider extracting BaseModel finalization to a helper method.

The block at lines 1362-1370 duplicates the logic from lines 1337-1345. Both handle the same pattern: wrapping a BaseModel in AgentFinish, invoking callbacks, appending to state, and returning the route.

♻️ Proposed refactor to reduce duplication
+    def _finalize_native_basemodel_answer(self, answer: BaseModel) -> Literal["native_finished", "todo_satisfied"]:
+        """Wrap a BaseModel answer in AgentFinish and route appropriately."""
+        self.state.current_answer = AgentFinish(
+            thought="",
+            output=answer,
+            text=answer.model_dump_json(),
+        )
+        self._invoke_step_callback(self.state.current_answer)
+        self._append_message_to_state(answer.model_dump_json())
+        return self._route_finish_with_todos("native_finished")

Then replace both occurrences (lines 1337-1345 and 1362-1370) with:

if isinstance(answer, BaseModel):
    return self._finalize_native_basemodel_answer(answer)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/experimental/agent_executor.py` around lines 1362 -
1370, Extract the duplicated BaseModel finalization logic into a helper method
(e.g., _finalize_native_basemodel_answer) that takes the BaseModel instance and
performs: create AgentFinish with thought="", output=answer,
text=answer.model_dump_json(), set self.state.current_answer, call
self._invoke_step_callback(self.state.current_answer), call
self._append_message_to_state(answer.model_dump_json()), and return
self._route_finish_with_todos("native_finished"); then replace both duplicated
blocks that check isinstance(answer, BaseModel) with a single call return
self._finalize_native_basemodel_answer(answer).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: dd8c0381-999a-4ded-9dc5-f56c6de5fc96

📥 Commits

Reviewing files that changed from the base of the PR and between a0cd32b and 6e0cdc9.

📒 Files selected for processing (1)
  • lib/crewai/src/crewai/experimental/agent_executor.py



Development

Successfully merging this pull request may close these issues.

[BUG] output_pydantic / response_model leaks into agent tool-calling loop, causing tools to be skipped on non-OpenAI LLMs
