
fix: suppress response_model during native tool loop for non-OpenAI providers#5767

Open
HrushiYadav wants to merge 3 commits into crewAIInc:main from HrushiYadav:fix/native-tools-suppress-response-model

Conversation


@HrushiYadav commented May 11, 2026

Fixes #5472.

Since v1.9.0, response_model is passed alongside tools on every iteration of the native tool loop in both CrewAgentExecutor and the new AgentExecutor. Providers like Gemini and Anthropic treat response_format as higher priority than tools, so the LLM skips tool calls entirely and returns structured output on the first call.

Changed both _invoke_loop_native_tools (CrewAgentExecutor) and call_llm_native_tools (AgentExecutor) to pass response_model=None during tool iterations, then make one final extraction call without tools once the loop produces a text answer. The new AgentExecutor already uses this pattern correctly in its Plan-and-Execute synthesis step - this brings the native tools path in line with it. Also applied the same fix to the async path in CrewAgentExecutor.
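For illustration, here is a minimal sketch of the two-phase pattern (not the actual executor code; the real loops go through get_llm_response/aget_llm_response with callbacks, printer, and executor context, and call_llm, is_tool_call, and execute_tool here are hypothetical placeholders):

from typing import Any, Callable

def native_tool_loop(
    messages: list[dict],
    tools: list[dict] | None,
    response_model: type | None,
    call_llm: Callable[..., Any],
    is_tool_call: Callable[[Any], bool],
    execute_tool: Callable[[Any], dict],
) -> Any:
    # Phase 1: tool iterations. response_model is suppressed so providers
    # that rank response_format above tools (e.g. Gemini, Anthropic) still
    # emit tool calls instead of returning structured output immediately.
    while True:
        answer = call_llm(messages=messages, tools=tools, response_model=None)
        if is_tool_call(answer):
            messages.append(execute_tool(answer))
            continue
        break  # the loop produced a plain text answer

    # Phase 2: one final call without tools applies the structured schema.
    if response_model is not None:
        answer = call_llm(messages=messages, tools=None, response_model=response_model)
    return answer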

Summary by CodeRabbit

  • Bug Fixes
    • Improved tool invocation reliability when structured output is enabled. The system now correctly prioritizes tool calls over structured formatting and applies the schema after tool execution completes for more predictable responses.



@coderabbitai (bot) commented May 11, 2026

📝 Walkthrough

The PR defers application of response_model until after native tool loops: calls while tools are active pass response_model=None; once the tool loop yields final text and response_model is set, a final LLM call (no tools) applies the schema and the executor constructs the AgentFinish.

Changes

Layer: Tool Loop Response Model Suppression
File(s): lib/crewai/src/crewai/agents/crew_agent_executor.py, lib/crewai/src/crewai/experimental/agent_executor.py
Summary: Comments added explaining the suppression; LLM calls during native-tools loops now pass response_model=None instead of self.response_model (sync + async, core and experimental).

Layer: Post-Tool Schema Application
File(s): lib/crewai/src/crewai/agents/crew_agent_executor.py, lib/crewai/src/crewai/experimental/agent_executor.py
Summary: When the agent finishes with a string and response_model is configured, perform a final LLM call with no tools to apply the schema; if extraction yields a BaseModel, serialize it to JSON text and produce AgentFinish, otherwise use the raw string. Callback/history updates are applied accordingly (sync + async).

Sequence Diagram(s)

sequenceDiagram
  participant AgentExecutor
  participant LLM
  participant Tools

  AgentExecutor->>LLM: get_llm_response(messages, tools=Tools, response_model=None)
  alt LLM returns tool call
    LLM->>AgentExecutor: tool_call
    AgentExecutor->>Tools: run tool_call
    Tools-->>AgentExecutor: tool_result
    AgentExecutor->>LLM: get_llm_response(updated_messages, tools=Tools, response_model=None)
  else LLM returns final string
    LLM-->>AgentExecutor: final_text
  end
  alt response_model is configured
    AgentExecutor->>LLM: get_llm_response(messages_with_final_text, tools=None, response_model=Model)
    LLM-->>AgentExecutor: BaseModel or string
    AgentExecutor-->>AgentExecutor: wrap into AgentFinish (JSON text if BaseModel else raw string)
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

size/M

Suggested reviewers

  • greysonlalonde

Poem

🐰 I hopped where tools must run, not stall,
schemas waited till the final call.
Loops danced free with tools in flight,
then structured words made the answer right. ✨

🚥 Pre-merge checks: ✅ 5 passed

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the primary change: suppressing response_model during native tool loops for non-OpenAI providers to fix tool skipping.
  • Linked Issues Check: ✅ Passed. The PR fully addresses the linked issue #5472 by implementing the recommended solution: suppressing response_model during tool-calling iterations and applying it post-loop via a final extraction call.
  • Out of Scope Changes Check: ✅ Passed. All code changes are scoped to fixing the response_model handling in native tool loops for both sync and async paths, directly addressing the linked issue objective.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, above the required threshold of 80.00%.


@coderabbitai (bot) left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/crewai/src/crewai/agents/crew_agent_executor.py (1)

475-520: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Async native-tools flow still has the original regression

The sync path now suppresses response_model during tool iterations, but CrewAgentExecutor._ainvoke_loop_native_tools still sends response_model=self.response_model (Line 1327). Async agents can still skip tools on non-OpenAI providers, so this fix is only partial.

Suggested parity fix for async native-tools loop
@@ async def _ainvoke_loop_native_tools(self) -> AgentFinish:
-                answer = await aget_llm_response(
+                answer = await aget_llm_response(
                     llm=cast("BaseLLM", self.llm),
                     messages=self.messages,
                     callbacks=self.callbacks,
                     printer=PRINTER,
                     tools=openai_tools,
                     available_functions=None,
                     from_task=self.task,
                     from_agent=self.agent,
-                    response_model=self.response_model,
+                    response_model=None,
                     executor_context=self,
                     verbose=self.agent.verbose,
                 )
@@
                 if isinstance(answer, str):
+                    if self.response_model is not None:
+                        enforce_rpm_limit(self.request_within_rpm_limit)
+                        answer = await aget_llm_response(
+                            llm=cast("BaseLLM", self.llm),
+                            messages=self.messages,
+                            callbacks=self.callbacks,
+                            printer=PRINTER,
+                            from_task=self.task,
+                            from_agent=self.agent,
+                            response_model=self.response_model,
+                            executor_context=self,
+                            verbose=self.agent.verbose,
+                        )
+
+                    if isinstance(answer, BaseModel):
+                        output_json = answer.model_dump_json()
+                        formatted_answer = AgentFinish(
+                            thought="",
+                            output=answer,
+                            text=output_json,
+                        )
+                        await self._ainvoke_step_callback(formatted_answer)
+                        self._append_message(output_json)
+                        self._show_logs(formatted_answer)
+                        return formatted_answer
+
+                    answer_str = answer if isinstance(answer, str) else str(answer)
                     formatted_answer = AgentFinish(
                         thought="",
-                        output=answer,
-                        text=answer,
+                        output=answer_str,
+                        text=answer_str,
                     )
                     await self._ainvoke_step_callback(formatted_answer)
-                    self._append_message(answer)
+                    self._append_message(answer_str)
                     self._show_logs(formatted_answer)
                     return formatted_answer
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/agents/crew_agent_executor.py` around lines 475 - 520,
The async native-tools loop still passes self.response_model into
get_llm_response, so change _ainvoke_loop_native_tools to mirror the sync path:
when invoking get_llm_response inside the tool-iteration loop, set
response_model=None (use get_llm_response(..., response_model=None, ...)),
detect tool-call lists with _is_tool_call_list and handle them via
_handle_native_tool_calls as before, and only after the tool loop completes make
one final async get_llm_response call with response_model=self.response_model to
apply the schema; ensure you update the calls to get_llm_response and preserve
existing arguments (callbacks, printer, executor_context, verbose) and flow
control.
🧹 Nitpick comments (1)
lib/crewai/src/crewai/agents/crew_agent_executor.py (1)

509-520: ⚡ Quick win

Rate-limit the post-loop extraction call too

Line 510 adds a second LLM call in the same iteration, but there is no second enforce_rpm_limit(...) before it. That can bypass request throttling accounting.

Suggested guard before the extraction call
                     if self.response_model is not None:
+                        enforce_rpm_limit(self.request_within_rpm_limit)
                         answer = get_llm_response(
                             llm=cast("BaseLLM", self.llm),
                             messages=self.messages,
                             callbacks=self.callbacks,
                             printer=PRINTER,
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/agents/crew_agent_executor.py` around lines 509 - 520,
The post-loop extraction call using get_llm_response (when self.response_model
is not None) is missing the same rate-limit guard used earlier; before invoking
get_llm_response in crew_agent_executor.py add a call to enforce_rpm_limit(...)
with the same parameters used for the main LLM call so this second request is
counted and throttled too (keep the call in the same conditional branch that
checks self.response_model and use the same executor/context values such as
self.llm, self.task, and self.agent to mirror the existing rate-limit
invocation).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 567bc033-6475-4457-a308-ed19ced60766

📥 Commits

Reviewing files that changed from the base of the PR and between e4a91cd and c3ac7b9.

📒 Files selected for processing (1)
  • lib/crewai/src/crewai/agents/crew_agent_executor.py

@HrushiYadav (Author) commented

Good catch - applied the same fix to _ainvoke_loop_native_tools (async path) in 73a6a23. Both sync and async loops now suppress response_model during tool iterations and apply it in a final extraction call once the tool loop completes.

fix: suppress response_model during native tool loop for non-OpenAI providers

Since v1.9.0, response_model is passed alongside tools on every iteration
of _invoke_loop_native_tools. Providers such as Gemini and Anthropic treat
response_format (structured output) as higher priority than tools, so the
LLM returns a structured response immediately without making any tool calls.

Fix: pass response_model=None while tools are active. Once the tool loop
completes and the LLM returns a text answer, make one final call without
tools to apply the structured schema if response_model is set.

This restores pre-v1.9.0 behavior where structured output was applied only
in post-processing after tool calls were exhausted.

Fixes crewAIInc#5472
_ainvoke_loop_native_tools had the same regression as the sync path:
response_model was passed alongside tools on every async iteration,
causing non-OpenAI providers to skip tool calls.

Apply the same two-phase fix: pass response_model=None during tool
iterations, then make one final async call without tools once the loop
produces a text answer.
@HrushiYadav force-pushed the fix/native-tools-suppress-response-model branch from 73a6a23 to a0cd32b on May 14, 2026 05:37
…utor

Same regression as CrewAgentExecutor: call_llm_native_tools passed
response_model alongside tools on every iteration. Providers like Gemini
and Anthropic skip tool calls when response_format is set, so tools were
silently never called.

Applied the same two-phase fix: pass response_model=None during tool
iterations, apply it in a final extraction call once the loop produces a
text answer. Consistent with how the Plan-and-Execute path already handles
it in the synthesis step.
@HrushiYadav (Author) commented

Also extended the fix to AgentExecutor (experimental) in 6e0cdc9 - call_llm_native_tools had the same issue. The new executor already applies response_model only at the synthesis step in its Plan-and-Execute path, so this brings the native tools path in line with that existing pattern.
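As a hypothetical regression test (not part of this PR), one could pair the native_tool_loop sketch from the description above with a fake LLM that records its calls, asserting that response_model is None on every call that carries tools and is applied only on the final extraction call:

from pydantic import BaseModel

class Report(BaseModel):
    summary: str

class FakeLLM:
    """Records every call so the test can inspect what was passed."""
    def __init__(self) -> None:
        self.calls: list[dict] = []

    def call(self, messages, tools=None, response_model=None):
        self.calls.append({"tools": tools, "response_model": response_model})
        if tools and len(self.calls) == 1:
            return {"tool_call": "search"}  # first turn: request a tool
        if response_model is not None:
            return Report(summary="done")   # final extraction call
        return "final text"                 # tool loop ends with plain text

def test_response_model_suppressed_during_tool_loop():
    llm = FakeLLM()
    answer = native_tool_loop(
        messages=[], tools=[{"name": "search"}], response_model=Report,
        call_llm=llm.call,
        is_tool_call=lambda a: isinstance(a, dict) and "tool_call" in a,
        execute_tool=lambda a: {"role": "tool", "content": "result"},
    )
    # Every tool-bearing call must have suppressed the schema.
    assert all(c["response_model"] is None for c in llm.calls if c["tools"])
    # The extraction call runs without tools and yields the structured model.
    assert llm.calls[-1]["tools"] is None
    assert isinstance(answer, Report)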


@coderabbitai (bot) left a comment


🧹 Nitpick comments (1)
lib/crewai/src/crewai/experimental/agent_executor.py (1)

1362-1370: 💤 Low value

Consider extracting BaseModel finalization to a helper method.

The block at lines 1362-1370 duplicates the logic from lines 1337-1345. Both handle the same pattern: wrapping a BaseModel in AgentFinish, invoking callbacks, appending to state, and returning the route.

♻️ Proposed refactor to reduce duplication
+    def _finalize_native_basemodel_answer(self, answer: BaseModel) -> Literal["native_finished", "todo_satisfied"]:
+        """Wrap a BaseModel answer in AgentFinish and route appropriately."""
+        self.state.current_answer = AgentFinish(
+            thought="",
+            output=answer,
+            text=answer.model_dump_json(),
+        )
+        self._invoke_step_callback(self.state.current_answer)
+        self._append_message_to_state(answer.model_dump_json())
+        return self._route_finish_with_todos("native_finished")

Then replace both occurrences (lines 1337-1345 and 1362-1370) with:

if isinstance(answer, BaseModel):
    return self._finalize_native_basemodel_answer(answer)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/experimental/agent_executor.py` around lines 1362 -
1370, Extract the duplicated BaseModel finalization logic into a helper method
(e.g., _finalize_native_basemodel_answer) that takes the BaseModel instance and
performs: create AgentFinish with thought="", output=answer,
text=answer.model_dump_json(), set self.state.current_answer, call
self._invoke_step_callback(self.state.current_answer), call
self._append_message_to_state(answer.model_dump_json()), and return
self._route_finish_with_todos("native_finished"); then replace both duplicated
blocks that check isinstance(answer, BaseModel) with a single call return
self._finalize_native_basemodel_answer(answer).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: dd8c0381-999a-4ded-9dc5-f56c6de5fc96

📥 Commits

Reviewing files that changed from the base of the PR and between a0cd32b and 6e0cdc9.

📒 Files selected for processing (1)
  • lib/crewai/src/crewai/experimental/agent_executor.py



Development

Successfully merging this pull request may close these issues.

[BUG] output_pydantic / response_model leaks into agent tool-calling loop, causing tools to be skipped on non-OpenAI LLMs
