Feat+Fix: Let Tools returning multimodal data#5804
Conversation
Introduces MultimodalToolResult as a public API for tool authors who need to return file attachments (images, audio, video, PDFs) alongside a text observation. Adds a files field to the internal ToolResult so the multimodal payload can travel through the execution pipeline unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… remove add_image special case _format_result now intercepts MultimodalToolResult and BaseFile values before the str() fallback, stashing the extracted FileInput dict in self._result_files so callers can propagate it into ToolResult.files. The add_image name-based branch in use()/ause() is removed; all tools now go through the same path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…iles After use()/ause() returns, read _result_files from the ToolUsage instance and forward it as files= on the returned ToolResult so the multimodal payload reaches the agent executor unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Returns MultimodalToolResult(text, files={image: ImageFile(url)}) so the
image travels through the standard files pipeline instead of a raw dict.
Falls back to text-only when crewai-files is not installed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pecial handling - Remove the add_image name-based branch from _handle_agent_action; all tools now use the same path. - Add _extract_native_tool_result() to detect MultimodalToolResult/BaseFile in the NativeTools path and separate text from files. - Carry files through execution_result dict into _append_tool_result_and _check_finality, where they are stored on the tool message's files field so _process_message_files() converts them to provider content blocks. - Add _inject_files_into_last_user_message() helper and call it from both sync and async ReAct loops after _append_message(). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mirror the crew_agent_executor changes: drop the add_image name-based branch from _handle_agent_action and add _inject_files_into_last_user _message so MultimodalToolResult files reach the LLM via the files pipeline. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
_execute_text_tool_with_events now returns ToolResult instead of str so the caller can attach tool_result.files to the observation LLMMessage. _process_message_files() then converts them to provider content blocks. VISION_IMAGE sentinel handling is preserved for backward compatibility. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ImageFile is a Pydantic model and does not accept positional arguments; positional call raised TypeError at runtime. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Covers: - MultimodalToolResult and ToolResult.files dataclass fields - ToolUsage._extract_multimodal_files with MultimodalToolResult, BaseFile, and plain values - ToolUsage._format_result integration - AddImageTool._run returning MultimodalToolResult with correct ImageFile Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the MultimodalToolResult API so library users can build custom tools that pass images, audio, video, and PDFs to the LLM. Covers file sources (bytes, path, URL), supported file types, and the single-file shorthand. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…comments ruff format reformatted tool_types.py, tool_usage.py, and crew_agent_executor.py. Removed four # type: ignore[...] comments that mypy reported as unused after the TypedDict was updated to include the files key. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ions Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Hmm, how to add |
📝 WalkthroughWalkthroughAdds a MultimodalToolResult type and threads file attachments returned by tools through ToolUsage, tool execution helpers, and executors so file payloads are attached to message history and available to the LLM. ChangesMultimodal File Support
Sequence Diagram(s)sequenceDiagram
participant Tool as Custom Tool
participant Usage as ToolUsage
participant Extract as _extract_multimodal_files
participant Utils as tool_utils
participant Executor as CrewAgentExecutor
participant Inject as _inject_files_into_last_user_message
Tool->>Usage: _run returns MultimodalToolResult(text, files={...})
Usage->>Extract: _format_result(multimodal_result)
Extract->>Usage: return text, populate _result_files
Usage->>Utils: tool execution completes, _result_files observed
Utils->>Executor: return ToolResult(result=text, files=_result_files)
Executor->>Executor: append tool message including files field
Executor->>Inject: tool_result.files not empty
Inject->>Executor: merge files into last role=user message
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/crewai/src/crewai/agents/step_executor.py`:
- Around line 315-321: The loop in StepExecutor (around variables
last_tool_result, messages, answer_str, and obs_msg in step_executor.py) ignores
tool_result.result_as_answer and therefore allows a further LLM turn; update the
logic so that when tool_result.result_as_answer is True you append the assistant
message with answer_str (as already done), skip building/appending the
observation message and immediately return or break to end execution (i.e., do
not call _build_observation_message or proceed to another LLM turn); preserve
adding tool_result.files only for non-final steps as needed and ensure
last_tool_result is still set when returning.
In `@lib/crewai/src/crewai/tools/tool_usage.py`:
- Around line 655-663: Reset the ToolUsage instance's attachment state at the
start of result formatting: in _format_result, set self._result_files = [] (or
appropriate empty container) immediately when the method begins so stale
attachments from prior calls are cleared; then proceed to call
_extract_multimodal_files(result) and the existing branches. Do the same
clearing at the start of any other formatting branches referenced around the
same area (the alternative formatting paths around lines 669–693) so every code
path begins with a fresh self._result_files before calling
_extract_multimodal_files or _remember_format.
In `@lib/crewai/tests/tools/test_multimodal_tool_result.py`:
- Around line 181-196: The test lacks a final assertion to validate that a plain
second call to _format_result does not clear _result_files; update
test_result_files_cleared_on_plain_result to capture the files after the first
_format_result call (e.g., previous_files = tu._result_files.copy()) then after
calling tu._format_result("plain follow-up") assert that tu._result_files ==
previous_files (or at least that tu._result_files is not empty), referencing the
test name test_result_files_cleared_on_plain_result and the symbols
_format_result and _result_files so the intent is explicitly checked.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: bab10ed3-01f4-4233-99f8-669115e938f1
📒 Files selected for processing (13)
docs/ar/learn/multimodal-agents.mdxdocs/en/learn/multimodal-agents.mdxdocs/ko/learn/multimodal-agents.mdxdocs/pt-BR/learn/multimodal-agents.mdxlib/crewai/src/crewai/agents/crew_agent_executor.pylib/crewai/src/crewai/agents/step_executor.pylib/crewai/src/crewai/experimental/agent_executor.pylib/crewai/src/crewai/tools/__init__.pylib/crewai/src/crewai/tools/agent_tools/add_image_tool.pylib/crewai/src/crewai/tools/tool_types.pylib/crewai/src/crewai/tools/tool_usage.pylib/crewai/src/crewai/utilities/tool_utils.pylib/crewai/tests/tools/test_multimodal_tool_result.py
…xecutor When a tool signals result_as_answer=True the loop was still appending an observation message and proceeding to another LLM turn. Now we append the assistant message and immediately return the tool result, consistent with how crew_agent_executor handles the same flag. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If _format_result is called multiple times on the same ToolUsage instance (e.g. during error-handling retries), stale files from a prior multimodal result would persist and be attached to the wrong observation. Clearing _result_files before each call ensures every code path begins clean. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sult call The previous test had no assertion after the second call, leaving the behaviour undocumented. After the _format_result reset fix, a plain call must produce an empty _result_files; assert that explicitly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/crewai/src/crewai/tools/tool_usage.py (1)
127-137:⚠️ Potential issue | 🟠 Major | ⚡ Quick winReset
_result_filesat the start of each tool execution path.On Line 127 and Line 162,
use()/ause()can return early on errors before_format_result()runs, so previous call attachments may leak into the nextToolResult.files.Suggested fix
def use( self, calling: ToolCalling | InstructorToolCalling, tool_string: str ) -> str: + self._result_files = {} if isinstance(calling, ToolUsageError): error = calling.message if self.agent and self.agent.verbose: PRINTER.print(content=f"\n\n{error}\n", color="red") if self.task: @@ async def ause( self, calling: ToolCalling | InstructorToolCalling, tool_string: str ) -> str: + self._result_files = {} """Execute a tool asynchronously.Also applies to: 162-169
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lib/crewai/src/crewai/tools/tool_usage.py` around lines 127 - 137, Reset the tool result attachments at the start of each execution path to avoid leaking files between calls: in the Tool class methods use and ause, set self._result_files = [] (or call the existing reset helper if present) immediately at the top of each method before any early-return branches (including the isinstance(calling, ToolUsageError) branch and any other error/early-return checks) so that when those branches return (e.g., returning error strings) the next ToolResult will not include files from the previous invocation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@lib/crewai/src/crewai/tools/tool_usage.py`:
- Around line 674-679: The try/except currently bundles both imports so a
missing optional package (`BaseFile` from crewai_files.core.types) causes early
return and prevents handling of MultimodalToolResult; update the import block so
that `from crewai.tools.tool_types import MultimodalToolResult` is imported
unconditionally (or in its own try) outside the try that imports `BaseFile`, and
only the `BaseFile` import is wrapped in a try/except that sets a local sentinel
(e.g., BaseFile = None) on ImportError; then adjust the subsequent logic that
inspects tool results to use isinstance(result, MultimodalToolResult) to return
result.text and only use BaseFile-related handling when BaseFile is not None.
Ensure references to MultimodalToolResult and BaseFile in the surrounding
function/class are updated accordingly.
---
Outside diff comments:
In `@lib/crewai/src/crewai/tools/tool_usage.py`:
- Around line 127-137: Reset the tool result attachments at the start of each
execution path to avoid leaking files between calls: in the Tool class methods
use and ause, set self._result_files = [] (or call the existing reset helper if
present) immediately at the top of each method before any early-return branches
(including the isinstance(calling, ToolUsageError) branch and any other
error/early-return checks) so that when those branches return (e.g., returning
error strings) the next ToolResult will not include files from the previous
invocation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 93cdfa73-cf53-4579-9962-b447d540b401
📒 Files selected for processing (3)
lib/crewai/src/crewai/agents/step_executor.pylib/crewai/src/crewai/tools/tool_usage.pylib/crewai/tests/tools/test_multimodal_tool_result.py
🚧 Files skipped from review as they are similar to previous changes (2)
- lib/crewai/src/crewai/agents/step_executor.py
- lib/crewai/tests/tools/test_multimodal_tool_result.py
| try: | ||
| from crewai_files.core.types import BaseFile | ||
|
|
||
| from crewai.tools.tool_types import MultimodalToolResult | ||
| except ImportError: | ||
| return None |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
fd tool_types.py
rg -n -C3 'class MultimodalToolResult|FileInput|ImportError|crewai_files' lib/crewai/src/crewai/tools/tool_types.py
rg -n -C3 '_extract_multimodal_files|MultimodalToolResult|crewai_files.core.types' lib/crewai/src/crewai/tools/tool_usage.py
rg -n -C2 'MultimodalToolResult|crewai_files|ImportError' lib/crewai/tests/toolsRepository: crewAIInc/crewAI
Length of output: 9361
🏁 Script executed:
sed -n '670,700p' lib/crewai/src/crewai/tools/tool_usage.pyRepository: crewAIInc/crewAI
Length of output: 1230
Decouple MultimodalToolResult extraction from optional crewai_files import.
Lines 674-679 couple two imports in a single try block: BaseFile (from optional crewai_files) and MultimodalToolResult (from the crewai package). When crewai_files is missing, the entire method returns None, preventing MultimodalToolResult handling and causing fallback to str(result) instead of returning result.text.
The fix moves MultimodalToolResult outside the try block and handles the optional BaseFile separately, allowing proper MultimodalToolResult handling regardless of whether crewai_files is installed.
Suggested fix
- try:
- from crewai_files.core.types import BaseFile
-
- from crewai.tools.tool_types import MultimodalToolResult
- except ImportError:
- return None
+ from crewai.tools.tool_types import MultimodalToolResult
+ try:
+ from crewai_files.core.types import BaseFile
+ except ImportError:
+ BaseFile = None # type: ignore[assignment]
if isinstance(result, MultimodalToolResult):
self._result_files = dict(result.files)
return result.text
- if isinstance(result, BaseFile):
+ if BaseFile is not None and isinstance(result, BaseFile):🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@lib/crewai/src/crewai/tools/tool_usage.py` around lines 674 - 679, The
try/except currently bundles both imports so a missing optional package
(`BaseFile` from crewai_files.core.types) causes early return and prevents
handling of MultimodalToolResult; update the import block so that `from
crewai.tools.tool_types import MultimodalToolResult` is imported unconditionally
(or in its own try) outside the try that imports `BaseFile`, and only the
`BaseFile` import is wrapped in a try/except that sets a local sentinel (e.g.,
BaseFile = None) on ImportError; then adjust the subsequent logic that inspects
tool results to use isinstance(result, MultimodalToolResult) to return
result.text and only use BaseFile-related handling when BaseFile is not None.
Ensure references to MultimodalToolResult and BaseFile in the surrounding
function/class are updated accordingly.
Summary
This PR introduces a first-class public API for tool authors to return multimodal content (images, audio, video, PDFs) from any
BaseTool, and wires up the full pipeline so those files reach the LLM in the correct provider format.Previously,
AddImageToolwas the only path for injecting images into the conversation, implemented via a fragile name-based special case that calledstr()on the result dict — meaning it silently produced a Pythonreprstring instead of actual image content. Any other tool returning aFileInputwould have its multimodal structure destroyed by_format_result()'s unconditionalstr()conversion.Before — tool authors had no supported way to return multimodal data:
After — any tool can return multimodal content:
Changes
New public API (
tools/tool_types.py,tools/__init__.py)MultimodalToolResultdataclass:text: str,files: dict[str, FileInput],result_as_answer: boolfiles: dict[str, FileInput]field to the internalToolResultdataclassMultimodalToolResultfromcrewai.toolsCore pipeline (
tools/tool_usage.py)self._result_files: dict[str, Any] = {}side-channel toToolUsage_extract_multimodal_files()helper: interceptsMultimodalToolResultand bareBaseFileobjects beforestr()conversion, extracts files into_result_files_format_result()to use the new helper; non-multimodal results unchangedadd_image-named special-case branches fromuse()andause()Propagation (
utilities/tool_utils.py)execute_tool_and_check_finalityandaexecute_tool_and_check_finalitynow readtool_usage._result_filesafter execution and pass them asfiles=to the returnedToolResultAgent executors
agents/crew_agent_executor.py: removed add_image branch from_handle_agent_action(); added_extract_native_tool_result()static method for the NativeTools path; added_inject_files_into_last_user_message()helper; both ReAct loops now injecttool_result.filesinto the last user message after each tool callexperimental/agent_executor.py: same removals and additions mirrored for the experimental executoragents/step_executor.py:_execute_text_tool_with_events()return type changed fromstrtoToolResult; caller attachesfilesto the observation message when presentBug fix (
tools/agent_tools/add_image_tool.py)ImageFile(image_url)→ImageFile(source=image_url):ImageFileis a Pydantic model and does not accept positional arguments; the previous call raisedTypeErrorat runtimeTests (
tests/tools/test_multimodal_tool_result.py)MultimodalToolResultandToolResult.filesdataclasses,_extract_multimodal_files()with all input types,_format_result()integration, andAddImageTool._run()Documentation (
docs/*/learn/multimodal-agents.mdx)MultimodalToolResult, supported file types, file source formats, and the single-file shorthandHow it works
The existing
filespipeline inBaseLLM._process_message_files()already convertsdict[str, FileInput]on any message into provider-specific content blocks (OpenAI image_url, Anthropic image source, etc.). This PR closes the gap between tool execution and that pipeline:Notes
stror any other type are unaffectedfiles={}(files are not cached); this is acceptableresult_as_answer=Truewith files: the final text answer is returned as-is; including files inAgentFinish.outputis out of scope for this PRcrewai-filesis an optional dependency; all multimodal code paths are guarded withtry/except ImportErrorfallbacksSummary by CodeRabbit
Release Notes
New Features
Documentation
Tests
Additional Information
This tackles issue #5758, and I bet this PR is better than auto-fix by #5789