Problem
When evaluating tool trajectories, the current TrajectoryEvaluator always compares both tool names and arguments (actual.args == expected.args). This applies to all three match types: EXACT, IN_ORDER, and ANY_ORDER.
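To make the current behavior concrete, here is a minimal, self-contained sketch of strict trajectory matching. It is not ADK's actual implementation. The `ToolCall` dataclass and the `match_exact` / `match_in_order` / `match_any_order` helpers are hypothetical, and the IN_ORDER / ANY_ORDER semantics (extra actual calls allowed) are assumptions. What it illustrates is that every comparison checks `args` as well as `name`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical stand-in for a recorded tool invocation."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def same_call(actual: ToolCall, expected: ToolCall) -> bool:
    # Strict comparison: tool name AND arguments must be equal.
    return actual.name == expected.name and actual.args == expected.args


def match_exact(actual: list[ToolCall], expected: list[ToolCall]) -> bool:
    # Same length, same order, every pair identical.
    return len(actual) == len(expected) and all(
        same_call(a, e) for a, e in zip(actual, expected)
    )


def match_in_order(actual: list[ToolCall], expected: list[ToolCall]) -> bool:
    # Expected calls must appear in actual as an ordered subsequence
    # (assumption: extra actual calls in between are allowed).
    remaining = iter(actual)
    return all(any(same_call(a, e) for a in remaining) for e in expected)


def match_any_order(actual: list[ToolCall], expected: list[ToolCall]) -> bool:
    # Each expected call must be matched by a distinct actual call, in any order
    # (assumption: extra actual calls are allowed).
    unused = list(actual)
    for e in expected:
        for i, a in enumerate(unused):
            if same_call(a, e):
                del unused[i]
                break
        else:
            return False
    return True


# The right tool was called, but the generated query differs, so all three fail.
expected = [ToolCall("search", {"query": "weather in Paris"})]
actual = [ToolCall("search", {"query": "Paris weather today"})]
print(match_exact(actual, expected), match_in_order(actual, expected), match_any_order(actual, expected))
# → False False False
```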
There are common scenarios where users care only that the correct tools were called (and optionally in the right order), but not about the exact arguments passed. For example:
- Tool arguments are non-deterministic (e.g., generated queries, timestamps)
- The eval focuses on verifying the agent's tool selection strategy, not argument correctness
- Arguments are complex objects where minor formatting differences cause false negatives
Proposed Solution
Add an ignore_args boolean option to ToolTrajectoryCriterion that, when set to True, skips argument comparison entirely and only matches on tool names.
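A minimal sketch of the proposed per-call comparison, assuming the flag is passed down to wherever two tool calls are compared. `ToolCall` and `calls_match` are hypothetical names for illustration; the real ADK types may differ.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical stand-in for a recorded tool invocation."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def calls_match(actual: ToolCall, expected: ToolCall, ignore_args: bool = False) -> bool:
    """Compare two tool calls; with ignore_args=True only the names matter."""
    if actual.name != expected.name:
        return False
    # The default (ignore_args=False) keeps today's strict behavior.
    return ignore_args or actual.args == expected.args


a = ToolCall("search", {"query": "Paris weather today"})
e = ToolCall("search", {"query": "weather in Paris"})
print(calls_match(a, e))                    # False
print(calls_match(a, e, ignore_args=True))  # True
```

Defaulting the flag to `False` keeps existing eval configs backward compatible.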
Usage example
{
  "tool_trajectory_avg_score": {
    "threshold": 0.8,
    "match_type": "IN_ORDER",
    "ignore_args": true
  }
}
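The sketch below shows how a dict-based config like the one above could be loaded into a criterion object. `ToolTrajectoryCriterion` here is a hypothetical dataclass mirror, not ADK's actual class; the `MatchType` enum, the `from_dict` helper, and the `EXACT` default for `match_type` are assumptions.

```python
import json
from dataclasses import dataclass
from enum import Enum


class MatchType(Enum):
    EXACT = "EXACT"
    IN_ORDER = "IN_ORDER"
    ANY_ORDER = "ANY_ORDER"


@dataclass(frozen=True)
class ToolTrajectoryCriterion:
    """Hypothetical mirror of the criterion with the proposed field added."""
    threshold: float
    match_type: MatchType = MatchType.EXACT  # assumed default
    ignore_args: bool = False  # new field; False preserves current behavior

    @classmethod
    def from_dict(cls, raw: dict) -> "ToolTrajectoryCriterion":
        ignore_args = raw.get("ignore_args", False)
        # Reject values like "true" (a string) so misconfigurations fail loudly.
        if not isinstance(ignore_args, bool):
            raise TypeError("ignore_args must be a boolean")
        return cls(
            threshold=float(raw["threshold"]),
            match_type=MatchType(raw.get("match_type", "EXACT")),
            ignore_args=ignore_args,
        )


config = json.loads("""
{
  "tool_trajectory_avg_score": {
    "threshold": 0.8,
    "match_type": "IN_ORDER",
    "ignore_args": true
  }
}
""")
criterion = ToolTrajectoryCriterion.from_dict(config["tool_trajectory_avg_score"])
print(criterion)
```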
Scope of changes
- ToolTrajectoryCriterion: add an ignore_args: bool = False field
- TrajectoryEvaluator: read the flag and conditionally skip argument comparison in all three match methods (EXACT, IN_ORDER, ANY_ORDER)
- Tests: add coverage for ignore_args=True across all match types, including failure cases and dict-based config
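A rough sketch of the evaluator change and matching tests, assuming a single comparison helper that all three match methods share. `ToolCall`, `MatchType`, `_same`, and `trajectory_matches` are hypothetical names, and the IN_ORDER / ANY_ORDER semantics (extra actual calls allowed) are assumptions rather than ADK's confirmed behavior.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class MatchType(Enum):
    EXACT = "EXACT"
    IN_ORDER = "IN_ORDER"
    ANY_ORDER = "ANY_ORDER"


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def _same(a: ToolCall, e: ToolCall, ignore_args: bool) -> bool:
    # Name must always match; args are compared only when ignore_args is False.
    return a.name == e.name and (ignore_args or a.args == e.args)


def trajectory_matches(
    actual: list[ToolCall],
    expected: list[ToolCall],
    match_type: MatchType,
    ignore_args: bool = False,
) -> bool:
    if match_type is MatchType.EXACT:
        return len(actual) == len(expected) and all(
            _same(a, e, ignore_args) for a, e in zip(actual, expected)
        )
    if match_type is MatchType.IN_ORDER:
        # Ordered subsequence: the shared iterator never moves backwards.
        remaining = iter(actual)
        return all(any(_same(a, e, ignore_args) for a in remaining) for e in expected)
    # ANY_ORDER: each expected call consumes one distinct actual call.
    unused = list(actual)
    for e in expected:
        idx = next((i for i, a in enumerate(unused) if _same(a, e, ignore_args)), None)
        if idx is None:
            return False
        unused.pop(idx)
    return True


# pytest-style tests for ignore_args=True, including failure cases.
def test_ignore_args_passes_when_only_args_differ():
    actual = [ToolCall("search", {"q": "a"}), ToolCall("fetch", {"ts": 1})]
    expected = [ToolCall("search", {"q": "b"}), ToolCall("fetch", {"ts": 2})]
    for mt in MatchType:
        assert not trajectory_matches(actual, expected, mt)
        assert trajectory_matches(actual, expected, mt, ignore_args=True)


def test_ignore_args_still_fails_on_wrong_tool_or_order():
    expected = [ToolCall("search"), ToolCall("fetch")]
    wrong_tool = [ToolCall("search"), ToolCall("lookup")]
    wrong_order = [ToolCall("fetch"), ToolCall("search")]
    assert not trajectory_matches(wrong_tool, expected, MatchType.ANY_ORDER, ignore_args=True)
    assert not trajectory_matches(wrong_order, expected, MatchType.IN_ORDER, ignore_args=True)
    assert not trajectory_matches(wrong_order, expected, MatchType.EXACT, ignore_args=True)
```

Routing every match type through one helper keeps the flag's effect identical across EXACT, IN_ORDER, and ANY_ORDER, so name and ordering checks stay intact when arguments are ignored.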