Fix #5805: Add metadata_filter support to Knowledge querying pipeline #5806
devin-ai-integration[bot] wants to merge 1 commit into
- Add KnowledgeConfig class with results_limit, score_threshold, metadata_filter
- Add knowledge_config field to Agent
- Update Knowledge.query() to forward filter and score_threshold to storage
- Update Crew.query_knowledge() to accept and forward filter params
- Fix BaseKnowledgeSource._save_documents() to pass self.metadata to storage
- Wire Agent.execute_task() to use knowledge_config for both agent and crew queries
- Add 10 tests covering all changes

Co-Authored-By: João <joao@crewai.com>
📝 Walkthrough
This PR implements configurable knowledge-query parameters by introducing
Changes
Knowledge Configuration and Parameter Propagation
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
 from pathlib import Path
-from typing import List, Union
-from unittest.mock import patch
+from typing import Any, Dict, List, Optional, Union
+from unittest.mock import MagicMock, call, patch
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/crewai/knowledge/knowledge_config.py`:
- Around lines 15-23: The results_limit and score_threshold fields accept invalid
numbers; update their definitions in knowledge_config.py to enforce bounds
(e.g., set results_limit to have ge=1 and set score_threshold to be >0 and <=1)
or add pydantic validators on the KnowledgeConfig model that raise
ValueError when results_limit < 1 or when score_threshold <= 0 or > 1,
referencing the results_limit and score_threshold symbols so invalid query
params are rejected before being sent to storage.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 43701e7f-cee4-439a-aecf-6c7821e8175f
📒 Files selected for processing (6)
- src/crewai/agent.py
- src/crewai/crew.py
- src/crewai/knowledge/knowledge.py
- src/crewai/knowledge/knowledge_config.py
- src/crewai/knowledge/source/base_knowledge_source.py
- tests/knowledge/knowledge_test.py
    results_limit: int = Field(
        default=3,
        description="Maximum number of results to return from a knowledge query.",
    )
    score_threshold: float = Field(
        default=0.35,
        description="Minimum relevance score for results.",
    )
    metadata_filter: Optional[Dict[str, Any]] = Field(
Validate bounds for query config values.
results_limit and score_threshold currently accept invalid values (e.g., 0, negatives), which can propagate bad query params to storage.
Proposed fix
 class KnowledgeConfig(BaseModel):
@@
     results_limit: int = Field(
         default=3,
+        ge=1,
         description="Maximum number of results to return from a knowledge query.",
     )
     score_threshold: float = Field(
         default=0.35,
+        ge=0.0,
         description="Minimum relevance score for results.",
     )
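The bounds check requested in this review can be sketched without Pydantic. The example below uses a plain dataclass as a stand-in for the model, so `KnowledgeConfigSketch` is a hypothetical name and not the class in the PR. It also assumes relevance scores are normalized to the range 0 to 1, which the PR does not confirm; the review itself is inconsistent on whether 0 should be allowed, and this sketch allows it.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class KnowledgeConfigSketch:
    """Stdlib stand-in for the PR's Pydantic model (hypothetical name)."""

    # Defaults follow the values quoted in the PR description.
    results_limit: int = 3
    score_threshold: float = 0.35
    metadata_filter: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # A limit below 1 would ask storage for zero or a negative number of results.
        if self.results_limit < 1:
            raise ValueError("results_limit must be >= 1")
        # Assumption: scores are normalized to [0, 1]; adjust if the backend differs.
        if not 0.0 <= self.score_threshold <= 1.0:
            raise ValueError("score_threshold must be between 0 and 1")
```

With Pydantic, the same intent is usually expressed declaratively through `Field` constraints such as `ge=1`, as in the proposed diff, rather than through a hand-written check.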
Summary
Closes #5805. The lower layers (`KnowledgeStorage.search`, `BaseRecord.metadata`, `BaseKnowledgeSource.metadata`) already support metadata, but 4 gaps prevented users from actually using metadata-based filtering end-to-end. This PR fills all 4 gaps:

- `KnowledgeConfig` class (`src/crewai/knowledge/knowledge_config.py`): Pydantic model with `results_limit`, `score_threshold`, and `metadata_filter` fields.
- `Knowledge.query()`: Now accepts and forwards `filter` and `score_threshold` to `KnowledgeStorage.search()`.
- `Crew.query_knowledge()`: Now accepts and forwards `limit`, `filter`, and `score_threshold` to `Knowledge.query()`.
- `BaseKnowledgeSource._save_documents()`: Now passes `self.metadata` to `storage.save()` (was silently dropped before, even had a `# Currently unused` comment).
- `Agent`: New `knowledge_config: Optional[KnowledgeConfig]` field. `execute_task()` uses it when querying both agent-level and crew-level knowledge.

Usage:
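A minimal sketch of the flow described above, using stand-in classes rather than the real crewAI ones. `ConfigSketch`, `InMemoryStorage`, `KnowledgeSketch`, and `query_for_agent` are all hypothetical names; the storage stub matches metadata by simple equality and has no embeddings or relevance scores, so `score_threshold` is forwarded but not applied.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class ConfigSketch:
    """Stand-in for KnowledgeConfig; defaults follow the PR description."""

    results_limit: int = 3
    score_threshold: float = 0.35
    metadata_filter: Optional[Dict[str, Any]] = None


class InMemoryStorage:
    """Hypothetical storage stub: keeps text with its metadata, no embeddings."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def save(self, documents: List[str], metadata: Optional[Dict[str, Any]] = None) -> None:
        # Persist the source metadata alongside each document.
        for text in documents:
            self.records.append({"content": text, "metadata": dict(metadata or {})})

    def search(self, query: List[str], limit: int = 3,
               filter: Optional[Dict[str, Any]] = None,
               score_threshold: float = 0.35) -> List[Dict[str, Any]]:
        # score_threshold is accepted but ignored: this stub has no scores.
        wanted = filter or {}
        matches = [
            record for record in self.records
            if all(record["metadata"].get(key) == value for key, value in wanted.items())
        ]
        return matches[:limit]


class KnowledgeSketch:
    """Stand-in for the knowledge layer: forwards every parameter to storage."""

    def __init__(self, storage: InMemoryStorage) -> None:
        self.storage = storage

    def query(self, query: List[str], results_limit: int = 3,
              filter: Optional[Dict[str, Any]] = None,
              score_threshold: float = 0.35) -> List[Dict[str, Any]]:
        return self.storage.search(
            query, limit=results_limit, filter=filter, score_threshold=score_threshold
        )


def query_for_agent(knowledge: KnowledgeSketch, query: List[str],
                    config: Optional[ConfigSketch] = None) -> List[Dict[str, Any]]:
    # Fall back to the defaults when no config is set, keeping old behavior.
    config = config or ConfigSketch()
    return knowledge.query(
        query,
        results_limit=config.results_limit,
        filter=config.metadata_filter,
        score_threshold=config.score_threshold,
    )


storage = InMemoryStorage()
storage.save(["'Bonjour' means 'hello'."], metadata={"task": "translation"})
storage.save(["Q3 revenue rose."], metadata={"task": "finance"})
knowledge = KnowledgeSketch(storage)

filtered = query_for_agent(
    knowledge, ["greeting"], ConfigSketch(metadata_filter={"task": "translation"})
)
unfiltered = query_for_agent(knowledge, ["greeting"])
```

Here `filtered` contains only the translation document, while `unfiltered` contains both. If `save()` dropped the metadata, the filtered query would match nothing, which is the gap the `_save_documents()` change addresses.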
All changes are fully backward-compatible: existing code without `knowledge_config` or `metadata_filter` behaves identically.

Review & Testing Checklist for Human

- `KnowledgeConfig` defaults match the existing hardcoded defaults (`limit=3`, `score_threshold=0.35`, `metadata_filter=None`)
- `BaseKnowledgeSource._save_documents()` now persists `self.metadata` into the vector store; test with a real ChromaDB/Qdrant backend to confirm metadata appears in stored documents
- With `knowledge_config=KnowledgeConfig(metadata_filter={"task": "translation"})`, add knowledge sources with matching metadata, and confirm only filtered results are returned during task execution
- Run the knowledge tests (`uv run pytest tests/knowledge/ -vv`)

Notes
10 new tests added covering all 4 gaps: `KnowledgeConfig` construction, `Knowledge.query()` forwarding, `BaseKnowledgeSource._save_documents()` metadata passing, and `Crew.query_knowledge()` forwarding.

Link to Devin session: https://app.devin.ai/sessions/a015e268b09b43d688b173e0e95e2599
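The forwarding tests mentioned in the notes might look roughly like the sketch below. It is not the PR's test code: `query_knowledge` here is a hypothetical free function standing in for the crew-level method, and the `results_limit` keyword is an assumed parameter name. `MagicMock` replaces the knowledge object so the test can check exactly what was forwarded.

```python
from unittest.mock import MagicMock


def query_knowledge(knowledge, query, limit=3, filter=None, score_threshold=0.35):
    """Hypothetical crew-level helper: forwards every parameter unchanged."""
    if knowledge is None:
        return None
    return knowledge.query(
        query, results_limit=limit, filter=filter, score_threshold=score_threshold
    )


def test_forwards_explicit_params():
    knowledge = MagicMock()
    query_knowledge(
        knowledge, ["q"], limit=5, filter={"task": "translation"}, score_threshold=0.5
    )
    # The mock records the call, so the forwarded arguments can be checked exactly.
    knowledge.query.assert_called_once_with(
        ["q"], results_limit=5, filter={"task": "translation"}, score_threshold=0.5
    )


def test_defaults_preserved_without_filter():
    knowledge = MagicMock()
    query_knowledge(knowledge, ["q"])
    # Callers that pass no filter should see the previous hardcoded defaults.
    knowledge.query.assert_called_once_with(
        ["q"], results_limit=3, filter=None, score_threshold=0.35
    )


test_forwards_explicit_params()
test_defaults_preserved_without_filter()
```

The second test is the one that guards backward compatibility: it fails if a default changes or if the filter stops defaulting to `None`.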
Summary by CodeRabbit
New Features
Tests