Fix #5805: Add metadata_filter support to Knowledge querying pipeline by devin-ai-integration[bot] · Pull Request #5806 · crewAIInc/crewAI

devin-ai-integration · 2026-05-14T08:16:38Z

Summary

Closes #5805. The lower layers (KnowledgeStorage.search, BaseRecord.metadata, BaseKnowledgeSource.metadata) already support metadata, but 4 gaps prevented users from actually using metadata-based filtering end-to-end. This PR fills all 4 gaps:

- New KnowledgeConfig class (src/crewai/knowledge/knowledge_config.py): Pydantic model with results_limit, score_threshold, and metadata_filter fields.
- Knowledge.query(): now accepts filter and score_threshold and forwards them to KnowledgeStorage.search().
- Crew.query_knowledge(): now accepts limit, filter, and score_threshold and forwards them to Knowledge.query().
- BaseKnowledgeSource._save_documents(): now passes self.metadata to storage.save(). Previously the metadata was silently dropped, and the field even carried a "# Currently unused" comment.
- Agent: new knowledge_config: Optional[KnowledgeConfig] field. execute_task() uses it when querying both agent-level and crew-level knowledge.
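
The forwarding chain described above can be sketched with small stand-in classes. This is a minimal illustration, not the crewAI source: `RecordingStorage`, `KnowledgeSketch`, and `CrewSketch` are hypothetical names, and the parameter names and defaults are assumed from this PR description.

```python
from typing import Any, Dict, List, Optional


class RecordingStorage:
    """Hypothetical storage stand-in that only records the arguments it receives."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def search(
        self,
        query: List[str],
        limit: int = 3,
        filter: Optional[Dict[str, Any]] = None,
        score_threshold: float = 0.35,
    ) -> List[Dict[str, Any]]:
        self.calls.append(
            {
                "query": query,
                "limit": limit,
                "filter": filter,
                "score_threshold": score_threshold,
            }
        )
        return []  # no real retrieval in this sketch


class KnowledgeSketch:
    """Stand-in for the knowledge layer: forwards every parameter to storage."""

    def __init__(self, storage: RecordingStorage) -> None:
        self.storage = storage

    def query(self, query, limit=3, filter=None, score_threshold=0.35):
        return self.storage.search(
            query, limit=limit, filter=filter, score_threshold=score_threshold
        )


class CrewSketch:
    """Stand-in for the crew layer: forwards every parameter to the knowledge layer."""

    def __init__(self, knowledge: KnowledgeSketch) -> None:
        self.knowledge = knowledge

    def query_knowledge(self, query, limit=3, filter=None, score_threshold=0.35):
        return self.knowledge.query(
            query, limit=limit, filter=filter, score_threshold=score_threshold
        )


storage = RecordingStorage()
crew = CrewSketch(KnowledgeSketch(storage))
crew.query_knowledge(
    ["translate this"],
    limit=10,
    filter={"task": "translation"},
    score_threshold=0.5,
)
print(storage.calls[0])
```

If any layer dropped an argument, the recorded call would fall back to that layer's default, which is the kind of silent gap this PR is meant to close.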

Usage:

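from crewai import Agent
from crewai.knowledge.knowledge_config import KnowledgeConfig

config = KnowledgeConfig(
    results_limit=10,  # return at most 10 results per query
    score_threshold=0.5,  # minimum relevance score for a result
    metadata_filter={"task": "translation"},  # only match documents tagged task=translation
)
agent = Agent(
    # role, goal, backstory, and other Agent arguments omitted here
    knowledge_sources=[...],
    knowledge_config=config,
)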

All changes are fully backward-compatible — existing code without knowledge_config or metadata_filter behaves identically.
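
One way to picture the backward-compatibility claim is a helper that turns an optional config into query keyword arguments. The sketch below uses a standard-library dataclass in place of the Pydantic model. `KnowledgeConfigSketch` and `build_query_kwargs` are hypothetical names, and the defaults (3, 0.35, None) are taken from the review checklist in this PR.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class KnowledgeConfigSketch:
    """Dataclass stand-in for the Pydantic KnowledgeConfig described in this PR."""

    results_limit: int = 3
    score_threshold: float = 0.35
    metadata_filter: Optional[Dict[str, Any]] = None


def build_query_kwargs(config: Optional[KnowledgeConfigSketch]) -> Dict[str, Any]:
    """Map a config (or None) onto the keyword arguments a query call would take."""
    cfg = config if config is not None else KnowledgeConfigSketch()
    return {
        "limit": cfg.results_limit,
        "filter": cfg.metadata_filter,
        "score_threshold": cfg.score_threshold,
    }


# No config: the kwargs equal the previously hardcoded defaults.
print(build_query_kwargs(None))

# Partial override: unspecified fields keep their defaults.
print(build_query_kwargs(KnowledgeConfigSketch(metadata_filter={"task": "translation"})))
```

Under these assumptions, an agent without knowledge_config issues the same query as before the change.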

Review & Testing Checklist for Human

- Verify KnowledgeConfig defaults match the existing hardcoded defaults (limit=3, score_threshold=0.35, metadata_filter=None)
- Verify that BaseKnowledgeSource._save_documents() now persists self.metadata into the vector store. Test with a real ChromaDB/Qdrant backend to confirm metadata appears in stored documents
- End-to-end test: create an Agent with knowledge_config=KnowledgeConfig(metadata_filter={"task": "translation"}), add knowledge sources with matching metadata, and confirm only filtered results are returned during task execution
- Confirm no regressions on existing knowledge tests (uv run pytest tests/knowledge/ -vv)
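
The end-to-end check in the list above can be rehearsed against a toy store before trying a real backend. The sketch below is an assumption-laden stand-in: `InMemoryStore` is a hypothetical class, it matches metadata by exact key/value equality, and it does no similarity scoring, so it says nothing about how ChromaDB or Qdrant evaluate filters.

```python
from typing import Any, Dict, List, Optional


class InMemoryStore:
    """Hypothetical store: keeps documents with metadata, filters by exact match."""

    def __init__(self) -> None:
        self.records: List[Dict[str, Any]] = []

    def save(self, documents: List[str], metadata: Optional[Dict[str, Any]] = None) -> None:
        # Attach a copy of the source-level metadata to every saved document.
        for doc in documents:
            self.records.append({"content": doc, "metadata": dict(metadata or {})})

    def search(self, filter: Optional[Dict[str, Any]] = None, limit: int = 3) -> List[str]:
        # Keep only records whose metadata contains every key/value in the filter.
        wanted = filter or {}
        hits = [
            record["content"]
            for record in self.records
            if all(record["metadata"].get(key) == value for key, value in wanted.items())
        ]
        return hits[:limit]


store = InMemoryStore()
store.save(["Bonjour means hello"], metadata={"task": "translation"})
store.save(["Q3 revenue grew"], metadata={"task": "finance"})

filtered = store.search(filter={"task": "translation"})
unfiltered = store.search()
print(filtered)
print(unfiltered)
```

If save() dropped the metadata, as the description says happened before this PR, the filtered search would return nothing, which makes the two checklist items easy to tell apart.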

Notes

10 new tests added covering all 4 gaps: KnowledgeConfig construction, Knowledge.query() forwarding, BaseKnowledgeSource._save_documents() metadata passing, and Crew.query_knowledge() forwarding.
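
A mock-based test for the metadata-passing gap might look like the sketch below. It does not reproduce the tests in this PR: `SourceSketch` is a hypothetical stand-in for a knowledge source, and the `storage.save(chunks, metadata=...)` call shape is assumed from the walkthrough, which says metadata is passed when the source has any.

```python
from typing import Any, Dict, List, Optional
from unittest.mock import MagicMock


class SourceSketch:
    """Hypothetical knowledge source that forwards its metadata on save."""

    def __init__(
        self,
        storage: Any,
        chunks: List[str],
        metadata: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.storage = storage
        self.chunks = chunks
        self.metadata = metadata or {}

    def _save_documents(self) -> None:
        # Pass metadata only when present, so the old call shape is unchanged otherwise.
        if self.metadata:
            self.storage.save(self.chunks, metadata=self.metadata)
        else:
            self.storage.save(self.chunks)


def test_metadata_is_forwarded() -> None:
    storage = MagicMock()
    SourceSketch(storage, ["chunk"], {"task": "translation"})._save_documents()
    storage.save.assert_called_once_with(["chunk"], metadata={"task": "translation"})


def test_no_metadata_keeps_old_call_shape() -> None:
    storage = MagicMock()
    SourceSketch(storage, ["chunk"])._save_documents()
    storage.save.assert_called_once_with(["chunk"])


test_metadata_is_forwarded()
test_no_metadata_keeps_old_call_shape()
```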

Link to Devin session: https://app.devin.ai/sessions/a015e268b09b43d688b173e0e95e2599

Summary by CodeRabbit

New Features
- Added knowledge configuration settings to control query result limits, score filtering thresholds, and metadata filtering.
- Agents can now apply custom knowledge-query parameters.
Tests
- Expanded test coverage for knowledge configuration and parameter forwarding behaviors.

- Add KnowledgeConfig class with results_limit, score_threshold, metadata_filter
- Add knowledge_config field to Agent
- Update Knowledge.query() to forward filter and score_threshold to storage
- Update Crew.query_knowledge() to accept and forward filter params
- Fix BaseKnowledgeSource._save_documents() to pass self.metadata to storage
- Wire Agent.execute_task() to use knowledge_config for both agent and crew queries
- Add 10 tests covering all changes

Co-Authored-By: João <joao@crewai.com>

devin-ai-integration · 2026-05-14T08:16:40Z

Prompt hidden (unlisted session)

devin-ai-integration · 2026-05-14T08:16:41Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

- Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
- Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

coderabbitai · 2026-05-14T08:16:52Z

Walkthrough

This PR implements configurable knowledge-query parameters by introducing KnowledgeConfig with metadata filtering, score threshold, and result-limit settings. The configuration flows from Agent through Crew to Knowledge, which forwards parameters to storage. Additionally, source metadata is now persisted when saving documents.

Changes

Knowledge Configuration and Parameter Propagation

| Layer / File(s) | Summary |
| --- | --- |
| KnowledgeConfig contract definition `src/crewai/knowledge/knowledge_config.py` | Introduces `KnowledgeConfig` Pydantic model with three optional configuration fields: `results_limit` (default 3), `score_threshold` (default 0.35), and `metadata_filter` (default None). |
| Knowledge layer parameter forwarding `src/crewai/knowledge/knowledge.py` | `Knowledge.query()` signature expanded to accept optional `filter` and `score_threshold` parameters with defaults, which are forwarded to `self.storage.search()`. |
| Crew and Agent configuration integration `src/crewai/crew.py`, `src/crewai/agent.py` | `Crew.query_knowledge()` exposes `limit`, `filter`, and `score_threshold` parameters and forwards them to `Knowledge.query()`. Agent adds optional `knowledge_config` field and derives query kwargs from it when querying both personal knowledge and crew knowledge. |
| Document metadata persistence `src/crewai/knowledge/source/base_knowledge_source.py` | `BaseKnowledgeSource._save_documents()` now passes `metadata` argument to `storage.save()` when source metadata is present, enabling persistent metadata in the vector store. |
| Comprehensive test suite `tests/knowledge/knowledge_test.py` | Added four test classes with 11 test methods: `TestKnowledgeConfig` validates configuration defaults and partial overrides; `TestKnowledgeQueryForwardsParams` confirms parameter forwarding through storage; `TestBaseKnowledgeSourceSavesMetadata` verifies metadata persistence; `TestCrewQueryKnowledgeForwardsParams` validates end-to-end configuration propagation from Crew through Knowledge to storage. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Through config fields, the knowledge flows so free,
From Agent down through Crew to Storage's spree,
Metadata now persists with careful grace,
Query parameters find their rightful place! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 26.32%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately and specifically describes the main change: adding metadata_filter support to the Knowledge querying pipeline, with clear reference to the issue being fixed. |
| Linked Issues check | ✅ Passed | All four gaps identified in issue `#5805` are addressed: KnowledgeConfig now includes metadata_filter field, Knowledge.query() forwards filter/score_threshold, Crew.query_knowledge() accepts and forwards these parameters, and BaseKnowledgeSource._save_documents() passes self.metadata to storage. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the requirements in issue `#5805`; no out-of-scope modifications detected beyond the metadata filter support implementation. |

 from pathlib import Path
-from typing import List, Union
-from unittest.mock import patch
+from typing import Any, Dict, List, Optional, Union
+from unittest.mock import MagicMock, call, patch


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/crewai/knowledge/knowledge_config.py`:
- Around line 15-23: The results_limit and score_threshold fields accept invalid
numbers; update their definitions in knowledge_config.py to enforce bounds
(e.g., set results_limit to have ge=1 and set score_threshold to be >0 and <=1)
or add pydantic validators on the KnowledgeConfig model that raise
ValueError when results_limit < 1 or when score_threshold <= 0 or > 1,
referencing the results_limit and score_threshold symbols so invalid query
params are rejected before being sent to storage.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 43701e7f-cee4-439a-aecf-6c7821e8175f

📥 Commits

Reviewing files that changed from the base of the PR and between c36827b and 4f72839.

📒 Files selected for processing (6)

src/crewai/agent.py
src/crewai/crew.py
src/crewai/knowledge/knowledge.py
src/crewai/knowledge/knowledge_config.py
src/crewai/knowledge/source/base_knowledge_source.py
tests/knowledge/knowledge_test.py

coderabbitai · 2026-05-14T08:19:45Z

+    results_limit: int = Field(
+        default=3,
+        description="Maximum number of results to return from a knowledge query.",
+    )
+    score_threshold: float = Field(
+        default=0.35,
+        description="Minimum relevance score for results.",
+    )
+    metadata_filter: Optional[Dict[str, Any]] = Field(


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate bounds for query config values.

results_limit and score_threshold currently accept invalid values (e.g., 0, negatives), which can propagate bad query params to storage.

Proposed fix

 class KnowledgeConfig(BaseModel):
@@
     results_limit: int = Field(
         default=3,
+        ge=1,
         description="Maximum number of results to return from a knowledge query.",
     )
     score_threshold: float = Field(
         default=0.35,
+        ge=0.0,
         description="Minimum relevance score for results.",
     )
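
The effect of the proposed bounds can be shown without Pydantic. The sketch below mirrors the `ge=1` and `ge=0.0` constraints from the proposed fix with a standard-library dataclass. `BoundedConfigSketch` is a hypothetical stand-in, and the real model would raise Pydantic's own validation error rather than a plain ValueError.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class BoundedConfigSketch:
    """Dataclass stand-in that rejects out-of-range values at construction time."""

    results_limit: int = 3
    score_threshold: float = 0.35
    metadata_filter: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Mirrors ge=1 on results_limit from the proposed fix.
        if self.results_limit < 1:
            raise ValueError("results_limit must be >= 1")
        # Mirrors ge=0.0 on score_threshold from the proposed fix.
        if self.score_threshold < 0.0:
            raise ValueError("score_threshold must be >= 0.0")


def is_rejected(**kwargs: Any) -> bool:
    """Return True when the given field values fail validation."""
    try:
        BoundedConfigSketch(**kwargs)
    except ValueError:
        return True
    return False


print(is_rejected(results_limit=0))
print(is_rejected(score_threshold=-0.1))
print(is_rejected(results_limit=10, score_threshold=0.5))
```

Rejecting bad values at construction keeps them from reaching the storage layer, which is the concern raised in this review comment.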

github-code-quality Bot found potential problems May 14, 2026

coderabbitai Bot requested changes May 14, 2026

devin-ai-integration[bot] wants to merge 1 commit into main from devin/1778746332-knowledge-metadata-filter
