
Fix #5805: Add metadata_filter support to Knowledge querying pipeline #5806

Open
devin-ai-integration[bot] wants to merge 1 commit into main from devin/1778746332-knowledge-metadata-filter

@devin-ai-integration devin-ai-integration Bot commented May 14, 2026

Summary

Closes #5805. The lower layers (KnowledgeStorage.search, BaseRecord.metadata, BaseKnowledgeSource.metadata) already support metadata, but four gaps prevented users from using metadata-based filtering end-to-end. This PR fills those four gaps (items 1 to 4) and wires the new config into Agent (item 5):

  1. New KnowledgeConfig class (src/crewai/knowledge/knowledge_config.py) — Pydantic model with results_limit, score_threshold, and metadata_filter fields.
  2. Knowledge.query() — Now accepts and forwards filter and score_threshold to KnowledgeStorage.search().
  3. Crew.query_knowledge() — Now accepts and forwards limit, filter, and score_threshold to Knowledge.query().
  4. BaseKnowledgeSource._save_documents() — Now passes self.metadata to storage.save() (was silently dropped before, even had a # Currently unused comment).
  5. Agent — New knowledge_config: Optional[KnowledgeConfig] field. execute_task() uses it when querying both agent-level and crew-level knowledge.

Usage:

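from crewai import Agent
from crewai.knowledge.knowledge_config import KnowledgeConfig

# Query settings applied whenever this agent searches its knowledge.
config = KnowledgeConfig(
    results_limit=10,
    score_threshold=0.5,
    metadata_filter={"task": "translation"},
)
# Other required Agent arguments (role, goal, backstory) are omitted here.
agent = Agent(
    knowledge_sources=[...],
    knowledge_config=config,
)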

All changes are fully backward-compatible — existing code without knowledge_config or metadata_filter behaves identically.
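
To illustrate the backward-compatibility claim, here is a minimal sketch of how query arguments might be derived from an optional config. It uses a plain dataclass as a stand-in for the Pydantic KnowledgeConfig, and build_query_kwargs is a hypothetical helper invented for this example, not a function from the PR. The defaults are the ones quoted in the checklist below.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class KnowledgeConfigSketch:
    """Stand-in for KnowledgeConfig; field names and defaults follow the PR description."""

    results_limit: int = 3
    score_threshold: float = 0.35
    metadata_filter: Optional[Dict[str, Any]] = None


def build_query_kwargs(config: Optional[KnowledgeConfigSketch]) -> Dict[str, Any]:
    """Hypothetical helper: map a config onto the keyword names accepted by query()."""
    # With no config, fall back to the defaults so existing callers see no change.
    cfg = config or KnowledgeConfigSketch()
    return {
        "limit": cfg.results_limit,
        "score_threshold": cfg.score_threshold,
        "filter": cfg.metadata_filter,
    }
```

With this shape, an agent without knowledge_config would query with limit=3, score_threshold=0.35, and no filter, which is what the description means by identical behavior.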

Review & Testing Checklist for Human

  • Verify KnowledgeConfig defaults match the existing hardcoded defaults (limit=3, score_threshold=0.35, metadata_filter=None)
  • Verify that BaseKnowledgeSource._save_documents() now persists self.metadata into the vector store — test with a real ChromaDB/Qdrant backend to confirm metadata appears in stored documents
  • End-to-end test: create an Agent with knowledge_config=KnowledgeConfig(metadata_filter={"task": "translation"}), add knowledge sources with matching metadata, and confirm only filtered results are returned during task execution
  • Confirm no regressions on existing knowledge tests (uv run pytest tests/knowledge/ -vv)
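
The end-to-end check in the list above could be rehearsed against a toy store before trying a real backend. The sketch below is only an assumption about the expected behavior: InMemoryStorage is a hypothetical class that matches on metadata equality and ignores embeddings and scores entirely, so it does not replace a test against ChromaDB or Qdrant.

```python
from typing import Any, Dict, List, Optional


class InMemoryStorage:
    """Hypothetical toy store: exact-match metadata filtering, no embeddings."""

    def __init__(self) -> None:
        self._docs: List[Dict[str, Any]] = []

    def save(self, documents: List[str], metadata: Optional[Dict[str, Any]] = None) -> None:
        # Attach a copy of the source metadata to every stored chunk.
        for text in documents:
            self._docs.append({"content": text, "metadata": dict(metadata or {})})

    def search(self, filter: Optional[Dict[str, Any]] = None, limit: int = 3) -> List[Dict[str, Any]]:
        wanted = filter or {}
        # A document matches when every filter key equals its stored value.
        hits = [
            doc for doc in self._docs
            if all(doc["metadata"].get(key) == value for key, value in wanted.items())
        ]
        return hits[:limit]


storage = InMemoryStorage()
storage.save(["Bonjour means hello."], metadata={"task": "translation"})
storage.save(["Q3 revenue grew."], metadata={"task": "finance"})

hits = storage.search(filter={"task": "translation"})
assert [hit["content"] for hit in hits] == ["Bonjour means hello."]
```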

Notes

10 new tests added covering all 4 gaps: KnowledgeConfig construction, Knowledge.query() forwarding, BaseKnowledgeSource._save_documents() metadata passing, and Crew.query_knowledge() forwarding.
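
For readers who want a feel for the forwarding tests without opening the diff, a test of that kind might look roughly like this. KnowledgeSketch is a stand-in written for this example; its signature follows the PR description, but the positional/keyword layout of the real Knowledge.query() and KnowledgeStorage.search() calls is an assumption.

```python
from typing import Any, Dict, List, Optional
from unittest.mock import MagicMock


class KnowledgeSketch:
    """Stand-in for Knowledge; only the forwarding behavior is modeled."""

    def __init__(self, storage: Any) -> None:
        self.storage = storage

    def query(
        self,
        query: List[str],
        limit: int = 3,
        filter: Optional[Dict[str, Any]] = None,
        score_threshold: float = 0.35,
    ) -> Any:
        # Pass every query parameter through to the storage layer unchanged.
        return self.storage.search(
            query, limit=limit, filter=filter, score_threshold=score_threshold
        )


def test_query_forwards_filter_and_threshold() -> None:
    storage = MagicMock()
    knowledge = KnowledgeSketch(storage)

    knowledge.query(["hola"], limit=5, filter={"task": "translation"}, score_threshold=0.6)

    storage.search.assert_called_once_with(
        ["hola"], limit=5, filter={"task": "translation"}, score_threshold=0.6
    )


test_query_forwards_filter_and_threshold()
```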

Link to Devin session: https://app.devin.ai/sessions/a015e268b09b43d688b173e0e95e2599

Summary by CodeRabbit

  • New Features

    • Added knowledge configuration settings to control query result limits, score filtering thresholds, and metadata filtering.
    • Agents can now apply custom knowledge-query parameters.
  • Tests

    • Expanded test coverage for knowledge configuration and parameter forwarding behaviors.

- Add KnowledgeConfig class with results_limit, score_threshold, metadata_filter
- Add knowledge_config field to Agent
- Update Knowledge.query() to forward filter and score_threshold to storage
- Update Crew.query_knowledge() to accept and forward filter params
- Fix BaseKnowledgeSource._save_documents() to pass self.metadata to storage
- Wire Agent.execute_task() to use knowledge_config for both agent and crew queries
- Add 10 tests covering all changes

Co-Authored-By: João <joao@crewai.com>
coderabbitai Bot commented May 14, 2026

Walkthrough

This PR implements configurable knowledge-query parameters by introducing KnowledgeConfig with metadata filtering, score threshold, and result-limit settings. The configuration flows from Agent through Crew to Knowledge, which forwards parameters to storage. Additionally, source metadata is now persisted when saving documents.

Changes

Knowledge Configuration and Parameter Propagation

| Layer | File(s) | Summary |
| --- | --- | --- |
| KnowledgeConfig contract definition | src/crewai/knowledge/knowledge_config.py | Introduces KnowledgeConfig Pydantic model with three optional configuration fields: results_limit (default 3), score_threshold (default 0.35), and metadata_filter (default None). |
| Knowledge layer parameter forwarding | src/crewai/knowledge/knowledge.py | Knowledge.query() signature expanded to accept optional filter and score_threshold parameters with defaults, which are forwarded to self.storage.search(). |
| Crew and Agent configuration integration | src/crewai/crew.py, src/crewai/agent.py | Crew.query_knowledge() exposes limit, filter, and score_threshold parameters and forwards them to Knowledge.query(). Agent adds optional knowledge_config field and derives query kwargs from it when querying both personal knowledge and crew knowledge. |
| Document metadata persistence | src/crewai/knowledge/source/base_knowledge_source.py | BaseKnowledgeSource._save_documents() now passes metadata argument to storage.save() when source metadata is present, enabling persistent metadata in the vector store. |
| Comprehensive test suite | tests/knowledge/knowledge_test.py | Added four test classes with 11 test methods: TestKnowledgeConfig validates configuration defaults and partial overrides; TestKnowledgeQueryForwardsParams confirms parameter forwarding through storage; TestBaseKnowledgeSourceSavesMetadata verifies metadata persistence; TestCrewQueryKnowledgeForwardsParams validates end-to-end configuration propagation from Crew through Knowledge to storage. |
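
The "Document metadata persistence" row is the one behavioral fix in this PR, so a small sketch may help. SourceSketch below is a hypothetical stand-in for BaseKnowledgeSource, and the exact storage.save() signature is an assumption; the only point is that metadata is passed along when the source has any, instead of being dropped.

```python
from typing import Any, Dict, List, Optional
from unittest.mock import MagicMock


class SourceSketch:
    """Hypothetical stand-in for BaseKnowledgeSource."""

    def __init__(
        self,
        storage: Any,
        chunks: List[str],
        metadata: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.storage = storage
        self.chunks = chunks
        self.metadata = metadata or {}

    def _save_documents(self) -> None:
        if self.metadata:
            # Source-level metadata travels with the chunks into the store.
            self.storage.save(self.chunks, metadata=self.metadata)
        else:
            # No metadata: same call as before the change.
            self.storage.save(self.chunks)


storage = MagicMock()
SourceSketch(storage, ["chunk one"], metadata={"task": "translation"})._save_documents()
storage.save.assert_called_once_with(["chunk one"], metadata={"task": "translation"})
```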

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Through config fields, the knowledge flows so free,
From Agent down through Crew to Storage's spree,
Metadata now persists with careful grace,
Query parameters find their rightful place!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 26.32% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately and specifically describes the main change: adding metadata_filter support to the Knowledge querying pipeline, with clear reference to the issue being fixed. |
| Linked Issues check | ✅ Passed | All four gaps identified in issue #5805 are addressed: KnowledgeConfig now includes metadata_filter field, Knowledge.query() forwards filter/score_threshold, Crew.query_knowledge() accepts and forwards these parameters, and BaseKnowledgeSource._save_documents() passes self.metadata to storage. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the requirements in issue #5805; no out-of-scope modifications detected beyond the metadata filter support implementation. |

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/crewai/knowledge/knowledge_config.py`:
- Around line 15-23: The results_limit and score_threshold fields accept invalid
numbers; update their definitions in knowledge_config.py to enforce bounds
(e.g., set results_limit to have ge=1 and set score_threshold to be >0 and <=1)
or add pydantic validators on the KnowledgeConfig model that raise
ValueError when results_limit < 1 or when score_threshold <= 0 or > 1,
referencing the results_limit and score_threshold symbols so invalid query
params are rejected before being sent to storage.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 43701e7f-cee4-439a-aecf-6c7821e8175f

📥 Commits

Reviewing files that changed from the base of the PR and between c36827b and 4f72839.

📒 Files selected for processing (6)
  • src/crewai/agent.py
  • src/crewai/crew.py
  • src/crewai/knowledge/knowledge.py
  • src/crewai/knowledge/knowledge_config.py
  • src/crewai/knowledge/source/base_knowledge_source.py
  • tests/knowledge/knowledge_test.py

Comment on lines +15 to +23
    results_limit: int = Field(
        default=3,
        description="Maximum number of results to return from a knowledge query.",
    )
    score_threshold: float = Field(
        default=0.35,
        description="Minimum relevance score for results.",
    )
    metadata_filter: Optional[Dict[str, Any]] = Field(

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate bounds for query config values.

results_limit currently accepts 0 and negative values, and score_threshold accepts negative values; either can propagate bad query params to storage.

Proposed fix
 class KnowledgeConfig(BaseModel):
@@
     results_limit: int = Field(
         default=3,
+        ge=1,
         description="Maximum number of results to return from a knowledge query.",
     )
     score_threshold: float = Field(
         default=0.35,
+        ge=0.0,
         description="Minimum relevance score for results.",
     )
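
The effect of the proposed bounds can be shown without installing anything. The sketch below is a dependency-free stand-in that enforces the same two constraints in a dataclass __post_init__; in the PR itself they would come from Pydantic's Field(ge=...) as in the diff above. BoundedConfigSketch and is_valid are names made up for this example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class BoundedConfigSketch:
    """Stand-in mirroring the proposed ge=1 and ge=0.0 constraints."""

    results_limit: int = 3
    score_threshold: float = 0.35

    def __post_init__(self) -> None:
        # Equivalent in spirit to Field(ge=1) on results_limit.
        if self.results_limit < 1:
            raise ValueError("results_limit must be >= 1")
        # Equivalent in spirit to Field(ge=0.0) on score_threshold.
        if self.score_threshold < 0.0:
            raise ValueError("score_threshold must be >= 0.0")


def is_valid(**kwargs: Any) -> bool:
    """Return True when the given values pass the bounds checks."""
    try:
        BoundedConfigSketch(**kwargs)
    except ValueError:
        return False
    return True
```

Under these bounds a results_limit of 0 is rejected while a score_threshold of exactly 0.0 is still accepted, which is slightly looser than the ">0 and <=1" range mentioned in the agent prompt.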

Development

Successfully merging this pull request may close these issues.

Knowledge metadata supported in storage but not configurable via KnowledgeConfig / Agent