feat: implement robust LLM JSON parsing to handle markdown and filler… by divye-joshi · Pull Request #40 · INCF/knowledge-space-agent

divye-joshi · 2026-01-28T15:01:34Z

Description

This PR adds robust JSON parsing for LLM-generated responses in the backend agents. Currently, the system calls json.loads() directly on the raw response text. If the LLM wraps its output in a Markdown code block (e.g., ```json ... ```) or adds conversational filler, json.loads() raises a JSONDecodeError, which can crash the chat session.

The Issue: Fragile Parsing

The existing implementation in call_gemini_for_keywords and call_gemini_detect_intents assumes the LLM response is a perfectly formatted JSON string.

Examples of Failure Modes:

- Markdown formatting: the LLM wraps the JSON in a ```json code fence.
- Conversational filler: the LLM responds with: "Sure, here are the keywords: {"keywords": ["brain"]}".
- Leading/trailing content: unexpected characters outside the JSON object structure (plain whitespace alone is accepted by json.loads()).
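To make these failure modes concrete, here is a minimal sketch that feeds three noisy responses straight to json.loads(). The sample strings and the try_parse helper are hypothetical, written for illustration and not taken from backend/agents.py.

```python
import json

# Hypothetical raw responses, modeled on the failure modes listed above.
samples = {
    "markdown": '```json\n{"keywords": ["brain"]}\n```',
    "filler": 'Sure, here are the keywords: {"keywords": ["brain"]}',
    "trailing": '{"keywords": ["brain"]} Hope this helps!',
}


def try_parse(text):
    """Return the parsed object, or None if json.loads rejects the text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Every noisy sample is rejected by the direct json.loads() call.
results = {name: try_parse(text) for name, text in samples.items()}

# Surrounding whitespace on its own is tolerated by json.loads().
clean = try_parse('  {"keywords": ["brain"]}\n')
```

In this sketch every entry in results is None, while clean parses normally, so the breakage comes from non-whitespace content around the JSON object.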

The Fix: Robust Extraction

I introduced a private helper function, _parse_llm_json, that uses regular expressions and string slicing to extract the JSON object from a "noisy" response before parsing it.
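The diff itself is not reproduced on this page, so the following is only a sketch of how a regex-plus-slicing extractor along these lines could look. The name parse_llm_json_sketch, the fence pattern, and the choice to raise ValueError are assumptions, not the actual contents of _parse_llm_json.

```python
import json
import re

# Matches a fenced block, optionally tagged "json", and captures its body.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL | re.IGNORECASE)


def parse_llm_json_sketch(text: str) -> dict:
    """Extract and parse the first JSON object found in a noisy response."""
    # 1. Prefer the contents of a Markdown code fence if one is present.
    match = _FENCE_RE.search(text)
    candidate = match.group(1) if match else text

    # 2. Slice from the first "{" to the last "}" to drop surrounding filler.
    start = candidate.find("{")
    end = candidate.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")

    # json.JSONDecodeError subclasses ValueError, so callers can catch
    # ValueError for both "nothing found" and "found but malformed".
    return json.loads(candidate[start:end + 1])


fenced = parse_llm_json_sketch('```json\n{"keywords": ["brain"]}\n```')
chatty = parse_llm_json_sketch('Sure, here are the keywords: {"keywords": ["brain"]}')
```

Slicing between the first and last brace is a deliberately simple fallback: it handles filler before and after a single object, but it would fail on a response that contains two separate JSON objects.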

Changes Made

backend/agents.py:
- Implemented _parse_llm_json(text: str) -> dict to handle Markdown blocks and conversational text.
- Updated call_gemini_for_keywords to use the new robust parser.
- Updated call_gemini_detect_intents to use the new robust parser.

Verification

Verified with a test script covering:

- Clean JSON strings.
- JSON wrapped in Markdown code blocks.
- JSON embedded within conversational sentences.

… text

QuantumByte-01 · 2026-03-12T12:19:37Z

The _parse_llm_json utility is genuinely useful and handles Markdown code fences and fallback extraction correctly; this is a real problem worth solving. However, this PR also bundles the rerank_results_using_metadata and expand_query functions, which appear to be copied from other open PRs (#38, #39), and the reranking score math has the same issue (using publication_year / 10000 as a score boost is numerically incorrect). Please open a focused PR with only the JSON parsing fix and drop the unrelated functions.
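One possible reading of the score-math concern, sketched with hypothetical numbers: dividing the year by 10000 adds a nearly constant offset of about 0.2 to every result and barely separates old from new publications. The boost_normalized function, its weight parameter, and the year range are assumptions for illustration, not code from this PR or from #38/#39.

```python
def boost_raw(year: int) -> float:
    """The boost pattern named in the review: year / 10000."""
    return year / 10000


def boost_normalized(year: int, oldest: int, newest: int, weight: float = 0.1) -> float:
    """Min-max normalize the year to [0, 1], then scale by a small weight."""
    if newest == oldest:
        return 0.0
    return weight * (year - oldest) / (newest - oldest)


# Hypothetical publication years at the two ends of a result set.
old_year, new_year = 1995, 2024

# Raw boost: both values sit near 0.2 and differ by only about 0.003.
raw_gap = boost_raw(new_year) - boost_raw(old_year)

# Normalized boost: spans the full 0.0 to `weight` range across the set.
norm_old = boost_normalized(old_year, old_year, new_year)
norm_new = boost_normalized(new_year, old_year, new_year)
```

Under these assumptions the normalized variant keeps the recency signal bounded and proportional to where a result falls within the set, instead of shifting every score by roughly the same amount.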

feat: implement robust LLM JSON parsing to handle markdown and filler…

83fccce

… text

QuantumByte-01 closed this Mar 12, 2026

zohaib-7035 mentioned this pull request Mar 16, 2026

feat: Add normalized metadata reranking and query expansion (Fixes #40) #94

Open

divye-joshi wants to merge 1 commit into INCF:main from divye-joshi:fix/robust-json-parsing
