Proposal: GNAP as a coordination layer for parallel GraphRAG indexing across AI agents
GraphRAG's indexing pipeline is its core innovation: it transforms unstructured text into knowledge graphs via LLMs. The pipeline is expensive (as you note in your docs) and naturally parallelizable, since entity extraction, community detection, and summarization can run on different corpus chunks simultaneously.
GNAP (Git-Native Agent Protocol) provides a zero-infrastructure coordination layer for distributed GraphRAG indexing. A git repo serves as the task board, and each task file moves through board/todo/ → board/doing/ → board/done/.
Applied to GraphRAG's indexing pipeline:
When indexing a large corpus, multiple GraphRAG agents can process document chunks in parallel:
board/todo/index-corpus-chunk-001.md ← Coordinator splits large corpus
board/todo/index-corpus-chunk-002.md
board/todo/index-corpus-chunk-003.md
board/doing/index-corpus-chunk-001.md ← GraphRAG agent 1 claims + indexes
board/doing/index-corpus-chunk-002.md ← GraphRAG agent 2 claims + indexes
board/done/index-corpus-chunk-001.md ← Parquet files path + entity count committed
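As a rough sketch of the claim step in the layout above, an agent could move a task file from board/todo/ to board/doing/. This is an illustration only, not GNAP's or GraphRAG's actual API: the function name `claim_next_task` and the `claimed_by` line are hypothetical, and the sketch works on a local checkout with a plain file rename. In a real git-backed setup the move would presumably be committed and pushed, and the commit/push handling is omitted here.

```python
import os
from pathlib import Path
from typing import Optional


def claim_next_task(board: Path, agent_id: str) -> Optional[Path]:
    """Move the first available task from todo/ to doing/ and return its new path.

    Hypothetical helper: assumes tasks are files named index-corpus-chunk-*.md.
    """
    todo, doing = board / "todo", board / "doing"
    doing.mkdir(parents=True, exist_ok=True)
    for task in sorted(todo.glob("index-corpus-chunk-*.md")):
        target = doing / task.name
        try:
            # A rename within one filesystem either succeeds or fails as a unit.
            os.rename(task, target)
        except FileNotFoundError:
            # Another agent moved this file first; try the next one.
            continue
        # Record who claimed the task (hypothetical field, not a GNAP requirement).
        with target.open("a", encoding="utf-8") as fh:
            fh.write(f"\nclaimed_by: {agent_id}\n")
        return target
    return None  # nothing left to claim
```

An agent loop would call `claim_next_task(Path("board"), "agent-1")` repeatedly and stop when it returns `None`.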
A merge agent then reads all board/done/ files to combine the knowledge graphs. This is particularly valuable for GraphRAG's community-level summarization, since different agents can handle different community hierarchies in parallel.
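A minimal sketch of how a merge agent might collect the results recorded in board/done/. The file format is an assumption: each done file is taken to contain simple `key: value` lines such as `parquet_path` and `entity_count`, and the helper names `read_done_records` and `total_entities` are hypothetical. Actually combining the Parquet outputs into one graph is left out.

```python
from pathlib import Path
from typing import Dict, List


def read_done_records(board: Path) -> List[Dict[str, str]]:
    """Parse every file in board/done/ into a dict of its `key: value` lines."""
    records = []
    for done_file in sorted((board / "done").glob("*.md")):
        record = {"task": done_file.stem}
        for line in done_file.read_text(encoding="utf-8").splitlines():
            # Split on the first colon only, so values may contain colons.
            key, sep, value = line.partition(":")
            if sep and key.strip():
                record[key.strip()] = value.strip()
        records.append(record)
    return records


def total_entities(records: List[Dict[str, str]]) -> int:
    """Sum the (assumed) entity_count field; missing counts are treated as 0."""
    return sum(int(r.get("entity_count", 0)) for r in records)
```

The merge agent could then pass each record's `parquet_path` to whatever graph-combining step it uses.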
GNAP also provides an audit trail for expensive LLM operations: the git history shows exactly which indexing steps ran, on which chunks, and with what results. This makes it easier to resume an interrupted indexing run without re-processing completed chunks.
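Resuming could look roughly like the sketch below. It assumes that anything still in board/doing/ after an interruption should go back to board/todo/, while board/done/ is left untouched so completed chunks are skipped. That requeue policy and the name `resume_plan` are assumptions for illustration, not something GNAP specifies here.

```python
import os
from pathlib import Path
from typing import Dict, List


def resume_plan(board: Path) -> Dict[str, List[str]]:
    """Requeue interrupted tasks and report what is skipped, requeued, and remaining."""
    todo, doing, done = board / "todo", board / "doing", board / "done"
    for folder in (todo, doing, done):
        folder.mkdir(parents=True, exist_ok=True)

    # Completed chunks stay in done/ and are never re-processed.
    skipped = sorted(p.name for p in done.glob("*.md"))

    # Tasks left in doing/ were interrupted mid-run; put them back in the queue.
    requeued = []
    for task in sorted(doing.glob("*.md")):
        os.replace(task, todo / task.name)
        requeued.append(task.name)

    remaining = sorted(p.name for p in todo.glob("*.md"))
    return {"skipped": skipped, "requeued": requeued, "remaining": remaining}
```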