GNAP: git-native coordination for distributed GraphRAG indexing pipelines #2282

@ori-cofounder

Proposal: GNAP as a coordination layer for parallel GraphRAG indexing across AI agents

GraphRAG's indexing pipeline is the core innovation — transforming unstructured text into knowledge graphs via LLMs. The pipeline is expensive (as you note in your docs) and naturally parallelizable: entity extraction, community detection, and summarization can run on different corpus chunks simultaneously.

GNAP (Git-Native Agent Protocol) provides a zero-infrastructure coordination layer for distributed GraphRAG indexing: a git repo acts as a task board with three directories, board/todo/, board/doing/, and board/done/.

Applied to GraphRAG's indexing pipeline:

When indexing a large corpus, multiple GraphRAG agents can process document chunks in parallel:

board/todo/index-corpus-chunk-001.md    ← Coordinator splits large corpus
board/todo/index-corpus-chunk-002.md
board/todo/index-corpus-chunk-003.md

board/doing/index-corpus-chunk-001.md   ← GraphRAG agent 1 claims + indexes
board/doing/index-corpus-chunk-002.md   ← GraphRAG agent 2 claims + indexes
board/done/index-corpus-chunk-001.md    ← Parquet files path + entity count committed
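
The claim step above can be sketched in a few lines of Python. This is a minimal local illustration, not the GNAP reference implementation: the function name `claim_next_task` is hypothetical, and the sketch only moves files within a working copy. It assumes that a real agent would follow the move with a commit and push, and that a rejected push would mean another agent claimed the task first.

```python
from pathlib import Path
from typing import Optional


def claim_next_task(board: Path) -> Optional[Path]:
    """Move the first task file from todo/ to doing/ and return its new path.

    Hypothetical helper: it models the board as plain directories. In a
    git-backed board (an assumption here), the move would be committed and
    pushed so that other agents see the claim.
    """
    todo, doing = board / "todo", board / "doing"
    doing.mkdir(parents=True, exist_ok=True)
    for task in sorted(todo.glob("*.md")):
        target = doing / task.name
        try:
            # A rename within one filesystem is a single operation, so two
            # local workers cannot both succeed in moving the same file.
            task.rename(target)
        except FileNotFoundError:
            continue  # another worker moved this task first; try the next
        return target
    return None  # nothing left to claim
```

Calling `claim_next_task(Path("board"))` repeatedly hands out chunks in filename order and returns `None` once board/todo/ is empty.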

A merge agent then reads every file in board/done/ and combines the per-chunk knowledge graphs. This is particularly valuable for GraphRAG's community-level summarization, where different agents can summarize different communities or hierarchy levels in parallel.
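
As a rough sketch of the first half of the merge step, the snippet below gathers what each completed task reported. The file layout is an assumption: each file in board/done/ is taken to contain `parquet_path: ...` and `entity_count: ...` lines, which the proposal does not specify. The function names are hypothetical as well, and the actual graph merge is left out.

```python
from pathlib import Path


def read_done_record(path: Path) -> dict:
    """Parse one done/ file into a dict (assumed 'key: value' line format)."""
    record = {"task": path.stem}
    for line in path.read_text(encoding="utf-8").splitlines():
        # Split on the first colon only, so paths such as s3://... stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("parquet_path", "entity_count"):
            record[key.strip()] = value.strip()
    if "entity_count" in record:
        record["entity_count"] = int(record["entity_count"])
    return record


def collect_done(board: Path) -> list:
    """Return one record per completed task, in filename order."""
    return [read_done_record(p) for p in sorted((board / "done").glob("*.md"))]
```

A merge agent could then load the Parquet outputs listed in these records and sum `entity_count` as a sanity check before combining the graphs.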

GNAP also provides the audit trail for expensive LLM operations — you can see exactly which indexing steps ran, on which chunks, with what results, making it easier to resume interrupted indexing runs without re-processing completed chunks.
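
Resuming an interrupted run then reduces to a set difference between the chunks that were planned and the task files already in board/done/. The helper below is a sketch under the assumption that task files are named after their chunk, as in the listing above; `pending_chunks` is a hypothetical name.

```python
from pathlib import Path


def pending_chunks(board: Path, all_chunks: list) -> list:
    """Return the chunk task names that have no file in done/ yet."""
    done = {p.stem for p in (board / "done").glob("*.md")}
    # Preserve the coordinator's original ordering of chunks.
    return [chunk for chunk in all_chunks if chunk not in done]
```

Only the names returned here need to go back into board/todo/; completed chunks are left untouched, so their LLM calls are not repeated.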

Spec: https://github.com/farol-team/gnap
