Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP

Authors: Martin Vogel, Falk Meyer-Eschenbach, Severin Kohler et al.

2026

TL;DR

Codebase-Memory uses a Tree-Sitter-based knowledge graph in SQLite exposed via MCP tools to answer code-structure questions with 10× fewer tokens at 83% quality vs. 92% for a file-exploration agent.



THE PROBLEM

LLM agents burn 10× tokens on unstructured code exploration

LLM coding agents repeatedly read files and run grep searches, consuming hundreds of thousands of tokens per task without building structural understanding.

This makes structural questions like impact analysis expensive and slow, because agents lack call graphs, dependency chains, and module boundaries in a queryable form.

HOW IT WORKS

Codebase-Memory — Tree-Sitter Knowledge Graphs via MCP

Codebase-Memory centers on a three-stage pipeline (Parse, Build, Serve), plus a FunctionRegistry and Louvain communities, all stored in SQLite and exposed via the MCP tool interface.

You can think of Codebase-Memory as turning a codebase into a card catalog: files become graph nodes, relationships become edges, and MCP tools are the librarian answering structural questions.

This design lets Codebase-Memory answer structural queries like impact analysis or hub detection directly from the graph, instead of re-reading raw files into a limited context window.
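The card-catalog idea can be sketched with a tiny in-memory call graph: build a reverse index once, and "who calls f?" becomes a single lookup instead of a repo-wide grep. This is a minimal illustration, not the paper's implementation; the function names and edges below are hypothetical.

```python
from collections import defaultdict

# Hypothetical call edges (caller -> callee); in Codebase-Memory these
# would come from Tree-Sitter parsing, not a hand-written list.
call_edges = [
    ("api.handle_request", "auth.check_token"),
    ("api.handle_request", "db.load_user"),
    ("jobs.refresh_cache", "db.load_user"),
]

# Build a reverse index once; afterwards impact analysis is one lookup.
callers_of = defaultdict(set)
for caller, callee in call_edges:
    callers_of[callee].add(caller)

def impact(fn: str) -> set[str]:
    """Return the direct callers affected by changing `fn`."""
    return callers_of[fn]

print(sorted(impact("db.load_user")))
# ['api.handle_request', 'jobs.refresh_cache']
```

The point of the sketch: the expensive work (parsing, edge extraction) happens once at index time, so each structural question at query time costs a handful of tokens rather than a fresh file crawl.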

DIAGRAM

Multi-pass pipeline from source files to knowledge graph

This diagram shows how Codebase-Memory runs the six pipeline phases to turn source files into a Tree-Sitter-based knowledge graph.

DIAGRAM

Head-to-head evaluation: MCP Agent vs Explorer Agent

This diagram shows how Codebase-Memory is evaluated by comparing an MCP Agent using MCP tools against an Explorer Agent using file reading and grep.

PROCESS

How Codebase-Memory Handles a Repository Query Session

  1. Parse stage

    In the Parse stage, Codebase-Memory walks Tree-Sitter ASTs across 66 languages, extracting definitions, call sites, imports, and traits.

  2. Build stage

    In the Build stage, Codebase-Memory runs the multi-pass pipeline, using parallel worker pools and the FunctionRegistry to assemble nodes and edges.

  3. Serve stage

    In the Serve stage, Codebase-Memory flushes the graph into SQLite, computes Louvain communities, and exposes everything via the MCP tool interface.

  4. Incremental synchronization

    Codebase-Memory watches files, uses XXH3 hashes to detect changes, and re-runs the relevant pipeline phases to keep the knowledge graph fresh.
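The incremental-sync step can be approximated with content hashing: re-run the pipeline only for files whose fingerprint changed since the last snapshot. A minimal sketch follows; it uses stdlib `blake2b` in place of XXH3 (a portability assumption, not the paper's choice), and the snapshot format is hypothetical.

```python
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    # The paper uses XXH3; blake2b is a stdlib stand-in playing the same
    # role here: a fast content fingerprint, not a security primitive.
    return hashlib.blake2b(path.read_bytes(), digest_size=8).hexdigest()

def changed_files(root: Path, snapshot: dict[str, str]) -> list[Path]:
    """Compare current hashes against the previous snapshot and return
    files whose content changed (these get re-run through the pipeline)."""
    dirty = []
    for path in sorted(root.rglob("*.py")):
        digest = file_hash(path)
        if snapshot.get(str(path)) != digest:
            dirty.append(path)
            snapshot[str(path)] = digest  # update the snapshot in place
    return dirty
```

Calling `changed_files` twice in a row returns an empty list the second time, which is the property that keeps re-indexing cost proportional to the edit, not the repository.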

KEY CONTRIBUTIONS

Key Contributions

  • Knowledge-graph architecture for code

    Codebase-Memory combines Tree-Sitter parsing across 66 languages, a multi-phase Build stage, FunctionRegistry, and Louvain communities into a single SQLite knowledge graph with zero external dependencies.

  • MCP-based structural tool interface

    Codebase-Memory exposes 14 typed MCP tools, including search_graph, trace_call_path, query_graph, and get_architecture, enabling sub-millisecond structural queries for any MCP-compatible agent.

  • Head-to-head evaluation across 31 languages

    Codebase-Memory achieves 0.83 answer quality versus 0.92 for a file-exploration agent, with ten times fewer tokens and 2.1 times fewer tool calls on 31 real-world repositories.
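A trace_call_path-style tool over a SQLite edge table can be expressed as a recursive CTE, which is one plausible way sub-millisecond structural queries get served. The schema and data below are a simplified guess for illustration, not the paper's actual tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified guess at an edge table; the real schema is richer.
    CREATE TABLE edges (caller TEXT, callee TEXT);
    INSERT INTO edges VALUES
        ('main', 'parse_args'),
        ('main', 'run'),
        ('run', 'load_config'),
        ('load_config', 'read_file');
""")

def trace_call_path(root: str) -> list[tuple[str, int]]:
    """Walk the call graph downward from `root`, returning (function, depth).
    Assumes an acyclic graph; real code would cap depth or track visited nodes."""
    return conn.execute("""
        WITH RECURSIVE reach(fn, depth) AS (
            SELECT ?, 0
            UNION ALL
            SELECT e.callee, r.depth + 1
            FROM edges e JOIN reach r ON e.caller = r.fn
        )
        SELECT fn, depth FROM reach ORDER BY depth, fn
    """, (root,)).fetchall()

print(trace_call_path("run"))
# [('run', 0), ('load_config', 1), ('read_file', 2)]
```

Because the traversal runs inside SQLite over an indexed table, the agent receives a few result rows instead of the raw file contents, which is where the token savings come from.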

RESULTS

By the Numbers

  • Quality score: 0.83 (-0.09 vs Explorer Agent's 0.92)
  • Tool calls per question: 2.3 (-2.5 vs Explorer Agent's 4.8)
  • Tokens per question: ~1,000 (-9,000 vs Explorer Agent's ~10,000)
  • Query latency: <1 ms (vs Explorer Agent's 10–30 s)

On a benchmark of 12 question categories across 31 languages, Codebase-Memory delivers 0.83 quality versus 0.92 for the Explorer Agent while cutting tokens and tool calls dramatically. This shows that Codebase-Memory can trade a small quality gap for 10× token savings and >100× faster structural queries.


BENCHMARK

Head-to-head comparison: MCP Agent vs Explorer Agent

Average quality score for structural code questions across 31 languages.

KEY INSIGHT

The Counterintuitive Finding

Codebase-Memory reaches 0.83 quality versus 0.92 for an Explorer Agent while using ten times fewer tokens and 2.1 times fewer tool calls.

This is surprising because many assume structural indexing would require more complexity and overhead, yet Codebase-Memory actually simplifies agent behavior and still matches or exceeds the Explorer Agent on 19 of 31 languages for graph-native queries.

WHY IT MATTERS

What this unlocks for the field

Codebase-Memory makes structural code retrieval a first-class capability, letting agents query call graphs, communities, and impact analysis directly via MCP tools.

This enables builders to create coding agents that start from a rich structural map instead of blind file crawling, making large, polyglot repositories and even Linux-kernel-scale projects practical to explore interactively.


