Category

Graph Memory

Graph-based memory for AI agents — knowledge graphs, hypergraph retrieval, and relational memory structures.

8 papers

Benchmark · Long-Term Memory

AgenticAI-DialogGen: Topic-Guided Conversation Generation for Fine-Tuning and Evaluating Short- and Long-Term Memories of LLMs

Manoj Madushanka Perera, Adnan Mahmood et al.

· 2026

AgenticAI-DialogGen chains ChatPreprocessor, KnowledgeExtractor, TopicAnalyzer, KnowledgeGraphBuilder, PersonaGenerator, DuelingChat Agent, ConversationValidator, ConversationRefiner, QAGeneration, and PostProcessing to turn raw multi-session chats into topic-guided, persona-grounded conversations with explicit short- and long-term memories. On the TGC / KG memory QA benchmark, Mistral-7B fine-tuned within AgenticAI-DialogGen achieves 87.36 F1, compared to GPT-4’s 83.77 F1 in a zero-shot setting on the same task.

Benchmark

Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP

Martin Vogel, Falk Meyer-Eschenbach et al.

· 2026

Codebase-Memory parses repositories with a multi-pass pipeline, comprising Parse, Build, and Serve stages, a FunctionRegistry, Louvain community detection, and an MCP tool interface, to build a persistent SQLite knowledge graph. On a 31-language benchmark, Codebase-Memory reaches 0.83 quality versus 0.92 for an Explorer Agent, while using ten times fewer tokens and 2.1 times fewer tool calls.
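
The core idea of a persistent code knowledge graph in SQLite can be sketched in a few lines. This is an illustrative toy, not Codebase-Memory's actual schema: the table names, columns, and `callees` query are assumptions, standing in for the paper's Build and Serve stages.

```python
import sqlite3

# Minimal sketch of a code knowledge graph persisted in SQLite.
# Schema and data are illustrative, not Codebase-Memory's real design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, kind TEXT, file TEXT);
    CREATE TABLE edges (src INTEGER, dst INTEGER, relation TEXT);
""")

# "Build" stage: insert parsed entities and their call relations.
nodes = [(1, "load_config", "function", "config.py"),
         (2, "main", "function", "app.py"),
         (3, "run_server", "function", "app.py")]
edges = [(2, 1, "calls"), (2, 3, "calls")]
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", nodes)
conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", edges)

# "Serve" stage: answer a structural query such as "what does main call?"
def callees(name):
    rows = conn.execute("""
        SELECT n2.name FROM nodes n1
        JOIN edges e ON e.src = n1.id
        JOIN nodes n2 ON n2.id = e.dst
        WHERE n1.name = ? AND e.relation = 'calls'
    """, (name,)).fetchall()
    return sorted(r[0] for r in rows)

print(callees("main"))  # → ['load_config', 'run_server']
```

Keeping the graph in plain SQLite is what makes this kind of memory cheap to persist and query between agent sessions.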

Benchmark · Agent Memory · Memory Architecture

GAM: Hierarchical Graph-based Agentic Memory for LLM Agents

Zhaofen Wu, Hanrong Zhang et al.

· 2026

GAM builds a Hierarchical Graph Memory Architecture with a global Topic Associative Network, local Event Progression Graphs, State-Based Memory Consolidation, and Graph-Guided Multi-Factor Retrieval to decouple encoding from consolidation. On LoCoMo with Qwen2.5-7B, GAM attains an Average F1 of 40.00 versus Mem0's 35.38, and on LongDialQA with Qwen2.5-7B it reaches 12.55 F1 versus MemoryOS's 6.76.
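
The two-level structure can be illustrated with a toy: a global topic layer, where each topic holds a local chain of events. This is only loosely inspired by GAM's hierarchy; the class, the word-overlap topic selection, and the linear event chains are all simplifying assumptions.

```python
# Toy two-level graph memory: a global topic layer over local
# event-progression chains. Names and retrieval logic are illustrative.
class HierarchicalGraphMemory:
    def __init__(self):
        self.topics = {}  # topic -> ordered list of events (a linear event graph)

    def consolidate(self, topic, event):
        # Append the event to its topic's local progression.
        self.topics.setdefault(topic, []).append(event)

    def retrieve(self, query_words):
        # Pick the topic with the most word overlap, then return its events.
        def score(topic):
            return len(set(topic.split()) & set(query_words))
        best = max(self.topics, key=score)
        return best, self.topics[best]

mem = HierarchicalGraphMemory()
mem.consolidate("trip planning", "chose Lisbon")
mem.consolidate("trip planning", "booked flights")
mem.consolidate("work project", "merged the PR")
print(mem.retrieve({"planning", "trip"}))
```

The point of the hierarchy is that retrieval first narrows to a topic, then reads an ordered event progression, instead of ranking every memory item flatly.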

RAG · Benchmark · Memory Architecture

Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling

Chulun Zhou, Chunkang Zhang et al.

· 2025

HGMEM represents working memory as a hypergraph with Hypergraph-based Memory Storage, Adaptive Memory-based Evidence Retrieval, and Dynamic Memory Evolving to capture high-order correlations across entities and facts. On the Prelude long narrative understanding benchmark, HGMEM with GPT-4o achieves 73.81% accuracy versus 72.22% for HippoRAG v2, while also reaching 69.74 comprehensiveness on LongBench generative sense-making QA.
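
What distinguishes a hypergraph memory from an ordinary graph is that one edge can join many entities at once, so a single fact is stored as one high-order unit. A minimal sketch, with class names and the overlap-based scoring being assumptions rather than HGMEM's actual components:

```python
from collections import defaultdict

# Toy hypergraph memory: each hyperedge is a fact linking several entities
# at once, rather than pairwise edges. Scoring is illustrative, not HGMEM's.
class HypergraphMemory:
    def __init__(self):
        self.facts = []                    # hyperedge id -> fact text
        self.incidence = defaultdict(set)  # entity -> hyperedge ids

    def add_fact(self, text, entities):
        eid = len(self.facts)
        self.facts.append(text)
        for ent in entities:
            self.incidence[ent].add(eid)

    def retrieve(self, query_entities, top_k=2):
        # Score each hyperedge by how many query entities it touches.
        overlap = defaultdict(int)
        for ent in query_entities:
            for eid in self.incidence[ent]:
                overlap[eid] += 1
        ranked = sorted(overlap, key=lambda e: (-overlap[e], e))
        return [self.facts[e] for e in ranked[:top_k]]

mem = HypergraphMemory()
mem.add_fact("Ada met Brin in Paris in 1842", {"Ada", "Brin", "Paris"})
mem.add_fact("Brin later moved to Rome", {"Brin", "Rome"})
mem.add_fact("Paris hosted the exhibition", {"Paris"})
print(mem.retrieve({"Ada", "Paris"}))
```

Because the first fact links Ada, Brin, and Paris in one hyperedge, a query touching two of those entities ranks it above facts that match only one.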

Benchmark · Agent Memory

Memoria: A Scalable Agentic Memory Framework for Personalized Conversational AI

Samarth Sarin, Lovepreet Singh et al.

· 2025

Memoria augments LLM chats with structured conversation logging, a dynamic user persona maintained as a knowledge graph, session-level memory for real-time context, and seamless retrieval for context-aware responses, providing persistent, interpretable memory. On LongMemEval's single-session-user and knowledge-update subsets, Memoria reaches 87.1% and 80.8% accuracy respectively, surpassing A-Mem (OpenAI) while using much shorter prompts.

Benchmark · Agent Memory

MemoriesDB: A Temporal-Semantic-Relational Database for Long-Term Agent Memory / Modeling Experience as a Graph of Temporal-Semantic Surfaces

Joel Ward

· 2025

MemoriesDB stores Memory Records, Edges and Relations, and the Temporal Semantic Stack in PostgreSQL with pgvector, exposing unified temporal–semantic–relational queries. Its main result is a working implementation demonstrating scalable time-bounded recall and hybrid semantic–structural queries on commodity SQL infrastructure, without specialized vector or graph engines.
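
The shape of a unified temporal–semantic query is easy to sketch: filter records to a time window, then rank the survivors by embedding similarity. In the paper this runs on PostgreSQL with pgvector; the in-memory analogue below, with its toy 2-d embeddings and field names, is entirely illustrative.

```python
import math
from dataclasses import dataclass

# Sketch of time-bounded semantic recall over memory records, in the spirit
# of MemoriesDB. All names, data, and embeddings here are illustrative.
@dataclass
class MemoryRecord:
    text: str
    t: float      # timestamp
    emb: tuple    # embedding vector

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def recall(records, query_emb, t_min, t_max, top_k=1):
    # Time-bounded filter first, then rank survivors by semantic similarity.
    window = [r for r in records if t_min <= r.t <= t_max]
    window.sort(key=lambda r: cosine(r.emb, query_emb), reverse=True)
    return [r.text for r in window[:top_k]]

records = [
    MemoryRecord("booked flight to Oslo", t=10.0, emb=(1.0, 0.1)),
    MemoryRecord("discussed Oslo hotels", t=20.0, emb=(0.9, 0.2)),
    MemoryRecord("debugged the parser", t=21.0, emb=(0.0, 1.0)),
]
print(recall(records, query_emb=(1.0, 0.0), t_min=15.0, t_max=25.0))
```

On SQL infrastructure the same shape becomes a `WHERE t BETWEEN … ORDER BY embedding distance` query, which is what lets commodity databases serve as agent memory.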

Benchmark

SGMem: Sentence Graph Memory for Long-Term Conversational Agents

Yaxiong Wu, Yongyue Zhang et al.

· 2025

SGMem organizes long conversations via SGMem Construction and Management and SGMem Usage, building sentence-level graphs and performing multi-hop retrieval over sessions, rounds, turns, summaries, facts, and insights. SGMem achieves 0.700 Top-5 Accuracy on LongMemEval and 0.526 on LoCoMo, beating the RAG-SMFI baseline's 0.676 and 0.510 respectively.
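
A sentence-level graph with multi-hop expansion can be sketched as follows: sentences are nodes, linked when they share a keyword, and a query retrieves seed sentences then expands along edges. The tokenization, stopword list, and linking rule are simplifying assumptions, not SGMem's method.

```python
import re
from collections import defaultdict

# Toy sentence-level graph: sentences are nodes, linked when they share a
# non-trivial word; retrieval seeds on keyword match, then expands by hops.
sentences = [
    "Mia adopted a cat named Juno",       # 0
    "Juno likes the red cushion",         # 1
    "The red cushion is in the study",    # 2
    "Mia works from the study",           # 3
]

def words(s):
    return set(re.findall(r"[a-z]+", s.lower()))

stop = {"a", "the", "is", "in", "from", "named"}
adj = defaultdict(set)
for i in range(len(sentences)):
    for j in range(i + 1, len(sentences)):
        if (words(sentences[i]) & words(sentences[j])) - stop:
            adj[i].add(j)
            adj[j].add(i)

def retrieve(query, hops=1):
    q = words(query) - stop
    frontier = {i for i, s in enumerate(sentences) if q & words(s)}
    for _ in range(hops):
        frontier |= {n for i in frontier for n in adj[i]}
    return sorted(frontier)

print(retrieve("red cushion", hops=0))  # seeds only → [1, 2]
print(retrieve("red cushion", hops=1))  # one hop pulls in linked context
```

The hop expansion is what recovers context a flat retriever would miss, e.g. who owns the cushion, which is only reachable through the shared-entity links.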