AdaMem: Adaptive User-Centric Memory for Long-Horizon Dialogue Agents

Authors: Shannan Yan, Jingchen Ni, Leqi Zheng, et al.

2026

TL;DR

AdaMem uses participant-specific working, episodic, persona, and graph memories with question-conditioned retrieval to reach 44.65 F1 on LoCoMo, +2.89 over LangMem.



THE PROBLEM

Long-horizon agents miss user-centric evidence and temporal links

Existing memory systems often rely too heavily on semantic similarity, missing crucial evidence for user-centric understanding and long-term reasoning.

They also store related experiences as isolated fragments with static granularity, causing temporal and causal incoherence and degrading multi-session dialogue agents.

HOW IT WORKS

AdaMem — Adaptive user-centric structured memories

AdaMem’s core mechanism combines Working Memory, Episodic Memory, Persona Memory, Graph Memory, a Memory Agent, a Research Agent, and a Working Agent into a unified user-centric pipeline.

You can think of AdaMem like a computer with RAM for recent turns, a disk of structured episodes, a user profile file, and a graph index linking everything.

This design lets AdaMem adapt retrieval routes and granularity per question, recovering cross-turn evidence that a plain context window or static chunking cannot.
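As a rough illustration of the RAM/disk/profile/index analogy above, the four stores could be held in a per-participant bundle. This is a hypothetical sketch: the names (`MemoryBundle`, `EpisodeRecord`), the working-memory window size, and the write-through policy are assumptions, not details from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class EpisodeRecord:
    """One normalized utterance record (illustrative fields)."""
    session_id: int
    speaker: str
    text: str
    timestamp: float

@dataclass
class MemoryBundle:
    """Per-participant bundle of AdaMem's four memory types (sketch)."""
    working: list = field(default_factory=list)   # recent turns (the "RAM")
    episodic: list = field(default_factory=list)  # structured episodes (the "disk")
    persona: dict = field(default_factory=dict)   # evolving user profile
    graph: dict = field(default_factory=dict)     # entity/relation index over episodes

    def add_turn(self, record: EpisodeRecord, window: int = 8) -> None:
        # Keep only the last `window` turns in working memory; every turn
        # is also written through to episodic memory for later retrieval.
        self.working.append(record)
        self.working = self.working[-window:]
        self.episodic.append(record)
```

Keeping one bundle per participant is what makes the retrieval participant-specific: resolving the target participant selects which bundle is searched.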

DIAGRAM

Question-conditioned retrieval and graph expansion flow

This diagram shows how AdaMem resolves the target participant, plans a question-conditioned route, and fuses baseline and graph retrieval.

DIAGRAM

Evaluation and ablation pipeline for AdaMem

This diagram shows how AdaMem is evaluated on LoCoMo and PERSONAMEM, including component ablations and backbone scaling.

PROCESS

How AdaMem Handles a Long-horizon Dialogue Question

  1. Memory Construction

    AdaMem uses the Memory Agent to parse each utterance into normalized records and update Working Memory, Episodic Memory, Persona Memory, and Graph Memory.

  2. Question-Conditioned Retrieval

    AdaMem resolves the target participant, builds a route plan, and runs baseline and graph retrieval over Episodic Memory and Graph Memory.

  3. Evidence Fusion

    AdaMem combines baseline, graph, recency, and factual signals into a unified evidence set that the Research Agent summarizes.

  4. Response Generation

    The Working Agent conditions on the research summary and Persona Memory to generate a concise, grounded answer.
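The Evidence Fusion step above can be sketched as a weighted combination of the four signals. The weights, the exponential recency decay, and the function names below are illustrative assumptions, not values from the paper.

```python
import math

def fuse_scores(baseline: float, graph: float, age_seconds: float,
                factual: float, weights=(0.4, 0.3, 0.2, 0.1),
                half_life: float = 86400.0) -> float:
    """Combine baseline, graph, recency, and factual signals (sketch).

    Recency decays exponentially with the evidence item's age, halving
    every `half_life` seconds; all weights here are made-up defaults.
    """
    recency = math.exp(-math.log(2) * age_seconds / half_life)
    w_b, w_g, w_r, w_f = weights
    return w_b * baseline + w_g * graph + w_r * recency + w_f * factual

def rank_evidence(candidates: list[dict]) -> list[dict]:
    # candidates: dicts carrying the four raw signals per evidence item.
    return sorted(
        candidates,
        key=lambda c: fuse_scores(c["baseline"], c["graph"], c["age"], c["factual"]),
        reverse=True,
    )
```

The ranked evidence set would then be handed to the Research Agent for summarization before the Working Agent generates the final answer.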

KEY CONTRIBUTIONS

Key Contributions

  • Adaptive user-centric memory framework

    AdaMem organizes dialogue into Working Memory, Episodic Memory, Persona Memory, and Graph Memory, enabling participant-specific bundles that support long-horizon reasoning across LoCoMo’s 35-session, 9,000-token histories.

  • Question-conditioned retrieval and response pipeline

    AdaMem introduces target-aware route planning, relation-aware graph expansion, and a unified fusion mechanism coordinated by the Research Agent and Working Agent.

  • State-of-the-art long-horizon performance

    AdaMem reaches 44.65 F1 on LoCoMo with GPT-4.1-mini and 63.25 accuracy on PERSONAMEM, with a +23.4 F1 gain on temporal questions and a +27.3 relative improvement on generalizing to new scenarios.
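The relation-aware graph expansion named in the contributions above might be sketched as a bounded breadth-first walk that only follows relation types permitted by the route plan. The graph encoding and the relation filter here are assumptions for illustration, not the paper's implementation.

```python
from collections import deque

def expand(graph: dict, seeds: list, allowed_relations: set,
           max_hops: int = 2) -> set:
    """Relation-aware graph expansion (illustrative sketch).

    graph: {node: [(relation, neighbor), ...]} adjacency lists.
    Starting from seed entities resolved from the question, walk up to
    `max_hops` hops, following only edges whose relation the route plan allows.
    """
    visited = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # hop budget exhausted on this branch
        for relation, neighbor in graph.get(node, []):
            if relation in allowed_relations and neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return visited
```

Filtering by relation type is what keeps the expansion question-conditioned: a temporal question can restrict the walk to temporal edges instead of flooding the neighborhood.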

RESULTS

By the Numbers

Overall F1

44.65

+2.89 over LangMem on LoCoMo with GPT-4.1-mini

Overall BLEU-1

37.92

+2.82 over LangMem on LoCoMo with GPT-4.1-mini

Temporal F1

55.90

+13.33 over A-Mem and +13.33 over Mem0 on LoCoMo

PERSONAMEM Accuracy

63.25

+3.50 over A-Mem's average accuracy

AdaMem is evaluated on LoCoMo and PERSONAMEM, which test long-horizon reasoning and evolving user modeling. The 44.65 F1 on LoCoMo and 63.25 accuracy on PERSONAMEM show that AdaMem’s structured, adaptive memory yields consistent gains over MemGPT, A-Mem, Mem0, LangMem, and Zep.


BENCHMARK

Performance on the LoCoMo benchmark using GPT-4.1-mini

Overall F1 on LoCoMo for AdaMem and key memory baselines.

BENCHMARK

Ablation on key components

Overall F1 on LoCoMo for full AdaMem and ablated variants.

KEY INSIGHT

The Counterintuitive Finding

AdaMem’s largest gain appears in temporal questions, where F1 jumps to 55.90, up to +23.4 compared to prior methods.

This is surprising because temporal reasoning is often assumed to be mainly about longer context windows, not about structured graph memory and route planning.

WHY IT MATTERS

What this unlocks for the field

AdaMem unlocks long-horizon, user-centric dialogue where agents can track evolving preferences and cross-session events with structured, adaptive memory.

Builders can now deploy agents that maintain coherent personas and temporal narratives across 35-session, 9,000-token histories without collapsing into fragmented or redundant memories.


Related papers

Benchmark · Agent Memory

Active Context Compression: Autonomous Memory Management in LLM Agents

Nikhil Verma · 2026

Focus Agent adds start_focus, complete_focus, a persistent Knowledge block, and an optimized Persistent Bash plus String-Replace Editor scaffold to actively compress context during long software-engineering tasks. On five hard SWE-bench Lite instances against a Baseline ReAct agent, Focus Agent achieves 22.7% token reduction (14.9M → 11.5M) while matching 3/5 = 60% task success.

Benchmark · Long-Term Memory

AgenticAI-DialogGen: Topic-Guided Conversation Generation for Fine-Tuning and Evaluating Short- and Long-Term Memories of LLMs

Manoj Madushanka Perera, Adnan Mahmood et al. · 2026

AgenticAI-DialogGen chains ChatPreprocessor, KnowledgeExtractor, TopicAnalyzer, KnowledgeGraphBuilder, PersonaGenerator, DuelingChat Agent, ConversationValidator, ConversationRefiner, QAGeneration, and PostProcessing to turn raw multi-session chats into topic-guided, persona-grounded conversations with explicit short- and long-term memories. On the TGC / KG memory QA benchmark, Mistral-7B fine-tuned within AgenticAI-DialogGen achieves 87.36 F1, compared to GPT-4’s 83.77 F1 in a zero-shot setting on the same task.
