Lightweight LLM Agent Memory with Small Language Models

Authors: Jiaquan Zhang, Chaoning Zhang, Shuxu Chen et al.

2026

TL;DR

LightMem uses SLM-based two-stage retrieval and STM/MTM/LTM stores to gain about +2.5 average F1 on LoCoMo at 83 ms median retrieval latency.



THE PROBLEM

LLM agents trade accuracy for latency in long-term memory

Retrieval-based external memory has low online overhead but suffers from unstable accuracy due to limited query construction and candidate filtering.

LLM-driven memory operations improve answer correctness but require repeated large-model calls, accumulating latency over long interactions and harming user experience.

HOW IT WORKS

LightMem — SLM-driven online memory with STM, MTM, and LTM

LightMem uses an SLM-1 Controller, SLM-2 Selector, SLM-3 Writer, and structured STM/MTM/LTM stores to decouple online querying from offline consolidation.

You can think of LightMem like RAM and disk: STM is working RAM, MTM is a user-specific cache, and LTM is a compact, shared knowledge disk.

This design lets LightMem run fixed-budget, semantically verified retrieval and incremental consolidation, something plain context-window replay cannot provide efficiently over long horizons.
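The RAM/disk analogy can be sketched as a minimal three-tier store. The class and method names here are hypothetical illustrations, not LightMem's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryTiers:
    """Illustrative three-tier store: STM (working RAM),
    MTM (per-user cache), LTM (shared knowledge disk)."""
    stm: list = field(default_factory=list)   # current-session turns
    mtm: dict = field(default_factory=dict)   # user_id -> compact summaries
    ltm: dict = field(default_factory=dict)   # de-identified shared knowledge

    def write_turn(self, turn: str) -> None:
        # Online: append raw interaction to working memory; a writer
        # model would later compress it into an MTM entry.
        self.stm.append(turn)

    def consolidate(self, user_id: str, summary: str) -> None:
        # Offline: move a compact summary into the user's MTM cache.
        self.mtm.setdefault(user_id, []).append(summary)

tiers = MemoryTiers()
tiers.write_turn("User prefers morning meetings.")
tiers.consolidate("u1", "u1: prefers mornings")
```

The point of the split is that the hot online path only touches STM and MTM, while the heavier LTM updates happen off the request path.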

DIAGRAM

Online query time retrieval and writing flow in LightMem

This diagram shows how LightMem processes a single user turn using SLM-1, SLM-2, and SLM-3 together with the STM, MTM, and LTM stores.

DIAGRAM

LightMem evaluation and ablation pipeline

This diagram shows how LightMem is evaluated on LoCoMo and DialSim with baselines and ablations.

PROCESS

How LightMem Handles a Multi-turn Dialogue Session

  1. Intent Modeling and Retrieval Control

    LightMem uses the SLM-1 Controller to infer intent attributes, generate hypothetical queries, and output metadata constraints and Top-K budgets.

  2. Two-Stage Retrieval

    LightMem runs metadata-constrained coarse vector retrieval to 2K candidates; the SLM-2 Selector then performs semantic-consistency reranking and compresses them to K memories.

  3. Memory Writing and Update

    After response generation, the SLM-3 Writer summarizes the interaction into compact MTM entries, merging repetitive items and resolving conflicts with temporal cues.

  4. Offline Consolidation

    A long-context LLM periodically abstracts high-value MTM episodes into de-identified knowledge, incrementally updating the graph-structured LTM with forgetting.
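The online steps above can be sketched end to end. All function names, the metadata constraint, and the toy token-overlap reranker are illustrative stand-ins for the SLM components, not the paper's implementation:

```python
def controller(query: str) -> dict:
    # SLM-1 stand-in: emit a fixed Top-K budget plus a metadata constraint.
    return {"top_k": 2, "must_contain": "meeting"}

def coarse_retrieve(store: list, plan: dict) -> list:
    # Stage 1: cheap metadata-constrained filter, capped at 2K candidates.
    hits = [m for m in store if plan["must_contain"] in m]
    return hits[: 2 * plan["top_k"]]

def select(candidates: list, query: str, k: int) -> list:
    # SLM-2 stand-in: rerank by token overlap with the query, keep K.
    qtokens = set(query.lower().split())
    scored = sorted(candidates,
                    key=lambda m: len(qtokens & set(m.lower().split())),
                    reverse=True)
    return scored[:k]

def write(store: list, entry: str) -> None:
    # SLM-3 stand-in: deduplicate before appending a compact entry.
    if entry not in store:
        store.append(entry)

store = ["meeting moved to 9am", "likes tea", "meeting room is B2",
         "meeting notes shared", "project deadline Friday"]
plan = controller("when is the meeting")
memories = select(coarse_retrieve(store, plan),
                  "when is the meeting", plan["top_k"])
write(store, "asked about meeting time")
```

The key structural property is that the expensive semantic check only ever sees a bounded 2K-candidate slice, so per-turn cost stays fixed regardless of store size.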

KEY CONTRIBUTIONS

Key Contributions

  • LightMem: an SLM-driven memory system

    LightMem introduces a specialized SLM-1 Controller, SLM-2 Selector, and SLM-3 Writer to handle online query construction, retrieval, and writing under fixed Top-K budgets.

  • Two-stage memory querying

    LightMem first performs fast vector-based coarse retrieval to 2K candidates, then uses semantic-consistency verification in the SLM-2 Selector to keep the K truly relevant memories.

  • STM/MTM/LTM organization with user isolation

    LightMem organizes STM, MTM, and a graph-structured LTM keyed by user identifiers, achieving about +2.5 average F1 on LoCoMo at 83 ms median retrieval latency.
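The user-isolation idea in the third contribution can be sketched as a per-user MTM map beside a shared, de-identified LTM adjacency map. All names here are hypothetical:

```python
mtm = {}   # user_id -> private episodic entries (isolated per user)
ltm = {}   # subject -> set of abstract facts, with no user identifiers

def write_mtm(user_id: str, entry: str) -> None:
    # Private write: episodes stay keyed to their owner.
    mtm.setdefault(user_id, []).append(entry)

def consolidate(subject: str, fact: str) -> None:
    # Offline: abstract a high-value episode into the shared LTM graph,
    # dropping the user identifier (de-identification).
    ltm.setdefault(subject, set()).add(fact)

def forget(subject: str, fact: str) -> None:
    # Incremental forgetting: prune an edge that is no longer valid.
    ltm.get(subject, set()).discard(fact)

write_mtm("u1", "u1 moved the standup to 9am")
consolidate("standup", "starts at 9am")
forget("standup", "starts at 10am")   # no-op if the edge is absent
```

The separation means a shared LTM query can never leak which user contributed an episode, while each user's MTM remains fully personal.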

RESULTS

By the Numbers

F1, multi-hop (GPT-4o)

34.50

+1.64 over A-MEM

F1, adversarial (GPT-4o mini)

54.50

+4.47 over A-MEM

SBERT similarity (DialSim)

23.40

+3.89 over A-MEM

Retrieval latency (P50)

83 ms

vs 856 ms for A-MEM

On LoCoMo, which tests single-hop, multi-hop, temporal, open-domain, and adversarial reasoning, LightMem improves F1 while shortening the effective context. On DialSim, LightMem raises SBERT similarity from 19.51 to 23.40, showing stronger semantic consistency in long-term dialogue.

BENCHMARK

Main results on LoCoMo multi-hop with GPT-4o

F1 on LoCoMo multi-hop questions, using GPT-4o as the response generator.

BENCHMARK

Latency comparison with GPT-4o mini

Median (P50) retrieval latency in milliseconds for memory systems with GPT-4o mini.

KEY INSIGHT

The Counterintuitive Finding

LightMem keeps median retrieval latency at 83 ms while still improving LoCoMo F1 by about 2.5 points on average across model scales.

This is surprising because adding SLM-based controllers and reranking looks like extra overhead, yet LightMem is far faster than A-MEM's 856 ms retrieval latency.
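One way to see why this holds: the online path replaces a large-model call with one vector search plus one small-model rerank over a bounded candidate set. The per-call costs below are hypothetical round numbers for illustration, not measurements from the paper:

```python
# Hypothetical per-operation latency budget (milliseconds).
t_vector = 10       # coarse vector search over the index
t_slm_rerank = 70   # one small-model rerank over 2K candidates
t_llm_call = 850    # one large-model memory operation

# LightMem-style online path: two cheap operations in sequence.
lightmem = t_vector + t_slm_rerank   # 80 ms, near the reported 83 ms P50

# LLM-driven path: a single large call already dominates the budget.
llm_driven = t_llm_call

assert lightmem < llm_driven
```

The small models add work, but they are cheap enough that the sum still undercuts a single large-model call by an order of magnitude.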

WHY IT MATTERS

What this unlocks for the field

LightMem enables LLM agents to maintain long-horizon, user-specific memory with fixed retrieval budgets and stable accuracy across diverse backbones.

Builders can now deploy multi-session agents that personalize over time without replaying 16K-token histories or paying repeated large-model costs for every memory operation.


Related papers

Agent Memory · Long-Term Memory

Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents

Yi Yu, Liuyi Yao et al.

arXiv 2026

Agentic Memory (AgeMem) exposes memory management tools, a three-stage progressive RL strategy, and step-wise GRPO directly inside the agent policy to jointly control long-term and short-term memory. On Qwen3-4B-Instruct, AgeMem attains 54.31% average performance across ALFWorld, SciWorld, PDDL, BabyAI, and HotpotQA, exceeding the best baseline A-Mem at 45.74%.

Agent Memory

AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations

Cheng Jiayang, Dongyu Ru et al.

2026

AMemGym combines Structured Data Generation, On-Policy Interaction, Evaluation Metrics, and Meta-Evaluation to script user state trajectories, drive LLM-simulated role-play, and score write–read–utilization behavior. On AMemGym’s base configuration, AWE-(2,4,30) reaches a 0.291 normalized memory score on interactive evaluation, while native gpt-4.1-mini only achieves 0.203, exposing substantial gaps between memory agents and plain long-context LLMs.

Agent Memory

AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

Emmanuel Bamidele

2026

AMV-L manages agent memory using a Memory Value Model, Tiered Lifecycle, Bounded Retrieval Path, and Lifecycle Manager to decouple retention from retrieval eligibility. Under a 70k-request long-running workload, AMV-L improves throughput from 9.027 to 36.977 req/s over TTL and reduces p99 latency from 5398.167 ms to 1233.430 ms while matching LRU’s retrieval quality.
