MemoryBank: Enhancing Large Language Models with Long-Term Memory

Authors: Wanjun Zhong, Lianghong Guo, Qiqi Gao et al.

2023

TL;DR

MemoryBank adds Ebbinghaus-style memory updating on top of long-term storage and retrieval, enabling SiliconFriend to reach 0.716 correctness and 0.912 coherence on English probing questions.

THE PROBLEM

LLM companions lack long-term memory for sustained interactions

MemoryBank targets the limitation that LLMs lack long-term memory, which is critical for personal companions, psychological counseling, and secretarial assistance.

Without persistent memory, systems like SiliconFriend cannot recall past conversations or understand user personality, degrading rapport, emotional support, and personalized task management over time.

HOW IT WORKS

MemoryBank: Memory Storage, Retrieval, and Ebbinghaus-style Updating

MemoryBank’s core mechanism links Memory Storage, Memory Retrieval, and a Memory Updating Mechanism to store conversations, event summaries, and evolving user portraits.

You can think of MemoryBank as a human brain paired with a diary: detailed logs serve as notes, summaries become memories, and Ebbinghaus-style rehearsal decides what to forget or reinforce.

This design lets MemoryBank keep rich user history and personality outside the context window, enabling SiliconFriend to recall and adapt in ways a plain prompt buffer cannot.
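
To make the layered store concrete, here is a minimal sketch of how a per-user memory bank might be organized; the class and field names (MemoryPiece, UserMemoryBank, user_portrait, and so on) are illustrative assumptions, not the paper's released code.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class MemoryPiece:
    """One stored unit: a conversation turn or a daily event summary."""
    text: str
    timestamp: datetime
    strength: float = 1.0                     # Ebbinghaus memory strength S
    last_recalled: Optional[datetime] = None

@dataclass
class UserMemoryBank:
    """Per-user layered store: raw dialog logs, event summaries, evolving portrait."""
    conversations: List[MemoryPiece] = field(default_factory=list)
    event_summaries: List[MemoryPiece] = field(default_factory=list)
    user_portrait: str = ""                   # periodically re-summarized personality sketch

    def log_turn(self, text: str) -> None:
        """Append a timestamped conversation turn to the raw log layer."""
        self.conversations.append(MemoryPiece(text, datetime.now()))
```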

DIAGRAM

SiliconFriend query-time memory retrieval and augmentation

This diagram shows how MemoryBank retrieves relevant memories and user portraits for a new SiliconFriend query and injects them into the meta prompt.
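
The sketch below illustrates this query-time path under a few assumptions: a sentence-transformers model standing in for the dual-tower encoder, a flat FAISS inner-product index over pre-encoded memories, and an illustrative prompt template; the actual encoder checkpoint and meta-prompt wording come from the paper's implementation and are not reproduced here.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # stand-in for the dual-tower encoder

def build_index(memory_texts):
    """Pre-encode every stored memory piece into a FAISS inner-product index."""
    vecs = encoder.encode(memory_texts, normalize_embeddings=True)
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(np.asarray(vecs, dtype="float32"))
    return index

def retrieve(index, memory_texts, query, k=3):
    """Encode the current context and return the k most relevant memory pieces."""
    q = encoder.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    return [memory_texts[i] for i in ids[0]]

def build_meta_prompt(query, memories, event_summary, user_portrait):
    """Inject retrieved memories, the global event summary, and the user portrait
    into the meta prompt handed to SiliconFriend (template wording is illustrative)."""
    memory_block = "\n".join(memories)
    return (
        "Relevant past memories:\n" + memory_block + "\n\n"
        "Global event summary:\n" + event_summary + "\n\n"
        "User portrait:\n" + user_portrait + "\n\n"
        "Current user message: " + query
    )
```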

DIAGRAM

Evaluation pipeline for MemoryBank with simulated long-term dialogs

This diagram shows how MemoryBank is evaluated using 10-day multi-user simulated dialogs and 194 probing questions scored by human annotators.
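
A simplified sketch of that evaluation loop, assuming each probing question carries human-annotated scores for retrieval accuracy, correctness, and coherence; the probe format, agent interface, and annotation function here are hypothetical rather than the paper's evaluation script.

```python
def evaluate(probes, agent, annotate):
    """Average human-annotated scores over the probing questions asked after
    the 10-day simulated dialogs (a simplified sketch, not the paper's code)."""
    totals = {"retrieval_acc": 0.0, "correctness": 0.0, "coherence": 0.0}
    for probe in probes:                                   # e.g. the 194 probing questions
        answer, recalled_memories = agent.answer(probe["question"])
        scores = annotate(probe, answer, recalled_memories)  # human annotator judgments
        for metric in totals:
            totals[metric] += scores[metric]
    return {metric: total / len(probes) for metric, total in totals.items()}
```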

PROCESS

How MemoryBank Handles a Long-Term AI Companion Session

  1. Memory Storage

    MemoryBank logs daily multi-turn conversations with timestamps and builds hierarchical Event Summary and User Portrait layers for each user.

  2. Memory Retrieval

    MemoryBank encodes the current context and searches pre-encoded memories with a dual-tower dense retriever and FAISS to find relevant pieces.

  3. Memory Updating Mechanism

    MemoryBank updates memory strength S using the Ebbinghaus Forgetting Curve, increasing S and resetting the elapsed time t whenever a memory is recalled (a minimal sketch of this rule follows this list).

  4. SiliconFriend Integration

    MemoryBank feeds relevant memory, global event summary, and global user portrait into SiliconFriend’s meta prompt to generate personalized responses.
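
As referenced in step 3, below is a minimal sketch of the Ebbinghaus-style update, assuming retention R = exp(-t / S), a +1 strength increment on recall, and a probabilistic forgetting policy; the increment size and discard rule are illustrative choices rather than values prescribed by the paper.

```python
import math
import random

class EbbinghausMemory:
    """Memory item with Ebbinghaus-style strength S and elapsed time t (in days)."""

    def __init__(self, text, strength=1.0):
        self.text = text
        self.strength = strength    # S: grows each time the memory is recalled
        self.elapsed_days = 0.0     # t: resets to zero on recall

    def retention(self):
        """Ebbinghaus forgetting curve: R = exp(-t / S)."""
        return math.exp(-self.elapsed_days / self.strength)

    def tick(self, days=1.0):
        """Advance time since the memory was last recalled."""
        self.elapsed_days += days

    def recall(self):
        """Recalling reinforces the memory: increase S and reset t."""
        self.strength += 1.0        # increment size is an illustrative choice
        self.elapsed_days = 0.0

    def should_forget(self):
        """Probabilistically drop weak memories; this policy is an assumption."""
        return random.random() > self.retention()
```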

KEY CONTRIBUTIONS

Key Contributions

  • MemoryBank long-term memory mechanism

    MemoryBank introduces Memory Storage, Memory Retrieval, and a Memory Updating Mechanism so LLMs can store, recall, and update long-term memories and user portraits.

  • SiliconFriend AI companion

    MemoryBank powers SiliconFriend, an AI companion tuned with 38k psychological conversations, enabling empathetic responses and personality-aware suggestions in English and Chinese.

  • Generalizable across LLMs and languages

    MemoryBank works with ChatGPT, ChatGLM, and BELLE, supports bilingual dialogs, and functions with or without the Ebbinghaus-based forgetting mechanism.

RESULTS

By the Numbers

  • Retrieval Acc.: 0.763 (-0.051 vs SiliconFriend BELLE on English)

  • Correctness: 0.716 (+0.237 over SiliconFriend ChatGLM on English)

  • Coherence: 0.912 (+0.232 over SiliconFriend ChatGLM on English)

  • Ranking: 0.818 (+0.301 over SiliconFriend ChatGLM on English)

On a 10-day, 15-user simulated dialog benchmark with 194 probing questions, MemoryBank-powered SiliconFriend ChatGPT is evaluated for retrieval accuracy, response correctness, coherence, and human ranking. These results show that MemoryBank enables SiliconFriend ChatGPT to give more correct and coherent long-term responses than SiliconFriend ChatGLM and SiliconFriend BELLE despite slightly lower retrieval accuracy.

BENCHMARK

Quantitative analysis of SiliconFriend variants on English probing questions

Correctness score on English probing questions for three MemoryBank-powered SiliconFriend variants.

KEY INSIGHT

The Counterintuitive Finding

MemoryBank with SiliconFriend ChatGPT reaches 0.716 correctness and 0.912 coherence even though its retrieval accuracy (0.763) is lower than BELLE and ChatGLM.

This is surprising because we usually expect higher retrieval accuracy to directly yield better answers, but MemoryBank shows that how memories are used can outweigh raw retrieval hits.

WHY IT MATTERS

What this unlocks for the field

MemoryBank unlocks persistent, personality-aware AI companions that can recall specific books, algorithms, and emotional history across at least 10 days of interaction.

Builders can now bolt MemoryBank onto diverse LLMs to get human-like forgetting and reinforcement, enabling long-term counseling, coaching, and secretarial agents without retraining base models.
