MemoryBank: Enhancing Large Language Models with Long-Term Memory

Authors: Wanjun Zhong, Lianghong Guo, Qiqi Gao et al.

2023

TL;DR

MemoryBank adds Ebbinghaus-style memory updating on top of long-term storage and retrieval, enabling SiliconFriend to reach 0.716 correctness and 0.912 coherence on English probing questions.

THE PROBLEM

LLM companions lack long-term memory for sustained interactions

MemoryBank targets the limitation that LLMs lack long-term memory, which is critical for personal companions, psychological counseling, and secretarial assistance.

Without persistent memory, systems like SiliconFriend cannot recall past conversations or understand user personality, degrading rapport, emotional support, and personalized task management over time.

HOW IT WORKS

MemoryBank: Memory Storage, Retrieval, and Ebbinghaus-style Updating

MemoryBank’s core mechanism links Memory Storage, Memory Retrieval, and a Memory Updating Mechanism to store conversations, event summaries, and evolving user portraits.

You can think of MemoryBank as a human brain paired with a diary: detailed logs serve as notes, summaries become memories, and Ebbinghaus-style rehearsal decides what to forget or reinforce.

This design lets MemoryBank keep rich user history and personality outside the context window, enabling SiliconFriend to recall and adapt in ways a plain prompt buffer cannot.
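
To make the layered store concrete, here is a minimal sketch of how a per-user memory bank might be organized; the class and field names (MemoryPiece, UserMemoryBank, user_portrait, and so on) are illustrative assumptions, not the paper's released code.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

@dataclass
class MemoryPiece:
    """One stored unit: a conversation turn or a daily event summary."""
    text: str
    timestamp: datetime
    strength: float = 1.0                     # Ebbinghaus memory strength S
    last_recalled: Optional[datetime] = None

@dataclass
class UserMemoryBank:
    """Per-user layered store: raw dialog logs, event summaries, evolving portrait."""
    conversations: List[MemoryPiece] = field(default_factory=list)
    event_summaries: List[MemoryPiece] = field(default_factory=list)
    user_portrait: str = ""                   # periodically re-summarized personality sketch

    def log_turn(self, text: str) -> None:
        """Append a timestamped conversation turn to the raw log layer."""
        self.conversations.append(MemoryPiece(text, datetime.now()))
```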

DIAGRAM

SiliconFriend query-time memory retrieval and augmentation

This diagram shows how MemoryBank retrieves relevant memories and user portraits for a new SiliconFriend query and injects them into the meta prompt.
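
The sketch below illustrates this query-time path under a few assumptions: a sentence-transformers model standing in for the dual-tower encoder, a flat FAISS inner-product index over pre-encoded memories, and an illustrative prompt template; the actual encoder checkpoint and meta-prompt wording come from the paper's implementation and are not reproduced here.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # stand-in for the dual-tower encoder

def build_index(memory_texts):
    """Pre-encode every stored memory piece into a FAISS inner-product index."""
    vecs = encoder.encode(memory_texts, normalize_embeddings=True)
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(np.asarray(vecs, dtype="float32"))
    return index

def retrieve(index, memory_texts, query, k=3):
    """Encode the current context and return the k most relevant memory pieces."""
    q = encoder.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    return [memory_texts[i] for i in ids[0]]

def build_meta_prompt(query, memories, event_summary, user_portrait):
    """Inject retrieved memories, the global event summary, and the user portrait
    into the meta prompt handed to SiliconFriend (template wording is illustrative)."""
    memory_block = "\n".join(memories)
    return (
        "Relevant past memories:\n" + memory_block + "\n\n"
        "Global event summary:\n" + event_summary + "\n\n"
        "User portrait:\n" + user_portrait + "\n\n"
        "Current user message: " + query
    )
```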

DIAGRAM

Evaluation pipeline for MemoryBank with simulated long-term dialogs

This diagram shows how MemoryBank is evaluated using 10-day multi-user simulated dialogs and 194 probing questions scored by human annotators.
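
A simplified sketch of that evaluation loop, assuming each probing question carries human-annotated scores for retrieval accuracy, correctness, and coherence; the probe format, agent interface, and annotation function here are hypothetical rather than the paper's evaluation script.

```python
def evaluate(probes, agent, annotate):
    """Average human-annotated scores over the probing questions asked after
    the 10-day simulated dialogs (a simplified sketch, not the paper's code)."""
    totals = {"retrieval_acc": 0.0, "correctness": 0.0, "coherence": 0.0}
    for probe in probes:                                   # e.g. the 194 probing questions
        answer, recalled_memories = agent.answer(probe["question"])
        scores = annotate(probe, answer, recalled_memories)  # human annotator judgments
        for metric in totals:
            totals[metric] += scores[metric]
    return {metric: total / len(probes) for metric, total in totals.items()}
```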

PROCESS

How MemoryBank Handles a Long-Term AI Companion Session

  1. Memory Storage

    MemoryBank logs daily multi-turn conversations with timestamps and builds hierarchical Event Summary and User Portrait layers for each user.

  2. Memory Retrieval

    MemoryBank encodes the current context and searches pre-encoded memories with a dual-tower dense retriever and FAISS to find relevant pieces.

  3. Memory Updating Mechanism

    MemoryBank updates memory strength S using the Ebbinghaus Forgetting Curve, increasing S and resetting the elapsed time t whenever a memory is recalled (a minimal sketch of this rule follows this list).

  4. SiliconFriend Integration

    MemoryBank feeds relevant memory, global event summary, and global user portrait into SiliconFriend’s meta prompt to generate personalized responses.
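
As referenced in step 3, below is a minimal sketch of the Ebbinghaus-style update, assuming retention R = exp(-t / S), a +1 strength increment on recall, and a probabilistic forgetting policy; the increment size and discard rule are illustrative choices rather than values prescribed by the paper.

```python
import math
import random

class EbbinghausMemory:
    """Memory item with Ebbinghaus-style strength S and elapsed time t (in days)."""

    def __init__(self, text, strength=1.0):
        self.text = text
        self.strength = strength    # S: grows each time the memory is recalled
        self.elapsed_days = 0.0     # t: resets to zero on recall

    def retention(self):
        """Ebbinghaus forgetting curve: R = exp(-t / S)."""
        return math.exp(-self.elapsed_days / self.strength)

    def tick(self, days=1.0):
        """Advance time since the memory was last recalled."""
        self.elapsed_days += days

    def recall(self):
        """Recalling reinforces the memory: increase S and reset t."""
        self.strength += 1.0        # increment size is an illustrative choice
        self.elapsed_days = 0.0

    def should_forget(self):
        """Probabilistically drop weak memories; this policy is an assumption."""
        return random.random() > self.retention()
```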

KEY CONTRIBUTIONS

Key Contributions

  • MemoryBank long-term memory mechanism

    MemoryBank introduces Memory Storage, Memory Retrieval, and a Memory Updating Mechanism so LLMs can store, recall, and update long-term memories and user portraits.

  • SiliconFriend AI companion

    MemoryBank powers SiliconFriend, an AI companion tuned with 38k psychological conversations, enabling empathetic responses and personality-aware suggestions in English and Chinese.

  • Generalizable across LLMs and languages

    MemoryBank works with ChatGPT, ChatGLM, and BELLE, supports bilingual dialogs, and functions with or without the Ebbinghaus-based forgetting mechanism.

RESULTS

By the Numbers

  • Retrieval Acc.: 0.763 (-0.051 vs SiliconFriend BELLE on English)

  • Correctness: 0.716 (+0.237 over SiliconFriend ChatGLM on English)

  • Coherence: 0.912 (+0.232 over SiliconFriend ChatGLM on English)

  • Ranking: 0.818 (+0.301 over SiliconFriend ChatGLM on English)

On a 10-day, 15-user simulated dialog benchmark with 194 probing questions, MemoryBank-powered SiliconFriend ChatGPT is evaluated for retrieval accuracy, response correctness, coherence, and human ranking. These results show that MemoryBank enables SiliconFriend ChatGPT to give more correct and coherent long-term responses than SiliconFriend ChatGLM and SiliconFriend BELLE despite slightly lower retrieval accuracy.

BENCHMARK

Quantitative analysis of SiliconFriend variants on English probing questions

Correctness score on English probing questions for three MemoryBank-powered SiliconFriend variants.

KEY INSIGHT

The Counterintuitive Finding

MemoryBank with SiliconFriend ChatGPT reaches 0.716 correctness and 0.912 coherence even though its retrieval accuracy (0.763) is lower than BELLE and ChatGLM.

This is surprising because we usually expect higher retrieval accuracy to directly yield better answers, but MemoryBank shows that how memories are used can outweigh raw retrieval hits.

WHY IT MATTERS

What this unlocks for the field

MemoryBank unlocks persistent, personality-aware AI companions that can recall specific books, algorithms, and emotional history across at least 10 days of interaction.

Builders can now bolt MemoryBank onto diverse LLMs to get human-like forgetting and reinforcement, enabling long-term counseling, coaching, and secretarial agents without retraining base models.
