MEMO: A Deep Network for Flexible Combination of Episodic Memories

Authors: Andrea Banino, Adrià Puigdomènech Badia, Raphael Köster et al.

2020

TL;DR

MEMO uses separated episodic facts plus an adaptive halting policy to reach 0.21% error on joint bAbI 10k, beating Memory Networks by 3.99 percentage points.

THE PROBLEM

Memory architectures struggle with distant associations in reasoning tasks

MEMO is motivated by the finding that current architectures struggle to reason over long-distance associations and fail on complex inference tasks.

When Paired Associative Inference and shortest-path tasks require chaining multiple facts, End-to-End Memory Networks (EMN), the Differentiable Neural Computer (DNC), and the Universal Transformer often mishandle indirect queries, limiting robust reasoning.

HOW IT WORKS

MEMO — flexible episodic memory with adaptive hops

MEMO introduces a common embedding for each fact, separate keys and values per head, recurrent attention, and a halting policy that learns how many memory hops to take.

You can think of MEMO like a hippocampus-inspired system: facts are stored as separate episodes, and a recurrent retrieval loop selectively chains them, similar to pattern completion in biological memory.

This combination of separated facts and adaptive multi-hop retrieval lets MEMO discover multi-step relationships that a fixed-depth, single-pass context window cannot capture.
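As a toy illustration of this mechanism (a minimal sketch with made-up random embeddings, not the paper's actual model), two separately stored facts A-B and B-C can be chained by two attention hops to answer an indirect A-to-C query:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical item embeddings (random, high-dimensional so matches are clean).
rng = np.random.default_rng(0)
emb = {name: rng.normal(size=64) for name in "ABC"}

# Facts stored as SEPARATE episodes rather than one concatenated context:
# episode 1 links A -> B, episode 2 links B -> C.
keys = np.stack([emb["A"], emb["B"]])
values = np.stack([emb["B"], emb["C"]])

def two_hop_query(start="A"):
    # Hop 1: query with the start item, retrieve its direct associate (~B).
    attn1 = softmax(keys @ emb[start])
    hop1 = attn1 @ values
    # Hop 2: re-query with the retrieved item to reach the indirect associate (~C).
    attn2 = softmax(keys @ hop1)
    hop2 = attn2 @ values
    # Decode: nearest stored item to the two-hop readout.
    return max(emb, key=lambda n: hop2 @ emb[n])

print(two_hop_query())
```

Storing each fact as its own memory row is what allows the second hop to re-query with the intermediate item; a single concatenated, fixed-depth context would have to resolve the whole chain in one pass.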

DIAGRAM

MEMO inference flow across memory hops

This diagram shows how MEMO repeatedly queries external memory with recurrent attention and uses the halting policy to decide when to stop hopping.

DIAGRAM

Evaluation pipeline across PAI, shortest path, and bAbI

This diagram shows how MEMO is trained and evaluated on Paired Associative Inference, shortest path, and bAbI tasks.

PROCESS

How MEMO Handles a Paired Associative Inference Query

  1. Common embedding of inputs

    MEMO first applies the common embedding matrix Wc to each input xi, producing ci that preserves all items in each episodic fact.

  2. Multi-head key and value projection

    MEMO flattens ci and uses Wk(h) and Wv(h) to create multi-head keys k(h)_i and values v(h)_i, while Wq(h) embeds the query q.

  3. Recurrent attention over memory

    MEMO runs recurrent attention using Wh, Wq, dropout, and layer normalization so the query state Qt iteratively focuses on linked facts across hops.

  4. Halting policy and answer prediction

    MEMO feeds the Bhattacharyya distance d(Wt, Wt-1) between successive attention weights, along with the step index t, into the GRU-based halting policy, then stops and predicts the answer at through Wa and Wqa.
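The four steps above can be sketched end to end. This is a single-head, untrained toy version with hypothetical shapes and a simplified halting rule (a sigmoid over the distance and step index, instead of the paper's learned GRU policy), meant only to show the control flow:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

D, H = 16, 16   # input width and head width (hypothetical)
N = 5           # number of stored episodic facts

# Step 1: common embedding Wc maps each raw fact xi to ci.
W_c = rng.normal(scale=0.1, size=(D, D))
facts = rng.normal(size=(N, D))
c = facts @ W_c.T

# Step 2: separate key and value projections (one head for brevity).
W_k = rng.normal(scale=0.1, size=(H, D))
W_v = rng.normal(scale=0.1, size=(H, D))
W_q = rng.normal(scale=0.1, size=(H, D))
K, V = c @ W_k.T, c @ W_v.T

def bhattacharyya(p, q):
    # Distance between successive attention distributions Wt-1 and Wt.
    return -np.log(np.sum(np.sqrt(p * q)) + 1e-12)

W_h = rng.normal(scale=0.1, size=(H, H))  # recurrent query update
w_halt = rng.normal(size=2)               # toy halting net over [distance, t]

def memo_forward(x_query, max_hops=8):
    # Steps 3-4: recurrent attention with an adaptive number of hops.
    q_t = W_q @ x_query
    prev_attn = np.full(N, 1.0 / N)
    for t in range(1, max_hops + 1):
        attn = softmax(K @ q_t / np.sqrt(H))
        read = attn @ V                      # weighted read from memory
        q_t = np.tanh(W_h @ (q_t + read))    # update the query state
        d = bhattacharyya(prev_attn, attn)
        halt_prob = 1.0 / (1.0 + np.exp(-(w_halt @ np.array([d, t]))))
        if halt_prob > 0.5:  # the paper samples from a GRU-based policy instead
            break
        prev_attn = attn
    return read, t

answer_state, hops = memo_forward(rng.normal(size=D))
print(answer_state.shape, hops)
```

In the real model the readout would pass through the answer projections (Wa, Wqa) and multiple heads; here the point is only that the number of hops is decided at run time rather than fixed in advance.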

KEY CONTRIBUTIONS

Key Contributions

  • Paired Associative Inference task

    MEMO introduces Paired Associative Inference, including A-B-C, A-B-C-D, and A-B-C-D-E chains, to stress distant relationships across multiple facts.

  • Flexible episodic memory representation

    MEMO keeps facts separated in external memory and uses multi-head recurrent attention over keys and values to support inferential reasoning.

  • REINFORCE-based halting policy

    MEMO adds a REINFORCE-trained halting policy with an LHop term that directly minimizes the expected number of computation steps.
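The halting objective in the last contribution can be sketched as a standard REINFORCE estimator plus a hop penalty. The reward, baseline, and lambda values below are hypothetical placeholders, not the paper's settings, and the sketch penalizes the realized hop count rather than its expectation:

```python
import numpy as np

def halting_loss(log_probs, halted_at, reward, baseline, lam=1e-3):
    """REINFORCE-style objective for a halting policy (sketch).

    log_probs: log-prob of each halt/continue decision taken, shape (T,)
    halted_at: number of hops actually used (T)
    reward:    task reward (e.g. 1.0 if the answer was correct)
    lam:       weight of the LHop term penalizing computation steps
    """
    advantage = reward - baseline
    # Policy-gradient term: reinforce the taken decisions by the advantage.
    pg = -np.sum(log_probs) * advantage
    # LHop term: penalize the number of hops (realized, as a stand-in for
    # the expected number of steps minimized in the paper).
    l_hop = lam * halted_at
    return pg + l_hop

# Hypothetical episode: 3 hops, correct answer, running baseline of 0.6.
logp = np.log(np.array([0.7, 0.8, 0.9]))
loss = halting_loss(logp, halted_at=3, reward=1.0, baseline=0.6)
print(loss)
```

The advantage term pushes the policy toward decision sequences that led to correct answers, while the LHop term trades a small amount of reward for fewer memory hops.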

RESULTS

By the Numbers

  • bAbI 10k joint error: 0.21% (-3.99 percentage points vs Memory Networks at 4.2%)

  • PAI A-C accuracy: 98.26% (+37.25 percentage points vs EMN at 61.01%)

  • PAI A-D accuracy: 97.22% (+48.56 percentage points vs EMN at 48.66%)

  • Graph 20 5, first node accuracy: 69.20% (+45.21 percentage points vs DNC at 23.99%)

These metrics come from Paired Associative Inference, shortest path on random graphs, and joint bAbI 10k benchmarks. The main result is that MEMO reliably handles long-distance reasoning while adaptively allocating computation to harder queries.


BENCHMARK

Paired Associative Inference — hardest query accuracy

Accuracy on the hardest inference query for each PAI length (A-C, A-D, A-E).

KEY INSIGHT

The Counterintuitive Finding

On the A-B-C-D-E PAI task, MEMO reaches 84.54% accuracy on A-E, while EMN stays at 45.13% and DNC at 62.61%.

This is surprising because DNC and EMN already use external memory, yet MEMO’s separated facts plus recurrent attention nearly double EMN’s performance on the longest chain.

WHY IT MATTERS

What this unlocks for the field

MEMO shows that episodic memories stored as separate facts, combined with adaptive multi-hop retrieval, can support robust long-distance inferential reasoning.

Builders can now design memory systems that automatically allocate more computation to harder queries, chaining multiple experiences without hand-tuned hop counts or quadratic self-attention.


Related papers

Memory Architecture

A Control Architecture for Training-Free Memory Use

Yanzhen Lu, Muchen Jiang et al.

· 2026

TAG routes low-confidence steps via uncertainty-based routing, filters them with guarded acceptance and rollback, selects between rule and exemplar memory banks, and prunes via evidence-based retirement inside a unified control loop. On SVAMP and ASDiv, TAG reaches 81.0% and 85.2% accuracy, improving over the 74.0% and 77.5% no-memory baselines while a compute-matched Retry baseline stays flat.
