Generalization of Reinforcement Learners with Working and Episodic Memory

Authors: Meire Fortunato, Melissa Tan, Ryan Faulkner et al.

arXiv 2019

TL;DR

Memory Recall Agent (MRA) combines working and episodic memory with jumpy backpropagation and contrastive predictive coding to achieve the best average human‑normalized scores across the 13‑task Memory Tasks Suite.

THE PROBLEM

RL agents overfit training worlds and lack memory‑specific generalization

Reinforcement learning agents are commonly evaluated only on their training environments, which makes it hard to assess either generalization or memory‑specific capabilities.

When task scale or visual stimuli change between train and holdout levels, agents without explicit working and episodic memory, such as standard IMPALA, can fail to reuse what they learned.

HOW IT WORKS

Memory Recall Agent — working plus episodic memory with jumpy credit assignment

Memory Recall Agent (MRA) combines a pixel‑input convolutional residual network, an LSTM working memory, a slot‑based episodic memory (MEM), contrastive predictive coding (CPC), and jumpy backpropagation.

You can think of the LSTM as RAM for short‑term computations, the episodic memory as a long‑term disk or hippocampus, and CPC as a self‑supervised organizer of stored experiences.

By learning keys for episodic slots and using jumpy backpropagation through MEM reads, Memory Recall Agent (MRA) can exploit long‑range experiences that a plain context‑limited recurrent policy cannot reach with truncated backpropagation.
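The episodic read described above can be sketched as a top‑k dot‑product attention lookup over memory slots. This is a minimal illustrative sketch, not the paper's implementation: the function name, slot count, and dimensions are assumptions.

```python
import numpy as np

def episodic_read(query, keys, values, top_k=3):
    """Read from a slot-based episodic memory via dot-product attention.

    Retrieves the top_k best-matching slots by dot-product score, then
    returns an attention-weighted sum of their stored values.
    """
    scores = keys @ query                       # (num_slots,) similarity scores
    top = np.argsort(scores)[-top_k:]           # indices of the k nearest slots
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                    # softmax over the retrieved slots
    return weights @ values[top]                # weighted read vector (value_dim,)

# Toy usage: 8 slots, 4-dim keys, 6-dim values.
rng = np.random.default_rng(0)
keys = rng.standard_normal((8, 4))
values = rng.standard_normal((8, 6))
m_t = episodic_read(keys[2], keys, values)      # query close to slot 2's key
```

In the full agent, gradients from the jumpy‑backpropagation objective would flow into the stored keys through these attention scores, letting retrieval itself be learned.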

DIAGRAM

Memory Recall Agent query‑time retrieval flow

This diagram shows how Memory Recall Agent (MRA) uses pixel embeddings and LSTM state to query episodic memory and feed retrieved summaries back into control at each time step.

DIAGRAM

Evaluation and ablation pipeline for Memory Recall Agent

This diagram shows how Memory Recall Agent (MRA) and its ablations are trained on small/large scales and evaluated on holdout‑interpolate and holdout‑extrapolate levels across the Memory Tasks Suite.

PROCESS

How Memory Recall Agent handles a Memory Tasks Suite episode

  1. Pixel Input

    Memory Recall Agent (MRA) receives rendered observations and passes them through the convolutional residual network to produce embeddings x_t used throughout the architecture.

  2. Working Memory

    The LSTM working memory ingests x_t and the episodic read vector m_t, updating its hidden state h_t that drives policy and value predictions.

  3. Episodic Memory

    Memory Recall Agent (MRA) forms keys from x_t and h_t, writes (p_i, v_i, k_i) into the slot‑based episodic memory, and reads nearest neighbors using dot‑product attention.

  4. Contrastive Predictive Coding

    Using h_t and future embeddings x_{t+τ}, Memory Recall Agent (MRA) applies the CPC auxiliary loss with jumpy backpropagation to learn predictive representations that improve long‑range generalization.
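The CPC step in the process above rests on an InfoNCE‑style contrastive objective: score the true future embedding against negatives drawn from other time steps. The sketch below is a hedged illustration under assumed names and shapes (`W`, `cpc_infonce_loss`, the candidate layout), not the paper's exact loss code.

```python
import numpy as np

def cpc_infonce_loss(h_t, future_embeddings, W):
    """InfoNCE-style contrastive loss, as used in CPC (illustrative sketch).

    h_t: current LSTM state, shape (d_h,).
    future_embeddings: row 0 holds the true future embedding x_{t+tau};
    the remaining rows are negative samples from other time steps.
    W: a learned bilinear map from state space to embedding space.
    """
    logits = future_embeddings @ (W @ h_t)   # score each candidate future
    logits -= logits.max()                   # subtract max for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[0]                     # -log p(true future | h_t)

# Toy usage: one positive plus three negative candidate futures.
rng = np.random.default_rng(1)
d_h, d_x = 8, 5
W = rng.standard_normal((d_x, d_h))
h_t = rng.standard_normal(d_h)
candidates = rng.standard_normal((4, d_x))
loss = cpc_infonce_loss(h_t, candidates, W)
```

Minimizing this loss pushes `h_t` to be predictive of its own future observations, which is the representational pressure the CPC auxiliary task supplies.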

KEY CONTRIBUTIONS

Key Contributions

  • Memory Tasks Suite for working and episodic memory

    Memory Recall Agent (MRA) is evaluated on a suite of 13 tasks with scale and stimulus splits, including Arbitrary Visuomotor Mapping and Spot the Difference variants.

  • Memory Recall Agent architecture

    Memory Recall Agent (MRA) integrates an LSTM working memory, a slot‑based episodic memory, CPC, and jumpy backpropagation on top of IMPALA for long‑range credit assignment.

  • Ablations of memory components and losses

    Memory Recall Agent (MRA) is compared against 10 ablations, showing when episodic memory, CPC, and jumpy backpropagation help training and holdout generalization differently across tasks.

RESULTS

By the Numbers

  • Human‑normalized score: superhuman on Visible Goal Procedural Maze (LSTM + MEM exceeds the human baseline on this task)

  • Human‑normalized score: superhuman on Transitive Inference (LSTM + MEM surpasses the human baseline in train and holdout)

  • Task count: 13 tasks, covering PsychLab, Spot the Difference, Goal Navigation, and Transitive Inference

  • Ablation variants: 10 models, varying working memory, MEM, CPC or reconstruction, and jumpy backpropagation

The benchmark is the Memory Tasks Suite with train, holdout‑interpolate, and holdout‑extrapolate levels, testing memory‑specific generalization. The main result shows Memory Recall Agent (MRA) achieves the best average human‑normalized performance across tasks compared to LSTM‑only IMPALA and other ablations.
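Human‑normalized scoring is conventionally anchored so that random play maps to 0 and human performance to 100; values above 100 indicate superhuman play. The formula below is that common convention, stated as an assumption, since the paper's exact baselines are not reproduced here.

```python
def human_normalized(agent, random_baseline, human):
    """Human-normalized score: 0 at random play, 100 at human level.

    Common convention (an assumption here): scores above 100 on a task
    indicate superhuman performance.
    """
    return 100.0 * (agent - random_baseline) / (human - random_baseline)

# Toy example: an agent midway between random and human scores 50.
print(human_normalized(agent=15.0, random_baseline=10.0, human=20.0))  # → 50.0
```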

BENCHMARK

Average normalized scores across tasks for key ablations

Relative ranking of Memory Recall Agent (MRA) versus LSTM‑only IMPALA and other ablations, as summarized in the heatmap of human‑normalized scores.

KEY INSIGHT

The Counterintuitive Finding

The ablations reveal a synergistic boost: adding episodic memory and CPC together yields a combined gain that exceeds the sum of their individual gains on several tasks.

This is surprising because one might expect a memory store and a predictive auxiliary loss to provide largely redundant benefits, rather than compounding improvements across both training and holdout levels.

WHY IT MATTERS

What this unlocks for the field

Memory Recall Agent (MRA) demonstrates that combining working memory, episodic memory, and predictive auxiliary losses can systematically improve memory‑specific generalization in reinforcement learning.

Builders can now design agents that retain and reuse experiences across long delays and changing stimuli, moving closer to human‑like memory abilities in complex environments.

