Latent learning: episodic memory complements parametric learning by enabling flexible reuse of experiences

Authors: Andrew Kyle Lampinen, Martin Engelcke, Yuxuan Li, et al.

2025

TL;DR

Reinstating full past episodes into context through episodic retrieval lets transformers solve latent tests that pure parametric learning fails: on reversals and codebooks, parametric-only accuracy is near zero while retrieval-augmented and in-context variants score highly.


THE PROBLEM

Latent learning failures in reversals and navigation

Latent learning refers to acquiring information that is not directly relevant to the current task but becomes crucial later, and parametric systems often fail at it.

Language models trained on “Plato taught Aristotle” cannot answer “Who taught Aristotle?” unless the fact is in context, and agents that have explored a maze fail to navigate to objects they saw but were never rewarded for reaching.
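
To make the reversal case concrete, here is a minimal sketch of the probe conditions that expose this gap; the helper name and prompts are illustrative assumptions, not the paper's evaluation code:

```python
# Minimal sketch (hypothetical helper and prompts, not the paper's exact setup)
# of the two probe conditions for a fact seen only in training.

def build_reversal_probes(fact: str, reversal_question: str) -> dict:
    """Build the parametric-only and in-context probes for one forward fact."""
    return {
        # Latent test: the fact lives only in the weights, not in the prompt.
        "reversal_no_context": reversal_question,
        # In-context control: the forward fact is reinstated into the prompt.
        "reversal_in_context": f"{fact}\n{reversal_question}",
    }

probes = build_reversal_probes("Plato taught Aristotle.", "Who taught Aristotle?")
```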

HOW IT WORKS

Latent learning with episodic retrieval

The approach pairs parametric learning with episodic memory, oracle retrieval, and within-experience in-context learning, and tests the combination on codebook, simple-reversal, semantic-structure, and latent gridworld benchmarks.

You can think of parametric learning as cortical weights, episodic memory as hippocampal traces, and oracle retrieval as a perfect card catalog that fetches the right episode.

This combination turns hard out-of-context latent tests into easier in-context problems that transformers already solve, something a fixed context window without retrieval cannot achieve.
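
A minimal sketch of this combination, assuming a toy token-sequence episode store; the class and method names are illustrative, and oracle retrieval is simulated with ground-truth episode IDs rather than a learned retriever:

```python
import random

class EpisodicMemory:
    """Append-only store of full past episodes (token sequences).

    Oracle retrieval is simulated by keying episodes with ground-truth IDs,
    so the relevant episode is fetched perfectly by construction. All names
    here are illustrative assumptions, not the paper's API.
    """

    def __init__(self) -> None:
        self.episodes: dict[str, list[str]] = {}

    def store(self, episode_id: str, tokens: list[str]) -> None:
        self.episodes[episode_id] = tokens

    def oracle_retrieve(self, relevant_id: str, n_distractors: int = 2) -> list[list[str]]:
        """Return the relevant episode plus a few distractor episodes."""
        others = [i for i in self.episodes if i != relevant_id]
        picked = random.sample(others, min(n_distractors, len(others)))
        retrieved = [self.episodes[relevant_id]] + [self.episodes[i] for i in picked]
        random.shuffle(retrieved)  # the model must spot the useful episode itself
        return retrieved

def build_context(memory: EpisodicMemory, relevant_id: str, query: list[str]) -> list[str]:
    """Concatenate retrieved episodes with the query, turning an out-of-context
    latent test into an in-context problem the transformer can already solve."""
    context: list[str] = []
    for episode in memory.oracle_retrieve(relevant_id):
        context.extend(episode)
    return context + query
```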

DIAGRAM

Inference flow for latent tests with and without retrieval

This diagram shows how Latent learning processes a query under parametric-only learning versus with oracle episodic retrieval on codebooks and simple reversals.

DIAGRAM

Evaluation pipeline across latent learning benchmarks

This diagram shows how Latent learning evaluates parametric versus retrieval-augmented systems on codebooks, reversals, semantic structure, and gridworld benchmarks.

PROCESS

How Latent learning handles a latent test episode

  1. 01

    Oracle retrieval mechanism

    Latent learning first uses the oracle retrieval mechanism to fetch at least one relevant episode from episodic memory, along with distractor episodes.

  2. 02

    Within-experience in-context learning

    Latent learning then relies on within-experience in-context learning to interpret the retrieved episode and learn the needed procedure, like reversal or encoding.

  3. 03

    Task-cue modulated mapping

    Latent learning applies the task-cue modulated mapping f(x, t), where x is the concatenation of the current input and the retrieved episode and t is the task cue, to compute the desired output (a sketch of this loop follows the list).

  4. 04

    Latent generalization evaluation

    Latent learning finally evaluates performance on latent tests such as reversals without context, latent codebook indices, or navigation to never-goal objects.
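
A minimal sketch of this four-step loop, reusing the `build_context` helper from the sketch above; `model`, the test-item fields, and all signatures are illustrative assumptions, not the paper's code:

```python
def latent_test_accuracy(model, memory, test_items, use_retrieval: bool) -> float:
    """Score latent tests with or without oracle retrieval.

    `model(tokens, task_cue)` stands in for the task-cue modulated mapping
    f(x, t); each test item carries a query, a task cue, a gold answer, and
    the ID of the episode that latently contains the needed information.
    """
    correct = 0
    for item in test_items:
        if use_retrieval:
            # Steps 1-2: reinstate the relevant episode (plus distractors).
            x = build_context(memory, item["episode_id"], item["query"])
        else:
            # Parametric-only condition: the query alone.
            x = item["query"]
        # Step 3: apply f(x, t) to the concatenated sequence.
        prediction = model(x, task_cue=item["task_cue"])
        # Step 4: score latent generalization against the gold answer.
        correct += int(prediction == item["answer"])
    return correct / len(test_items)
```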

KEY CONTRIBUTIONS

Key Contributions

  • 01

    Identifying latent learning gap

    The paper formally identifies latent learning of future-useful information as a key gap between natural and artificial intelligence, using codebook, reversal, semantic-structure, and gridworld benchmarks.

  • 02

    Episodic memory as solution

    Latent learning argues that episodic memory and nonparametric retrieval can bridge this gap by reinstating full episodes into context for flexible in-context reasoning.

  • 03

    Role of in-context learning

    Latent learning highlights that within-experience in-context learning is essential for learning to use retrieved information effectively across episodes in oracle retrieval settings.

RESULTS

By the Numbers

  • Latent encoding accuracy: near 0%, a large gap versus high validation accuracy on trained indices in codebooks.

  • Reversal without context: near 0%, versus high forward and in-context reversal accuracy in simple reversals.

  • Latent gridworld success: well below 100%; oracle retrieval improves success but remains far from ceiling on latent objects.

  • Semantic structure accuracy: drops when similarity-based cues are reduced, which makes the retrieval advantage clearer.

Latent learning evaluates on codebooks, simple reversals, semantic structure, and latent gridworld benchmarks, showing that oracle retrieval enables success on latent tests where parametric-only systems remain near zero despite strong performance on explicit and in-context variants.


BENCHMARK

Latent versus explicit performance across tasks

Relative performance on explicit validation versus latent test conditions for Latent learning with and without oracle retrieval.

KEY INSIGHT

The Counterintuitive Finding

Latent learning shows that transformers can answer reversals perfectly when the forward fact is in context, yet fail almost completely without it.

This is counterintuitive because parametric learning has clearly stored the needed information, but cannot flexibly recombine it for latent tasks without episodic retrieval.

WHY IT MATTERS

What this unlocks for the field

Latent learning unlocks a clear conceptual and empirical role for episodic retrieval in enabling flexible reuse of past experiences for new tasks.

Builders can now target memory systems that approximate hippocampal replay, focusing on retrieval and in-context learning rather than only scaling parametric weights.


