Emergent Symbols through Binding in External Memory

Authors: Taylor W. Webb, Ishan Sinha, Jonathan D. Cohen

arXiv 2020

TL;DR

The Emergent Symbol Binding Network (ESBN) uses a two-column key-value external memory with indirection to achieve nearly perfect (≥95%) rule generalization to novel images across four abstract reasoning tasks.



THE PROBLEM

Neural nets fail to generalize abstract rules from tiny training subsets

Deep networks require enormous amounts of training data and generalize poorly outside the training distribution, even on simple same/different tasks.

When only a few of the 100 Unicode images are seen during training, architectures such as the LSTM and NTM fail to apply learned rules to the withheld symbols, falling short of human-like abstraction.
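
To make this generalization regime concrete, here is a minimal Python sketch of how such a split over the image pool could look; the function names and problem format are illustrative assumptions, not the paper's task-generation code.

```python
import random

# Illustrative reconstruction of the generalization regimes: a pool of 100
# Unicode glyph images, training problems built from n of them, and test
# problems built only from the 100 - n withheld images.
ALL_IMAGES = list(range(100))   # indices standing in for the 100 glyphs

def split_images(n_train, seed=0):
    """Withhold 100 - n_train images entirely from training."""
    rng = random.Random(seed)
    train = rng.sample(ALL_IMAGES, n_train)
    test = [i for i in ALL_IMAGES if i not in train]
    return train, test

def same_different_problem(pool, rng):
    """Build one same/different trial from the given image pool."""
    is_same = rng.random() < 0.5
    a = rng.choice(pool)
    b = a if is_same else rng.choice([i for i in pool if i != a])
    return (a, b), int(is_same)

# Hardest regime: only 2 of the 100 images appear during training;
# the remaining 98 are seen for the first time at test.
train_pool, test_pool = split_images(n_train=2)
test_problem, label = same_different_problem(test_pool, random.Random(1))
```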

HOW IT WORKS

Emergent Symbol Binding Network: indirection via a key-value external memory

ESBN combines a two-column key-value external memory, a recurrent LSTM controller fs, a shared image encoder fe, and temporal context normalization, keeping the variable (rule) and entity (image) processing streams separate.

Think of ESBN as a CPU whose registers operate over symbolic pointers, while a separate perceptual module stores raw data in a RAM-like key-value table.

This indirection lets ESBN learn rules over abstract keys and bind them to arbitrary image embeddings, so rules learned from a handful of training images transfer to images never seen during training.
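
As a rough illustration of this binding-and-retrieval indirection, here is a minimal NumPy sketch; the dot-product similarity, softmax weighting, and class interface are simplifying assumptions rather than the authors' exact formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class KeyValueMemory:
    """Two-column external memory: controller keys bound to image embeddings."""

    def __init__(self, key_dim):
        self.key_dim = key_dim
        self.M_k = []   # controller-generated keys (abstract "symbols")
        self.M_v = []   # perceptual embeddings (concrete "entities")

    def write(self, key, value):
        # Bind the controller's key to the current image embedding.
        self.M_k.append(np.asarray(key, dtype=float))
        self.M_v.append(np.asarray(value, dtype=float))

    def read(self, query_embedding):
        # Match the query against stored *values* and return the bound *keys*:
        # the rule-learning controller only ever sees keys, never raw embeddings.
        if not self.M_k:
            return np.zeros(self.key_dim), 0.0
        sims = np.array([v @ query_embedding for v in self.M_v])
        weights = softmax(sims)
        retrieved_key = np.sum(weights[:, None] * np.stack(self.M_k), axis=0)
        return retrieved_key, float(sims.max())   # key plus a retrieval confidence

# Usage: bind an abstract key to an embedding, then recover it from the embedding alone.
memory = KeyValueMemory(key_dim=4)
memory.write(key=[1.0, 0.0, 0.0, 0.0], value=np.random.randn(16))
key, confidence = memory.read(memory.M_v[0])
```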

DIAGRAM

Sequence of binding and retrieval in ESBN during a task

This diagram shows how ESBN processes each image, writes key-value bindings, and later retrieves keys by matching current embeddings against the value memory.

DIAGRAM

Training and evaluation pipeline for ESBN on abstract rule tasks

This diagram shows how ESBN is trained on subsets of the 100 Unicode images and evaluated on the withheld images across four tasks.

PROCESS

How ESBN Handles an Abstract Rule Learning Problem

  1. Image encoding with fe

    ESBN first passes each 32×32 Unicode image through the encoder fe to produce a low-dimensional embedding z_t for each position in the sequence.

  2. Temporal context normalization

    ESBN applies temporal context normalization to the embeddings, either over the full sequence or separately per problem component, to emphasize relational information within each problem.

  3. Key-value binding in external memory

    At each time step, ESBN writes the current embedding as a value to M_v and a controller-generated key k_wt to M_k, forming a binding between an abstract variable and a concrete image.

  4. Indirection-driven reasoning and prediction

    ESBN retrieves keys by matching the current embedding against M_v, feeds the retrieved keys and confidences into the LSTM controller fs, and produces the final task prediction y_hat_T (a code sketch of the full loop follows this list).
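
Putting the four steps together, below is a minimal PyTorch sketch of one ESBN trial; the MLP encoder, dimensions, normalization details, and output head are simplifying assumptions meant to expose the control flow, not the published implementation. Note that within each time step, retrieval from earlier bindings happens before the new binding is written.

```python
import torch
import torch.nn as nn

class ESBNSketch(nn.Module):
    """One ESBN trial, following the four steps above (heavily simplified)."""

    def __init__(self, emb_dim=128, key_dim=256, hidden_dim=512, n_classes=4):
        super().__init__()
        # fe: image -> low-dimensional embedding z_t (a small MLP stands in for the CNN)
        self.encoder = nn.Sequential(
            nn.Flatten(), nn.Linear(32 * 32, 256), nn.ReLU(), nn.Linear(256, emb_dim))
        # fs: LSTM controller that sees only retrieved keys and confidences
        self.controller = nn.LSTMCell(key_dim + 1, hidden_dim)
        self.key_out = nn.Linear(hidden_dim, key_dim)       # key to write, k_wt
        self.classifier = nn.Linear(hidden_dim, n_classes)  # final prediction y_hat_T

    def forward(self, images):  # images: (T, 1, 32, 32), one problem sequence
        z = self.encoder(images)                              # step 1: encode with fe
        z = (z - z.mean(dim=0)) / (z.std(dim=0) + 1e-8)       # step 2: temporal context norm
        key_dim = self.key_out.out_features
        M_k, M_v = [], []                                     # two-column memory
        h = torch.zeros(1, self.controller.hidden_size)
        c = torch.zeros(1, self.controller.hidden_size)
        for t in range(z.shape[0]):
            # step 4 (retrieval): match z_t against the value column M_v
            if M_v:
                sims = torch.stack([(v * z[t]).sum() for v in M_v])
                w = torch.softmax(sims, dim=0)
                k_r = (w.unsqueeze(1) * torch.stack(M_k)).sum(dim=0, keepdim=True)
                conf = sims.max().reshape(1, 1)
            else:
                k_r = torch.zeros(1, key_dim)
                conf = torch.zeros(1, 1)
            # the controller fs reasons over keys and confidences only
            h, c = self.controller(torch.cat([k_r, conf], dim=-1), (h, c))
            # step 3 (binding): write (controller key k_wt, embedding z_t) to memory
            M_k.append(self.key_out(h).squeeze(0))
            M_v.append(z[t])
        return self.classifier(h)                             # y_hat_T

# Usage on a dummy three-image problem of 32x32 inputs:
model = ESBNSketch()
logits = model(torch.randn(3, 1, 32, 32))
```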

KEY CONTRIBUTIONS

Key Contributions

  • Emergent Symbol Binding Network architecture

    ESBN introduces a two-column key-value external memory in which the LSTM controller fs and the encoder fe interact only via bindings, allowing variable-like keys to emerge.

  • Systematic generalization on abstract rule tasks

    ESBN achieves nearly perfect generalization (≥95% accuracy) on the same/different, relational match-to-sample (RMTS), distribution-of-three, and identity rules tasks, even when trained on only hundreds of problems.

  • Analysis of emergent symbols and ablations

    Analyses show that ESBN's key representations overlap between training and test entities, that performance remains robust with MLP or random encoders, and that dependence on the retrieval confidence values is task-specific.

RESULTS

By the Numbers

Test accuracy: same/different

≥95%

with up to m = 98 withheld images, a regime where the baselines fail

Test accuracy: RMTS

≥95%

vs. LSTM, NTM, MNM, RN, Transformer, and PrediNet, none of which match this in the hardest regimes

Test accuracy: distribution-of-three

≥95%

while the Relation Net needs 10× more data (10^5 vs. 10^4 problems) to approach comparable accuracy

Training updates to converge

100–200 updates

vs. thousands or tens of thousands for all of the alternative architectures

Across four abstract rule learning tasks built from a pool of 100 Unicode images, ESBN is trained on as few as 10^4 problems, and sometimes only hundreds. These results show that ESBN learns abstract rules data-efficiently and generalizes to withheld images where standard architectures fail.

BENCHMARK

Generalization accuracy across architectures on abstract rule tasks

Test accuracy on the hardest generalization regimes (withheld images) for the same/different and RMTS tasks.

KEY INSIGHT

The Counterintuitive Finding

ESBN still generalizes nearly perfectly on the same/different task when trained on problems built from only two images and tested on problems built from the 98 unseen images.

This is surprising because architectures such as the NTM and Transformer usually need dense sampling of the object space, yet ESBN achieves strong abstraction from extremely sparse experience.

WHY IT MATTERS

What this unlocks for the field

ESBN shows that neural networks can acquire symbol-like variables through architectural indirection alone, without explicit symbolic modules or huge datasets.

Builders can now design agents that learn abstract rules from a few examples and transfer them to entirely new visual entities, bringing neural reasoning closer to human-like flexibility.


