Recurrent Neural Networks with External Memory for Language Understanding

Authors: Baolin Peng, Kaisheng Yao

arXiv 2015

TL;DR

RNN-EM uses an external memory of gated slots to store past hidden states and reaches 95.25% F1 on ATIS, +0.40 over LSTM.


THE PROBLEM

RNNs forget long-term dependencies due to vanishing and exploding gradients

Simple recurrent neural networks suffer from vanishing and exploding gradients, which limits their memory capacity because error signals cannot back-propagate far enough through time.

In language understanding, this means semantic taggers cannot reliably connect distant words to their labels, hurting slot filling accuracy on datasets like ATIS.

HOW IT WORKS

RNN-EM architecture with external memory

RNN-EM introduces an external memory Mt, a key vector kt, a forget gate ft, and an update gate ut to control read and write operations.

You can think of RNN-EM as a CPU with RAM, where the recurrent hidden layer ht is the processor and Mt is an addressable memory bank.

This design lets RNN-EM selectively retrieve and update past hidden activities beyond what a fixed recurrent state can hold, overcoming plain context window limitations.
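
To make the addressing concrete, here is a minimal NumPy sketch of content-based addressing over such a memory bank. The function name address_memory, the array shapes, and the small epsilon are our own illustrative choices, not code from the paper.

```python
import numpy as np

def address_memory(M, k, beta):
    """Content-based addressing over an external memory (illustrative sketch).

    M    : (d, n) memory bank, one d-dimensional vector per slot
    k    : (d,)   key emitted from the recurrent hidden state
    beta : float  non-negative sharpening scalar
    Returns a weight vector over the n slots that peaks on slots similar to k.
    """
    # cosine similarity between the key and every memory slot (columns of M)
    sims = (M.T @ k) / (np.linalg.norm(M, axis=0) * np.linalg.norm(k) + 1e-8)
    # exponentiate with beta and normalize; larger beta means sharper focus
    w = np.exp(beta * sims)
    return w / w.sum()
```

Reading is then just a weighted sum of the slots, c = M w, which is what lets the hidden layer retrieve activities stored many steps earlier.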

DIAGRAM

RNN-EM memory read and write over time

This diagram shows how RNN-EM reads from Mt−1 and writes to Mt at each time step using wt, ft, ut, ct, and vt.

DIAGRAM

ATIS evaluation and memory size ablation pipeline

This diagram shows how RNN-EM is trained and evaluated on ATIS while sweeping memory slot number n.

PROCESS

How RNN-EM Handles a Language Understanding Sentence

  1. 01

    Model input and output

    RNN-EM maps each word window to an embedding xt and computes hidden activity ht from Wih xt and Wc ct, where ct is the content read from the external memory.

  2. 02

    External memory read

    RNN-EM generates key kt and sharpening scalar βt from ht, builds weight vector wt via cosine similarity between kt and the slots of Mt−1, and reads content ct = Mt−1 wt−1.

  3. 03

    External memory update

    RNN-EM produces new content vt, computes forget gate ft and update gate ut from wt and erase vector et, and updates the memory to Mt with vt via Eq. (19).

  4. 04

    Output softmax prediction

    RNN-EM feeds ht into Who to produce softmax output yt, the semantic tag distribution for the current word in the ATIS sentence; a minimal end-to-end sketch of these four steps follows this list.
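
Putting the four steps together, here is a minimal single-time-step sketch of an RNN-EM-style cell in NumPy. The parameter names (W_ih, W_c, W_k, w_beta, W_v, W_e, W_ho), the softplus for βt, and the exact forget/update form are assumptions written from the description above; the paper's Eq. (19) may differ in detail.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_em_step(x, M_prev, w_prev, p):
    """One RNN-EM-style time step (illustrative; gating details are assumptions).

    x      : (dx,)   embedding of the current word window
    M_prev : (dm, n) external memory carried over from the previous step
    w_prev : (n,)    previous read/write weights over the n slots
    p      : dict of weight matrices (hypothetical names, see lead-in)
    """
    # 1) read content with the previous weights, then compute hidden activity
    c = M_prev @ w_prev                                  # c_t = M_{t-1} w_{t-1}
    h = sigmoid(p["W_ih"] @ x + p["W_c"] @ c)            # hidden state h_t
    # 2) emit a key and sharpening scalar, address memory by cosine similarity
    k = p["W_k"] @ h
    beta = np.log1p(np.exp(p["w_beta"] @ h))             # softplus keeps beta >= 0
    sims = (M_prev.T @ k) / (np.linalg.norm(M_prev, axis=0) * np.linalg.norm(k) + 1e-8)
    w = np.exp(beta * sims)
    w /= w.sum()                                         # new weights w_t
    # 3) write: new content v_t, erase vector e_t, per-cell forget, per-slot update
    v = p["W_v"] @ h
    e = sigmoid(p["W_e"] @ h)                            # erase vector in (0, 1)
    f = 1.0 - np.outer(e, w)                             # forget: how much of each cell to keep
    M = M_prev * f + np.outer(v, w)                      # erase, then add new content at w
    # 4) predict a semantic tag distribution for the current word
    y = softmax(p["W_ho"] @ h)
    return y, h, M, w
```

A hypothetical usage with 8 slots (dimensions chosen for illustration, not taken from the paper):

```python
rng = np.random.default_rng(0)
dx, dh, dm, n, dy = 100, 100, 40, 8, 127   # hypothetical sizes; dy = number of slot labels
shapes = {"W_ih": (dh, dx), "W_c": (dh, dm), "W_k": (dm, dh), "w_beta": (dh,),
          "W_v": (dm, dh), "W_e": (dm, dh), "W_ho": (dy, dh)}
p = {name: rng.normal(scale=0.1, size=s) for name, s in shapes.items()}
y, h, M, w = rnn_em_step(rng.normal(size=dx), np.full((dm, n), 0.1), np.full(n, 1.0 / n), p)
```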

KEY CONTRIBUTIONS

Key Contributions

  • 01

    RNN-EM architecture with external memory

    RNN-EM augments simple RNNs with an external memory Mt, key vector kt, and gates ft and ut to store and retrieve past hidden activities across sentences.

  • 02

    State of the art on ATIS

    RNN-EM reaches 95.25% F1 on ATIS, surpassing LSTM at 94.85% and GRNN at 94.82% with comparable parameter counts (around 7.3×10^3).

  • 03

    Memory size analysis for RNN-EM

    The paper systematically varies the number of memory slots n from 1 to 512, showing the best F1 of 95.22% at n = 8 and revealing non-monotonic effects on training entropy (see the sketch below).
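
Because the addressing above is content-based, the slot count n only changes the size of the memory state, not the number of trainable weights (assuming the initial memory is not a learned parameter), which is what makes a sweep from 1 to 512 slots possible at roughly constant model size. A small sanity check against the sketch above, using the same hypothetical dimensions:

```python
def sketch_param_count(dx, dh, dm, dy):
    """Trainable weights in the rnn_em_step sketch above; note that n never appears."""
    return (dh * dx) + (dh * dm) + (dm * dh) + dh + (dm * dh) + (dm * dh) + (dy * dh)

for n in [1, 8, 64, 512]:
    # extra slots enlarge the memory state M (dm x n values), not the weight matrices
    print(f"n={n:3d}  params={sketch_param_count(100, 100, 40, 127)}  memory cells={40 * n}")
```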

RESULTS

By the Numbers

  • RNN-EM: 95.25% F1 (+0.40 over LSTM)

  • LSTM baseline on ATIS: 94.85% F1

  • GRNN baseline with gates: 94.82% F1

  • CNN baseline without recurrent memory: 94.35% F1

On the ATIS spoken language understanding benchmark, which tests slot filling accuracy, RNN-EM's 95.25% F1 demonstrates that external memory improves semantic tagging beyond gated recurrent baselines.

BENCHMARK

F1 scores on ATIS

F1 score on the ATIS language understanding task for RNN-EM and baseline models.

KEY INSIGHT

The Counterintuitive Finding

RNN-EM achieves its best F1 of 95.22% with only 8 memory slots, while larger memories up to 512 slots reduce F1 to as low as 94.53%.

This is surprising because more memory capacity is usually expected to help, but RNN-EM shows that too many slots can degrade training entropy and generalization.

WHY IT MATTERS

What this unlocks for the field

RNN-EM shows that a lightweight external memory with content-based addressing can boost recurrent language understanding without complex Neural Turing Machine controllers.

Builders can now design compact slot-filling systems that retain long-range context across sentences using gated external memories rather than ever-deeper or wider recurrent networks.
