Neural SLAM: Learning to Explore with External Memory

Authors: Jingwei Zhang, Lei Tai, Ming Liu et al.

arXiv 2017

TL;DR

Neural SLAM embeds SLAM-like motion prediction and measurement update into an external memory, enabling 13.732 average reward and 46/50 success on 16×16 exploration tasks.



THE PROBLEM

Exploration agents fail to cover complex maps under time limits

Neural SLAM targets exploration tasks in which the agent must clear every accessible cell within 750 steps; a random policy needs 5531.600 steps on average on 16×16 worlds.

Without long-term memory and cognitive mapping, reinforcement learning agents miss distant unexplored regions, leaving coverage tasks unsolved and wasting many actions.

HOW IT WORKS

Neural SLAM architecture with embedded SLAM structure

Neural SLAM couples an LSTM controller with Localization and Motion Prediction, Data Association, Measurement Update, and Mapping modules over a 2D external memory to maintain a global, map-like belief.

You can think of Neural SLAM as a learned SLAM stack where the LSTM is the controller and the external memory is a writable grid RAM that stores the agent’s cognitive map.

This embedded SLAM structure lets Neural SLAM plan using a persistent internal map instead of a short context window, enabling coverage of large, partially observed environments.
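The controller-plus-memory layout can be outlined as a minimal structural sketch. All class and attribute names here are hypothetical illustrations, not the authors' code; only the shapes (a 16×16 grid of 16-dim cells, read via access weights) follow the paper's description.

```python
import numpy as np

class NeuralSLAMAgent:
    """Illustrative skeleton: an LSTM controller reads from and writes to a
    2D external memory that serves as the agent's cognitive map."""

    def __init__(self, grid=16, cell_dim=16, hidden=128):
        # External memory: a 16x16 grid of 16-dim cells, flattened to (256, 16)
        self.memory = np.zeros((grid * grid, cell_dim))
        # Access weights: the agent's positional belief over the grid
        self.weights = np.full(grid * grid, 1.0 / (grid * grid))
        # Stand-in for the LSTM controller state
        self.hidden = np.zeros(hidden)

    def read_summary(self):
        # Weighted read: the summary vector that is fed, together with the
        # LSTM state, into the policy and value heads
        return self.weights @ self.memory
```

Because the memory persists across time steps, the belief it stores can outlive any fixed-length recurrent context, which is what enables planning over large, partially observed maps.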

DIAGRAM

Neural SLAM memory access and SLAM-like update flow

This diagram shows how Neural SLAM updates its external memory each time step using motion prediction, data association, and measurement update before computing a policy.

DIAGRAM

Training and evaluation pipeline for Neural SLAM exploration

This diagram shows how Neural SLAM is trained with A3C on grid worlds and then evaluated on larger unseen environments.

PROCESS

How Neural SLAM Handles an Exploration Episode

  1. Localization and Motion Prediction

    Neural SLAM's Localization and Motion Prediction step applies the previous action to the access weights, updating the belief over positions on the global map.

  2. Data Association

    Neural SLAM computes a key from the LSTM and runs Data Association against the external memory using cosine similarity to obtain content-based access weights.

  3. Measurement Update

    Neural SLAM interpolates motion and content weights, applies a shift kernel, and sharpens them during the Measurement Update to refine the current belief.

  4. Mapping and Policy

    Neural SLAM performs Mapping with erase and add vectors, reads a summary vector, and feeds it with the LSTM state into the policy and value heads.
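The four steps above resemble Neural Turing Machine-style addressing with an explicit motion model in front. A hedged NumPy sketch of one update, assuming standard content addressing, interpolation, circular shift, and sharpening; function names, kernel shapes, and default parameters are illustrative, not taken from the paper's implementation:

```python
import numpy as np

def circular_conv(w, s):
    """Circular convolution of a weight vector w (length N) with kernel s."""
    N, K = len(w), len(s)
    return np.array([sum(w[(i - j) % N] * s[j] for j in range(K))
                     for i in range(N)])

def neural_slam_step(memory, w_prev, action_kernel, key, shift_kernel,
                     beta=5.0, g=0.5, gamma=2.0, erase=None, add=None):
    """One illustrative memory update (memory: NxM, w_prev: length N)."""
    # 1. Localization and Motion Prediction: apply the previous action to
    #    the access weights via a shift kernel derived from that action.
    w_motion = circular_conv(w_prev, action_kernel)

    # 2. Data Association: cosine similarity between the LSTM-emitted key
    #    and each memory row, turned into content-based weights.
    sim = memory @ key / (np.linalg.norm(memory, axis=1)
                          * np.linalg.norm(key) + 1e-8)
    w_content = np.exp(beta * sim)
    w_content /= w_content.sum()

    # 3. Measurement Update: interpolate motion- and content-based weights,
    #    apply a shift kernel, then sharpen to refine the belief.
    w = g * w_content + (1.0 - g) * w_motion
    w = circular_conv(w, shift_kernel)
    w = w ** gamma
    w /= w.sum()

    # 4. Mapping and Policy: erase/add write, then read the summary vector
    #    that is fed with the LSTM state into the policy and value heads.
    if erase is not None and add is not None:
        memory = memory * (1.0 - np.outer(w, erase)) + np.outer(w, add)
    read_vec = w @ memory
    return memory, w, read_vec
```

The key design choice the document highlights is step 1: unlike a plain memory-augmented agent, the access weights are first transformed by the action, so the belief moves with the agent before content lookup corrects it.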

KEY CONTRIBUTIONS

Key Contributions

  • Neural SLAM architecture

    Neural SLAM embeds Localization and Motion Prediction, Data Association, Measurement Update, and Mapping into an external memory controller so that SLAM-like behaviors evolve end to end.

  • Long-term exploration memory

    Neural SLAM uses a 2D external memory of size 16×16×16 as a cognitive map, enabling coverage of grid worlds up to 16×16 under a 750 step limit.

  • Generalization to larger worlds

    Neural SLAM trained on 8×8 to 12×12 worlds generalizes to 16×16, achieving 13.732 average reward and 46/50 success episodes compared to 7.196 and 37/50 for A3C Nav2.

RESULTS

By the Numbers

Average reward

13.732

+6.536 over A3C-Nav2

Average steps

174.920

−108.560 steps vs A3C-Nav2

Success ratio

46/50

+9 episodes vs A3C-Nav2

Random steps

5531.600

baseline difficulty for random exploration

On 50 randomly generated 16×16 grid worlds, Neural SLAM is evaluated for exploration coverage under a 750 step cap, demonstrating substantially higher reward and success than A3C-Nav2 and A3C baselines.


BENCHMARK

Generalization performance on 16×16 grid worlds

Average reward on 50 randomly generated 16×16 environments.

KEY INSIGHT

The Counterintuitive Finding

Neural SLAM with an explicit motion model reaches 13.732 reward, while A3C-Ext with an external memory but no motion model drops to −8.127.

This is surprising because both agents have similar memory capacity, yet without embedded SLAM structure the external memory actually harms performance on larger worlds.

WHY IT MATTERS

What this unlocks for the field

Neural SLAM shows that embedding SLAM-like motion prediction and measurement update into external memory yields robust long-horizon exploration policies.

Builders can now design agents that learn their own cognitive maps for coverage and navigation tasks, without hand-engineered SLAM pipelines or explicit occupancy grid maps.


