MemoNav: Working Memory Model for Visual Navigation

Authors: Hongxin Li, Zeyu Wang, Xu Yang, et al.

2024

TL;DR

MemoNav uses a working-memory-style combination of STM, LTM, and a selective forgetting module to reach 74.7% SR on Gibson 1-goal, +4.7 points over VGM.



THE PROBLEM

Topological map agents waste steps on redundant nodes

Existing image-goal navigation (ImageNav) methods feed all historical observations into decision-making, regardless of how many are actually goal-relevant, leading to inefficient exploration.

In image-goal navigation on Gibson and Matterport3D, this means agents traverse many redundant nodes, degrading navigation efficiency and lowering success rate on long multi-goal episodes.

HOW IT WORKS

MemoNav — STM, LTM, WM with selective forgetting

MemoNav’s core mechanism combines short-term memory (STM), a selective forgetting module, long-term memory (LTM), and working memory (WM) generation, with Transformer decoders operating over a topological map.

You can think of MemoNav as a brain with a notepad (STM), a long-term summary (LTM), and an attention filter that erases unhelpful notes while keeping a compact, task-focused scratchpad (WM).

This working-memory-style design lets MemoNav focus on goal-relevant nodes and scene-level structure instead of a flat context window over all past observations.
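The three memory stores described above can be sketched as a simple container; a minimal illustration, with names and structure chosen here for clarity rather than taken from the paper's code:

```python
from dataclasses import dataclass, field

@dataclass
class MemoNavMemory:
    """Illustrative container for MemoNav's three scene representations."""
    stm: list = field(default_factory=list)  # landmark node features on the topological map
    ltm: list = field(default_factory=list)  # global node feature, aggregated over time
    wm: list = field(default_factory=list)   # compact, goal-focused working memory

    def add_observation(self, node_feature):
        # Each new panoramic observation adds a landmark node to STM.
        self.stm.append(node_feature)

mem = MemoNavMemory()
mem.add_observation([0.1, 0.2])
print(len(mem.stm))  # 1
```

The point of the separation is that the policy never has to attend over the full, ever-growing STM; it only sees the filtered WM.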

DIAGRAM

Working-memory navigation pipeline over time

This diagram shows how MemoNav processes images and updates STM, LTM, and WM at each time step before generating an action.

DIAGRAM

MemoNav evaluation and ablation design

This diagram shows how MemoNav is trained on Gibson 1-goal, then evaluated with ablations on Gibson and Matterport3D multi-goal tasks.

PROCESS

How MemoNav Handles an ImageNav Episode

  1. STM generation

    MemoNav uses its memory update module to encode the current panoramic RGB-D image and store landmark node features as short-term memory (STM) on the topological map.

  2. Selective forgetting

    MemoNav’s selective forgetting module ranks STM nodes by attention scores from the goal decoder (Dgoal) and temporarily removes nodes scoring below threshold p from subsequent decision-making.

  3. LTM generation

    MemoNav maintains a trainable global node as long-term memory (LTM) that connects to all STM nodes and progressively aggregates their features at each time step.

  4. Working memory generation and action

    MemoNav applies GATv2-based working memory generation over the retained STM and the LTM, then two Transformer decoders and a policy network convert the WM into navigation actions.
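The forgetting and LTM-update steps above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the retention rule, the mean-based LTM update, and the `alpha` mixing weight are all simplifying assumptions (the real model scores nodes with a trained goal decoder and learns the aggregation):

```python
import numpy as np

def selective_forgetting(stm, goal_attn, p=0.5):
    """Temporarily drop the fraction p of STM nodes with the lowest
    goal-attention scores (a sketch of the forgetting rule)."""
    k = max(1, int(len(stm) * (1 - p)))   # number of nodes to retain
    keep = np.argsort(goal_attn)[-k:]     # indices of the top-k scoring nodes
    return stm[np.sort(keep)]

def update_ltm(ltm, stm, alpha=0.5):
    """Progressively mix retained STM features into the global LTM node."""
    return (1 - alpha) * ltm + alpha * stm.mean(axis=0)

rng = np.random.default_rng(0)
stm = rng.normal(size=(8, 4))             # 8 landmark nodes, 4-d features
goal_attn = rng.random(8)                 # per-node attention scores vs. the goal
retained = selective_forgetting(stm, goal_attn, p=0.5)
ltm = update_ltm(np.zeros(4), retained)
print(retained.shape, ltm.shape)          # (4, 4) (4,)
```

Note the forgetting is "temporary": the full STM stays on the map, only the decision-making context shrinks.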

KEY CONTRIBUTIONS

Key Contributions

  • MemoNav working memory model

    MemoNav introduces three scene representations (short-term memory, long-term memory, and working memory) to improve image-goal navigation across Gibson and Matterport3D multi-goal tasks.

  • Selective forgetting module

    MemoNav’s selective forgetting module uses attention scores from the goal decoder (Dgoal) to retain only informative STM nodes, reducing redundancy and enabling higher SR and PR with fewer active nodes.

  • Global LTM node with GATv2 WM

    MemoNav adds a global LTM node and a GATv2-based working memory generator, yielding up to +8.5 PR points over VGM on Gibson 3-goal tasks and +7.4 on 4-goal tasks.
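A GATv2-style attention layer, of the kind the working memory generator is built on, can be sketched as follows. This is a tiny numpy illustration of the GATv2 scoring rule (attend after the nonlinearity, score = a·LeakyReLU([z_i ‖ z_j])) over a fully connected toy graph; the weights here are random, whereas MemoNav's are trained:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gatv2_layer(h, W, a):
    """One GATv2 attention layer over a fully connected graph.
    h: (n, d) node features; W: (d, d) projection; a: (2*d,) attention vector."""
    z = h @ W                                    # projected node features, (n, d)
    n = z.shape[0]
    # GATv2 score: a . LeakyReLU([z_i || z_j]) for every node pair
    scores = np.array([[a @ leaky_relu(np.concatenate([z[i], z[j]]))
                        for j in range(n)] for i in range(n)])
    alpha = np.exp(scores - scores.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)    # softmax over neighbors
    return alpha @ z                             # attention-weighted aggregation

rng = np.random.default_rng(1)
h = rng.normal(size=(5, 3))                      # e.g. 4 retained STM nodes + 1 LTM node
out = gatv2_layer(h, rng.normal(size=(3, 3)), rng.normal(size=(6,)))
print(out.shape)  # (5, 3)
```

Because the global LTM node is connected to every STM node, one such layer lets every retained node read the scene-level summary in a single hop.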

RESULTS

By the Numbers

  • SR (Gibson 1-goal): 74.7% (+4.7 over VGM)

  • SPL (Gibson 1-goal): 57.9% (+2.5 over VGM)

  • PR (Gibson 4-goal): 28.9% (+7.4 over VGM)

  • PR (Matterport3D 3-goal): 13.6% (+1.8 over VGM)

On Gibson and Matterport3D ImageNav benchmarks, MemoNav is evaluated on 1-goal and multi-goal tasks, showing consistent SR and PR gains over VGM and TSGM. These numbers demonstrate that MemoNav’s working-memory-style STM, LTM, and selective forgetting pipeline yields more successful and efficient long-horizon navigation.

BENCHMARK

Comparison between MemoNav and previous methods on Gibson 1-goal SR

Success Rate (SR) on Gibson 1-goal hard episodes.

BENCHMARK

Network component ablation results on Gibson 2-goal PR

Progress (PR) on Gibson 2-goal tasks for VGM and MemoNav component variants.

KEY INSIGHT

The Counterintuitive Finding

MemoNav maintains high PR on 3-goal tasks even when retaining only 20% of STM nodes, yet still reaches strong success rates.

This is surprising because one might expect aggressive forgetting to cripple navigation, but MemoNav’s attention-based selection shows most nodes are unnecessary for efficient multi-goal planning.
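The 20% retention setting is easy to picture with a concrete example; a hypothetical set of attention scores, purely for illustration:

```python
import numpy as np

# Illustrative: keep only the top 20% of STM nodes by goal-attention score.
attn = np.array([0.9, 0.1, 0.05, 0.4, 0.02, 0.8, 0.3, 0.15, 0.07, 0.6])
k = max(1, int(0.2 * len(attn)))   # 20% of 10 nodes -> 2 nodes survive
kept = np.argsort(attn)[-k:]       # indices of the highest-scoring nodes
print(sorted(kept.tolist()))       # [0, 5]
```

Eight of ten nodes are dropped from decision-making, yet if attention concentrates on the genuinely goal-relevant landmarks, as the finding suggests, the surviving two carry most of the planning signal.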

WHY IT MATTERS

What this unlocks for the field

MemoNav shows that navigation agents can use working-memory-style STM, LTM, and WM to plan efficient paths with compact, goal-focused scene memory.

Builders can now design embodied agents that scale to long multi-goal missions in complex 3D environments without exploding memory or exhaustive re-exploration.


