Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Authors: Zhenting Wang, Huancheng Chen, Jiayun Wang, Wei Wei

2026

TL;DR

Memex(RL) trains Memex’s Indexed Experience Memory to compress trajectories into indexed summaries, boosting task success on modified ALFWorld from 24.22% to 85.61% while shrinking peak context.



THE PROBLEM

Long-horizon agents overflow context and lose evidence

Memex(RL) targets long-horizon agents whose peak working context grows to 16,934.46 tokens, more than double the 8,000-token context penalty threshold, degrading decisions.

In modified ALFWorld, Memex(RL) shows that without structured memory, tool-heavy workflows either exceed context budgets or rely on lossy summaries that drop crucial logs, IDs, and tool outputs.

HOW IT WORKS

Memex and Indexed Experience Memory

Memex(RL) centers on Indexed Experience Memory, combining IndexedSummary, CompressExperience, ReadExperience, and an external experience store D with ContextStatus-driven control.

Think of Memex(RL) as RAM plus disk: the indexed summary is fast working memory, while the key–value store D is a long-term archive addressed by stable indices.

This design lets Memex(RL) keep a compact, pointer-heavy context yet deterministically dereference exact past artifacts, something a plain context window or fuzzy semantic retrieval cannot guarantee.
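The RAM-plus-disk analogy above can be sketched as a tiny key–value memory. This is an illustrative sketch, not the paper's actual API: the class and method names are assumptions, mapping CompressExperience and ReadExperience onto dictionary writes and lookups.

```python
# Illustrative sketch of Indexed Experience Memory (names are hypothetical,
# not the paper's API). The working context holds short indexed summaries;
# the external store D maps each index to the full archived artifact.

class IndexedExperienceMemory:
    def __init__(self):
        self.store = {}      # external experience store D: index -> full content
        self.summary = []    # in-context IndexedSummary: (index, one-line gist)

    def compress(self, index, content, gist):
        """CompressExperience: archive the full content, keep only a pointer."""
        self.store[index] = content
        self.summary.append((index, gist))

    def read(self, index):
        """ReadExperience: deterministically dereference an exact past artifact."""
        return self.store[index]

mem = IndexedExperienceMemory()
mem.compress("tool_call_3", "full 2,000-token grep output ...", "grep results for config paths")
assert mem.read("tool_call_3").startswith("full 2,000-token grep output")
```

Because retrieval is an exact dictionary lookup rather than fuzzy semantic search, a compressed artifact is always recoverable verbatim, which is the guarantee the paragraph above contrasts with plain context windows.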

DIAGRAM

Memex agent loop with indexed compression and retrieval

This diagram shows how Memex(RL) runs the Memex agent loop from context monitoring through compression, retrieval, tool calls, and final Finish.

DIAGRAM

MemexRL training and evaluation pipeline

This diagram shows how Memex(RL) samples rollouts, computes memory-aware rewards, updates the policy, and evaluates Memex on modified ALFWorld.

PROCESS

How Memex(RL) Handles a Long-Horizon Tool-Use Episode

  1. Memex Agent Loop Initialization

    Memex(RL) initializes M = [m_0, u], empties the external experience store D, and sets the answer placeholder before any memory operations.

  2. ContextStatus and Tool Decisions

    At each step, Memex(RL) appends ContextStatus(M, τ) to the working context, then the policy π_agent emits thinking z_t and a tool call c_t, which may be CompressExperience, ReadExperience, or Finish.

  3. CompressExperience Operation

    When c_t is CompressExperience, Memex(RL) writes each (index, content) pair into D and rewrites the working context to [m_0, u, IndexedSummary] to shrink the prompt.

  4. ReadExperience and Finish

    When c_t is ReadExperience, Memex(RL) dereferences D[index] and appends it to the context; when c_t is Finish(y), Memex(RL) returns the final answer y.
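The four steps above can be sketched as a single control loop. This is a hedged toy implementation under stated assumptions: the word-count token proxy, the scripted policy, and the tool-call argument shapes are all illustrative, not the paper's actual formats.

```python
# Toy sketch of the Memex agent loop (tool names follow the text above;
# token counting and the scripted policy are illustrative assumptions).

def context_status(context, threshold=8000):
    tokens = sum(len(m.split()) for m in context)  # crude word-count proxy for tokens
    return f"ContextStatus: {tokens} tokens (threshold {threshold})"

def run_episode(policy, m0, u, max_steps=50):
    context, D, summary = [m0, u], {}, []          # working memory M and store D
    for _ in range(max_steps):
        tool, arg = policy(context + [context_status(context)])
        if tool == "CompressExperience":
            for index, content, gist in arg:       # archive full artifacts in D
                D[index] = content
                summary.append(f"[{index}] {gist}")
            context = [m0, u] + summary            # rewrite M = [m0, u, IndexedSummary]
        elif tool == "ReadExperience":
            context.append(D[arg])                 # dereference D[index] into context
        elif tool == "Finish":
            return arg                             # return the final answer y
        else:
            context.append(f"{tool}({arg}) -> output")  # ordinary tool call
    return None

# Usage with a scripted stand-in policy: compress, then read back, then finish.
script = iter([
    ("CompressExperience", [("step1", "long tool log ...", "log of step 1")]),
    ("ReadExperience", "step1"),
    ("Finish", "done"),
])
answer = run_episode(lambda ctx: next(script), "system prompt m0", "user task u")
```

The key structural point the sketch preserves is that CompressExperience rewrites the context rather than appending to it, which is what keeps the prompt bounded across long episodes.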

KEY CONTRIBUTIONS

Key Contributions

  • Indexed Experience Memory

    Memex(RL) formalizes Indexed Experience Memory as an in-context IndexedSummary plus an external experience store D, enabling explicit dereferencing instead of lossy summaries.

  • MemexRL reinforcement learning framework

    Memex(RL) introduces a GRPO-style RL framework with context overflow, redundant tool call, and format penalties, and segmented trajectories aligned with CompressExperience boundaries.

  • Theoretical and empirical analysis of Memex loop

    Memex(RL) proves that bounded dereferencing can match full-context optimal policies and empirically raises ALFWorld success from 24.22% to 85.61% while cutting peak context by about 6,300 tokens.
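The three penalty terms named in the second contribution can be sketched as a memory-aware reward. The penalty weights below are made-up placeholders, not the paper's values; only the penalty categories (context overflow, redundant tool calls, format errors) and the 8,000-token threshold come from the text.

```python
# Illustrative memory-aware reward in the spirit of the MemexRL penalties
# described above. Weights w_ctx, w_red, w_fmt are hypothetical placeholders.

def memory_aware_reward(success, peak_tokens, redundant_calls, format_errors,
                        threshold=8000, w_ctx=1e-4, w_red=0.1, w_fmt=0.2):
    r = 1.0 if success else 0.0
    r -= w_ctx * max(0, peak_tokens - threshold)   # context-overflow penalty
    r -= w_red * redundant_calls                   # redundant tool-call penalty
    r -= w_fmt * format_errors                     # format penalty
    return r

# A successful episode that stays under the 8,000-token threshold keeps full reward;
# overflowing the threshold or calling tools redundantly erodes it.
full = memory_aware_reward(True, 7500, 0, 0)
penalized = memory_aware_reward(True, 9000, 1, 0)
```

In a GRPO-style setup, rewards like this would be computed per rollout and compared within a sampled group, so the agent is pushed toward policies that succeed while staying under the threshold.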

RESULTS

By the Numbers

Task success rate

85.61%

+61.39 points over Memex without RL (24.22%)

Peak working context

9,634.47 tokens

-6,299.99 tokens vs Memex without RL (16,934.46 tokens)

Context threshold

8,000 tokens

Penalty threshold used during Memex(RL) training

Context window size

32,000 tokens

Total context window available to Memex(RL) during training

On a modified ALFWorld benchmark with hidden admissible commands and truncated summaries, Memex(RL) shows that Indexed Experience Memory can dramatically raise task success while keeping peak working context near the 8,000-token penalty threshold.


BENCHMARK

Effectiveness of MemexRL on Modified ALFWorld

Task success rate (%) for Memex(RL) versus the same Memex agent without RL training.

KEY INSIGHT

The Counterintuitive Finding

Memex(RL) reduces the mean CompressExperience calls per episode from about 6.5 to about 3, while ReadExperience calls rise from about 1 to around 6–7.

This is surprising because one might expect more compression to save context, but Memex(RL) instead learns to compress less often and rely on precise retrieval from Indexed Experience Memory.

WHY IT MATTERS

What this unlocks for the field

Memex(RL) unlocks long-horizon LLM agents that keep a small, indexed working state while still accessing exact historical tool outputs and code snippets on demand.

Builders can now design agents that scale to dozens or hundreds of steps under tight context budgets without sacrificing decision quality or relying solely on lossy summarization.


