Experience Compression Spectrum: Unifying Memory, Skills, and Rules in LLM Agents

Authors: Xing Zhang, Guanghui Wang, Yanwei Cui et al.

2026

TL;DR

Experience Compression Spectrum reframes agent memory, skills, and rules as one compression axis, explaining 5–20× to 1,000×+ context savings and exposing the missing diagonal in current systems.



THE PROBLEM

Agents Learn From Experience, but Cross-Community Citation Is Below 1%

Experience Compression Spectrum shows that across 1,136 references in 22 primary papers, cross-community citation is below 1%, indicating deep fragmentation.

Memory papers cite skill work at only 0.7%, and skill papers cite memory work at only 1.2%, so agents duplicate solutions and scalable experience management stalls.

Without a unified view, long-horizon agents either hoard low-level episodic memories or over-abstract their skills, exhausting retrieval budgets or losing task specificity.

HOW IT WORKS

Experience Compression Spectrum — Four Levels of Agent Knowledge

Experience Compression Spectrum defines four compression levels, Level 0 Raw Trace, Level 1 Episodic Memory, Level 2 Procedural Skill, and Level 3 Declarative Rule, each a scaffold-level knowledge output.

You can think of Experience Compression Spectrum like a memory hierarchy in computing, where raw traces are RAM, episodic memories are cache, skills are disk, and rules are compressed indexes.

Experience Compression Spectrum enables reasoning about when to store detailed episodes versus compact skills or abstract rules, something a plain context window cannot control or adapt across deployments.

DIAGRAM

Knowledge Flow Across Compression Levels

This diagram shows how Experience Compression Spectrum conceptualizes upward and downward knowledge movement between raw traces, episodic memories, skills, and rules.

DIAGRAM

Evaluation and Evidence Aggregation Pipeline

This diagram shows how Experience Compression Spectrum aggregates evidence from memory and skill systems to derive structural insights and testable predictions.

PROCESS

How Experience Compression Spectrum Handles Long-Horizon Agent Experience

  1. Interaction Trace Definition

     Experience Compression Spectrum formalizes an interaction trace T as a sequence of states, actions, observations, and feedback, grounding all later compression levels.

  2. Experience Compression Function

     Experience Compression Spectrum defines C_L as a function mapping traces into knowledge artifacts at Level 0 Raw Trace, Level 1 Episodic Memory, Level 2 Procedural Skill, or Level 3 Declarative Rule.

  3. Mapping Existing Systems

     Experience Compression Spectrum maps more than 20 systems, such as Mem0, Voyager, and Trace2Skill, onto specific levels, revealing clustering at Level 1 and Level 2.

  4. The Missing Diagonal

     Experience Compression Spectrum identifies the missing diagonal: no system adaptively selects levels or promotes and demotes knowledge across Level 1, Level 2, and Level 3.
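The trace and compression function above can be sketched in Python. This is a toy illustration, not the paper's implementation: the `Step` fields, the `Level` enum, and the placeholder summaries inside `compress` are all assumptions standing in for whatever LLM-based compression a real system would use.

```python
from dataclasses import dataclass
from enum import IntEnum

class Level(IntEnum):
    """Hypothetical encoding of the paper's four compression levels."""
    RAW_TRACE = 0
    EPISODIC_MEMORY = 1
    PROCEDURAL_SKILL = 2
    DECLARATIVE_RULE = 3

@dataclass
class Step:
    state: str
    action: str
    observation: str
    feedback: float

# An interaction trace T is a sequence of (state, action, observation, feedback) steps.
Trace = list[Step]

def compress(trace: Trace, level: Level) -> str:
    """Toy stand-in for the compression function C_L: map a trace to a
    knowledge artifact at the requested level. The string summaries here
    are placeholders; real systems would use an LLM or program synthesis."""
    if level == Level.RAW_TRACE:
        return "\n".join(f"{s.state} -> {s.action} -> {s.observation}" for s in trace)
    if level == Level.EPISODIC_MEMORY:
        return f"Episode: {len(trace)} steps, outcome feedback {trace[-1].feedback:+.1f}"
    if level == Level.PROCEDURAL_SKILL:
        return "Skill: " + " then ".join(s.action for s in trace)
    return "Rule: prefer action sequences with positive final feedback"
```

Each level discards more of the trace: Level 0 keeps everything, Level 1 keeps one line per episode, Level 2 keeps only the action recipe, and Level 3 keeps a single trace-independent statement.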

KEY CONTRIBUTIONS

Key Contributions

  • Formalizing the Experience Compression Spectrum

    Experience Compression Spectrum unifies Level 0 Raw Trace, Level 1 Episodic Memory, Level 2 Procedural Skill, and Level 3 Declarative Rule on a single compression axis with 5–20×, 50–500×, and 1,000×+ compression ratios.

  • Mapping 20+ Systems and Exposing the Missing Diagonal

    Experience Compression Spectrum maps 20+ agent-learning systems and shows that every system fixes a single compression level, with no adaptive cross-level compression between memory, skills, and rules.

  • Revealing Structural Insights and Open Problems

    Experience Compression Spectrum derives four structural insights, highlights the <1% cross-community citation rate, and articulates open problems such as adaptive level selection and principled lifecycle management.

RESULTS

By the Numbers

Cross-community citation rate: below 1% (memory papers cite skill work at 0.7%; skill papers cite memory work at 1.2%)

Episodic memory compression: 5–20× context savings over Level 0 Raw Trace

Procedural skill compression: 50–500×, higher compression than Level 1 Episodic Memory

Declarative rule compression: 1,000×+, the highest compression but the lowest specificity

Experience Compression Spectrum aggregates evidence from systems like Mem0 and Trace2Skill, quantifying compression from raw traces to rules and a cross-community citation rate below 1% across 1,136 references. This indicates that Experience Compression Spectrum captures a real structural gap rather than a purely conceptual taxonomy.
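The reported ratios translate directly into context-budget arithmetic. The sketch below works through the savings for a hypothetical 100,000-token raw trace; the trace size is an assumption for illustration, and the "1,000×+" figure gives only a lower bound on compression.

```python
# Back-of-the-envelope context savings implied by the reported compression
# ratios, assuming a hypothetical 100,000-token raw trace (Level 0).
raw_tokens = 100_000

ratios = {
    "Level 1 Episodic Memory": (5, 20),
    "Level 2 Procedural Skill": (50, 500),
    "Level 3 Declarative Rule": (1000, None),  # "1,000x+" gives only a lower bound
}

for level, (lo, hi) in ratios.items():
    upper = raw_tokens // lo                  # least-compressed artifact size
    lower = raw_tokens // hi if hi else None  # most-compressed size, if bounded
    span = f"<= {upper:,}" if lower is None else f"{lower:,}-{upper:,}"
    print(f"{level}: {span} tokens")
```

Under these assumptions, an episode costs 5,000–20,000 tokens, a skill 200–2,000, and a rule at most 100, which is why fixing the wrong level either exhausts the retrieval budget or throws away task specificity.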


BENCHMARK

Compression Ratios Across the Experience Compression Spectrum

Approximate compression ratios for different knowledge levels in Experience Compression Spectrum.

KEY INSIGHT

The Counterintuitive Finding

Experience Compression Spectrum reports that curated Level 2 skills can add +16.2pp on SkillsBench, while LLM self-generated skills add +0.0pp.

This is surprising because many assume more skills always help, but Experience Compression Spectrum shows that compression quality matters more than simply having compact artifacts.

WHY IT MATTERS

What this unlocks for the field

Experience Compression Spectrum gives builders a vocabulary and framework to design agents that store memories, skills, and rules as deliberate compression choices.

With Experience Compression Spectrum, developers can plan future systems that adaptively promote or demote knowledge across levels instead of freezing agents at a single compression granularity.
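A system on the missing diagonal would need some policy for moving knowledge between levels. The sketch below is not from the paper; the reuse and failure-rate thresholds are invented purely to make the promote/demote idea concrete.

```python
# A minimal sketch (not from the paper) of adaptive level selection:
# promote an artifact to a higher compression level when it is reused
# often, demote it when the compressed form starts failing.

def adapt_level(level: int, reuse_count: int, failure_rate: float) -> int:
    """Return the new compression level for a knowledge artifact.
    The thresholds (3 reuses, 20% failures) are illustrative, not measured."""
    if reuse_count >= 3 and level < 3:
        return level + 1   # promote: a frequently reused episode becomes a skill or rule
    if failure_rate > 0.2 and level > 1:
        return level - 1   # demote: an over-abstract rule regains task detail
    return level
```

The point of the sketch is that level is a runtime decision per artifact, not a design-time constant for the whole agent, which is exactly what the surveyed systems lack.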


Related papers

Agent Memory · Long-Term Memory

Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents

Yi Yu, Liuyi Yao et al.

arXiv · 2026

Agentic Memory (AgeMem) exposes memory management tools, a three-stage progressive RL strategy, and step-wise GRPO directly inside the agent policy to jointly control long-term and short-term memory. On Qwen3-4B-Instruct, AgeMem attains 54.31% average performance across ALFWorld, SciWorld, PDDL, BabyAI, and HotpotQA, exceeding the best baseline A-Mem at 45.74%.

Agent Memory

AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations

Cheng Jiayang, Dongyu Ru et al.

2026

AMemGym combines Structured Data Generation, On-Policy Interaction, Evaluation Metrics, and Meta-Evaluation to script user state trajectories, drive LLM-simulated role-play, and score write–read–utilization behavior. On AMemGym’s base configuration, AWE-(2,4,30) reaches a 0.291 normalized memory score on interactive evaluation, while native gpt-4.1-mini only achieves 0.203, exposing substantial gaps between memory agents and plain long-context LLMs.

Agent Memory

AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

Emmanuel Bamidele

· 2026

AMV-L manages agent memory using a Memory Value Model, Tiered Lifecycle, Bounded Retrieval Path, and Lifecycle Manager to decouple retention from retrieval eligibility. Under a 70k-request long-running workload, AMV-L improves throughput from 9.027 to 36.977 req/s over TTL and reduces p99 latency from 5398.167 ms to 1233.430 ms while matching LRU’s retrieval quality.
