Long Term Memory: The Foundation of AI Self-Evolution

Authors: Xun Jiang, Feng Li, Han Zhao et al.

2024

TL;DR

Long Term Memory pairs multi-agent LTM construction and utilization with the OMNE framework to enable AI self-evolution, achieving first place on the GAIA benchmark.



THE PROBLEM

Foundation models average away individuals and block self-evolution

Long Term Memory notes that current foundation models compress vast training data into a single averaged model, which struggles with long-tail individual data and rare scenarios.

This limitation prevents such models from expressing personalized information, hindering AI self-evolution and multi-agent collaboration in complex, dynamic environments.

HOW IT WORKS

Long-Term Memory for AI Self-Evolution

Long Term Memory introduces a Data Framework for LTM, a Development Framework for LTM, and a multi-agent collaboration mechanism to store and use refined individual interaction data.

An analogy is a human brain with cortical columns: each agent builds its own world model, while a shared long-term memory acts like a collective hippocampus and card catalog.

This mechanism lets Long Term Memory and OMNE represent sparse, evolving personal experiences beyond a fixed context window, enabling continual adaptation without retraining all parameters.
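To make that concrete, here is a minimal sketch of retrieval from a long-term memory store that lives outside the model's fixed context window. Everything in it is an assumption for exposition: `LTMStore`, `embed`, and the toy bag-of-words vectors stand in for the paper's actual interfaces and vectorization method.

```python
# Minimal sketch: an agent recalls relevant memories from an LTM store
# instead of keeping every interaction inside a fixed context window.
# LTMStore, embed, and cosine are illustrative assumptions, not the
# paper's API; a real system would use learned embeddings.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class LTMStore:
    """Append-only memory; retrieval, not prompt growth, scales recall."""
    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.entries.append((text, embed(text)))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

ltm = LTMStore()
ltm.add("Prefers a short morning meeting every Monday.")
ltm.add("Reported poor sleep during recent project deadlines.")
# Only the relevant memory enters the fixed-size context window:
print(ltm.retrieve("schedule the next meeting", k=1))
```

The design point: the memory store grows without bound, but each decision only pays for the few entries retrieved into context.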

DIAGRAM

AI Self-Evolution Lifecycle with Long Term Memory

This diagram shows how Long Term Memory drives AI self-evolution across three phases of model evolution.

DIAGRAM

LTM Data Pipeline for Office and Health Scenarios

This diagram shows how Long Term Memory collects, structures, and deploys data in office collaboration and health management systems.

PROCESS

How Long Term Memory Handles AI Self-Evolution

  1. Data collection framework

    Long Term Memory uses the Data Framework for LTM to gather fragmented digital and physical interaction data in office collaboration and health management.

  2. Data synthesis techniques

    Long Term Memory applies mixed synthesis strategies such as role playing, EnvGen, AGA, and RTG synthesis to enrich sparse individual data.

  3. Construction strategies of LTM

    Long Term Memory transforms raw data into LTM via text summarization, data structuring, graph representation, vectorization, and model parameterization.

  4. Development Framework for LTM

    Long Term Memory powers OMNE, where each agent builds an independent world model over LTM and collaborates to adapt plans and decisions in real time. A toy end-to-end sketch of these four steps follows this list.
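To ground the four steps, here is a minimal end-to-end sketch. Every name and function body is a hypothetical stand-in chosen for exposition; the paper's actual synthesis methods (role playing, EnvGen, AGA, RTG) and construction strategies (graph representation, vectorization, model parameterization) are far richer than the string handling shown here.

```python
# Toy end-to-end version of the four steps above. Every function body
# is an illustrative stand-in for the paper's real techniques.

def collect(raw_logs: list[str]) -> list[dict]:
    """Step 1: gather fragmented interaction records."""
    return [{"text": line, "source": "office"} for line in raw_logs]

def synthesize(records: list[dict]) -> list[dict]:
    """Step 2: enrich sparse data; here, a crude role-playing paraphrase."""
    synthetic = [{"text": f"(role-play) the assistant restates: {r['text']}",
                  "source": "synthetic"} for r in records]
    return records + synthetic

def construct(records: list[dict]) -> list[dict]:
    """Step 3: turn raw records into structured LTM entries (a truncated
    'summary' plus keyword tags stands in for real summarization,
    structuring, and vectorization)."""
    return [{"summary": r["text"][:60],
             "tags": set(r["text"].lower().split()),
             "source": r["source"]} for r in records]

def deploy(ltm: list[dict], query: str) -> list[dict]:
    """Step 4: an agent consults its LTM while planning."""
    q = set(query.lower().split())
    return [e for e in ltm if q & e["tags"]]

logs = ["Weekly sync moved to Tuesday at 10am."]
ltm = construct(synthesize(collect(logs)))
print([e["summary"] for e in deploy(ltm, "weekly sync agenda")])
```

The shape is what matters, not the detail: refined, structured memories, rather than raw logs, are what the agent consults at plan time.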

KEY CONTRIBUTIONS

Key Contributions

  • Definitions of AI Self-Evolution and LTM

    Long Term Memory formalizes AI self-evolution and LTM, linking Differentiated Personalized Models with Long-Term Memory and Learning Ability as the core of personalized agents.

  • Data Framework for LTM

    Long Term Memory proposes a data collection, analysis, and synthesis framework deployed in office collaboration and health management, including the world’s largest real-user voice dataset for mental health.

  • Development Framework for LTM

    Long Term Memory introduces the multi-agent OMNE framework, where each agent maintains an independent world model over LTM, achieving first place on the GAIA benchmark. A toy sketch of this shared-memory, independent-world-model design follows below.
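As a rough illustration of that design, the sketch below gives several agents one shared memory (the collective "hippocampus" from the analogy above) while each keeps a private world model. The `Agent` class, the observation-count "world model", and the string-tagged memory format are all assumptions for exposition, not OMNE's actual implementation.

```python
# Illustrative sketch (not OMNE's real implementation): agents write to
# one shared LTM while each keeps an independent world model, reduced
# here to private per-topic observation counts.
from collections import defaultdict

shared_ltm: list[str] = []  # collective long-term memory

class Agent:
    def __init__(self, name: str) -> None:
        self.name = name
        # Independent world model: private per-topic observation counts.
        self.world_model: dict[str, int] = defaultdict(int)

    def observe(self, topic: str, note: str) -> None:
        self.world_model[topic] += 1                          # private update
        shared_ltm.append(f"{self.name} | {topic} | {note}")  # shared write

    def plan(self, topic: str) -> str:
        shared = [m for m in shared_ltm if f"| {topic} |" in m]
        return (f"{self.name} plans '{topic}' from "
                f"{self.world_model[topic]} private observation(s) "
                f"and {len(shared)} shared memories")

office, health = Agent("office"), Agent("health")
office.observe("meetings", "prefers mornings only")
health.observe("meetings", "avoid late evenings after poor sleep")
print(office.plan("meetings"))  # combines private and shared state
```

Each agent's plan draws on both its own history and memories contributed by other agents, which is the collaboration pattern the contribution describes.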

RESULTS

By the Numbers

GAIA benchmark rank

1st place

Top position among all reported GAIA baselines

Voice dataset scale

world’s largest

Largest real user mental health voice dataset reported by Long Term Memory

Business deployments

2 scenarios

Office collaboration and health management systems

Agent framework

multi-agent

OMNE agents all equipped with independent LTM world models

On the GAIA benchmark, which tests complex real-world problem solving, Long Term Memory with OMNE reaches first place, showing that LTM-driven multi-agent personalization can handle diverse tasks. The large real-user mental health voice dataset and dual business deployments demonstrate that Long Term Memory scales beyond benchmarks into practical environments.

BENCHMARK

GAIA benchmark performance comparison

Relative GAIA benchmark ranking for OMNE vs generic agent baselines (rank 1 is best).

KEY INSIGHT

The Counterintuitive Finding

Long Term Memory shows that self-evolution can emerge from limited, refined interaction data rather than ever larger pre-training corpora.

This challenges the common belief that scaling datasets alone is the main path to stronger AI, highlighting architecture and LTM as equally critical levers.

WHY IT MATTERS

What this unlocks for the field

Long Term Memory unlocks AI systems where thousands of OMNE agents evolve distinct world models over a shared LTM while running on a common core architecture.

Builders can now design ecosystems of personalized, collaborating agents that adapt over months of interaction, instead of repeatedly retraining monolithic foundation models.


