MemOS: A Memory OS for AI System

Authors: Zhiyu Li, Chenyang Xi, Chunyu Li, Ding Chen

arXiv 2025

TL;DR

MemOS treats memory as an operating-system resource, using unified MemCube-based scheduling across plaintext, activation, and parameter memories, and achieves first-place scores on PreFEval, PersonaMem, LongMemEval, and LoCoMo against MIRIX, Mem0, Zep, Memobase, MemU, and Supermemory.

THE PROBLEM

LLM agents lack schedulable memory and remain stateless over long horizons

MemOS targets LLMs that rely on static parameters and short-lived context, where RAG is only a stateless workaround without lifecycle control or persistent representations.

This breaks long-context reasoning, continual personalization, and knowledge consistency, so agents cannot maintain behavioral continuity or manage evolving knowledge across tasks, users, and platforms.

HOW IT WORKS

MemOS — a memory operating system with MemCubes and hierarchical scheduling

MemOS introduces MemReader, MemScheduler, MemLifecycle, MemOperator, and MemGovernance around MemCube units to unify plaintext, activation, and parameter memories as schedulable resources.

You can think of MemOS as an operating system with RAM, disk, and caches, where MemScheduler and MemVault play the roles of a CPU scheduler and a file system for memory.

This OS-style design lets MemOS perform lifecycle-aware retrieval, fusion, and evolution that a plain context window or vanilla RAG pipeline cannot express or control.
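To make the OS analogy concrete, a MemCube can be pictured as a record carrying its payload plus the scheduling metadata the system would consult. This is a minimal sketch, assuming hypothetical class and field names; it is not MemOS's actual API.

```python
from dataclasses import dataclass, field
import time

# Hypothetical sketch of the MemCube abstraction; field names are
# illustrative assumptions, not the paper's concrete schema.
@dataclass
class MemCube:
    content: str                    # payload: plaintext here, but could be a KV-cache or adapter weights
    mem_type: str                   # "plaintext" | "activation" | "parameter"
    provenance: str                 # where this memory came from (user, agent, tool)
    semantic_tags: list = field(default_factory=list)
    last_access: float = field(default_factory=time.time)
    access_count: int = 0

    def touch(self) -> None:
        """Record an access so a scheduler can track hot vs. cooled cubes."""
        self.access_count += 1
        self.last_access = time.time()

cube = MemCube("User prefers concise answers", "plaintext", "user:alice",
               semantic_tags=["preference"])
cube.touch()
print(cube.access_count)  # 1
```

A scheduler in this sketch would rank cubes by recency, access count, and tag relevance, mirroring how a CPU scheduler prioritizes runnable tasks.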

DIAGRAM

Memory taxonomy in MemOS across implicit and explicit hierarchies

This diagram shows how MemOS structures plaintext memory, activation memory, and parameter memory within a hierarchical memory space.

DIAGRAM

Evaluation pipeline for MemOS across long-term memory benchmarks

This diagram shows how MemOS is evaluated from benchmark selection through memory configuration to metric aggregation.

PROCESS

How MemOS Handles a Memory Lifecycle — generation, activation, fusion, archiving

  1. Generation

     MemOS uses MemReader and MemOperator to extract new MemCubes from interactions, tagging them with provenance and semantic type for later scheduling.

  2. Activation

     MemScheduler selects relevant MemCubes, loading them as plaintext memory, activation memory, or parameter memory depending on task context and access patterns.

  3. Fusion

     MemLifecycle coordinates MemOperator to merge overlapping MemCubes, performing memory slicing, tagging, and hierarchical mapping to restructure knowledge.

  4. Archiving

     MemLifecycle and MemGovernance move cooled MemCubes into MemVault, enforcing expiry policy, permission control, and versioning for long-term storage.
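The four stages above can be sketched as a small state machine. The state names, legal transitions, and cooling threshold below are hypothetical illustrations of the lifecycle idea, not the paper's implementation.

```python
import time

# Hypothetical lifecycle states, loosely following the
# generation -> activation -> fusion -> archiving stages described above.
VALID_TRANSITIONS = {
    "generated": {"activated"},
    "activated": {"merged", "archived"},
    "merged": {"activated", "archived"},
    "archived": set(),  # archived cubes are read-only until re-imported
}

class MemoryLifecycle:
    def __init__(self):
        self.state = "generated"
        self.last_access = time.time()

    def transition(self, new_state: str) -> None:
        if new_state not in VALID_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

    def should_archive(self, now: float, cool_after_s: float = 3600.0) -> bool:
        # Illustrative "cooling" rule: archive after an hour without access.
        return self.state in {"activated", "merged"} and now - self.last_access > cool_after_s

lc = MemoryLifecycle()
lc.transition("activated")
lc.transition("merged")
print(lc.state)  # merged
```

Encoding the lifecycle as explicit transitions is what lets a governance layer veto or audit moves (e.g., blocking archiving of a cube still under an active permission grant).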

KEY CONTRIBUTIONS

Key Contributions

  • MemOS as a memory operating system

    MemOS treats memory as a first-class system resource, with MemScheduler, MemLifecycle, and MemOperator orchestrating generation, activation, fusion, archiving, and expiration of MemCubes.

  • MemCube unified memory abstraction

    MemOS introduces MemCube to encapsulate plaintext memory, activation memory, and parameter memory together with metadata for provenance, permissions, and behavioral indicators.

  • Mem training paradigm and governance

    MemOS enables a Mem training paradigm where heterogeneous agents exchange MemCubes instead of parameters, with MemGovernance enforcing access control, auditing, and lifecycle policies.
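The governance contribution can be illustrated with a minimal access-control check. The policy fields and rules here are assumptions made for illustration; the paper describes access control, auditing, and lifecycle policies without prescribing this schema.

```python
from dataclasses import dataclass, field

# Hypothetical governance policy attached to a shared MemCube.
@dataclass
class GovernancePolicy:
    owner: str
    readers: set = field(default_factory=set)
    expires_at: float = float("inf")   # illustrative expiry policy
    audit_log: list = field(default_factory=list)

    def can_read(self, agent_id: str, now: float) -> bool:
        allowed = now < self.expires_at and (agent_id == self.owner or agent_id in self.readers)
        self.audit_log.append((now, agent_id, "read", allowed))  # every attempt is audited
        return allowed

policy = GovernancePolicy(owner="agent-a", readers={"agent-b"}, expires_at=1000.0)
print(policy.can_read("agent-b", now=10.0))    # True
print(policy.can_read("agent-c", now=10.0))    # False
print(policy.can_read("agent-b", now=2000.0))  # False (expired)
```

Auditing every attempt, allowed or not, is what makes cross-agent MemCube exchange accountable rather than ad hoc.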

RESULTS

By the Numbers

  • Personalized response rate: MemOS-1031 ranks 1st on PreFEval (0 turns), ahead of MIRIX, Mem0, Zep, Memobase, MemU, and Supermemory.

  • Personalized response rate: MemOS-1031 ranks 1st on PreFEval (10 turns), against the same baselines.

  • Precision score: MemOS-1031 ranks 1st on PersonaMem, with higher precision than MIRIX, Mem0, Zep, Memobase, MemU, and Supermemory.

  • Overall mean score: MemOS-1031 ranks 1st on LongMemEval and LoCoMo, with the top LLM-judge score among the same baselines.

Across PreFEval, PersonaMem, LongMemEval, and LoCoMo, which test personalization and long-term memory, MemOS-1031 consistently ranks first, demonstrating that MemOS’s OS-style memory governance yields superior long-horizon behavior.

BENCHMARK

Benchmark: MemOS achieves state-of-the-art performance across all benchmarks

Personalized response rate and mean scores on PreFEval, PersonaMem, LongMemEval, and LoCoMo.

KEY INSIGHT

The Counterintuitive Finding

MemOS shows that adding a full memory operating system, not just a retrieval module, can improve both efficiency and long-term reasoning quality simultaneously.

This is surprising because many assume extra memory layers only add overhead, yet MemOS’s MemScheduler and MemLifecycle demonstrate that structured governance can reduce cost while improving behavior.

WHY IT MATTERS

What this unlocks for the field

MemOS unlocks agents that can carry structured memories across sessions, roles, and platforms while keeping provenance, permissions, and evolution under explicit control.

Builders can now design ecosystems where many heterogeneous agents share MemCubes via MemStore and MemVault, enabling continual learning and cross-platform personalization that was previously ad hoc or impossible.


Related papers

Memory Architecture · Survey

Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

Zhongming Yu, Naicheng Yu et al.

arXiv 2026

Multi-Agent Memory Architecture organizes **Agent IO Layer**, **Agent Cache Layer**, and **Agent Memory Layer** plus **Agent Cache Sharing** and **Agent Memory Access** protocols into a unified architectural framing for multi-agent systems. As a position paper, it reports no benchmark results or numeric comparisons against baselines.

RAG · Memory Architecture · Long-Term Memory

From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Bernal Jiménez Gutiérrez, Yiheng Shu et al.

ICML 2025

HippoRAG 2 combines **Offline Indexing**, a schema-less **Knowledge Graph**, **Dense-Sparse Integration**, **Deeper Contextualization**, and **Recognition Memory** into a neuro-inspired non-parametric memory system for LLMs. On the joint RAG benchmark suite, HippoRAG 2 achieves 59.8 average F1 versus 57.0 for NV-Embed-v2, including 71.0 F1 on 2Wiki compared to 61.5 for NV-Embed-v2.

Agent Memory · Memory Architecture

General Agentic Memory Via Deep Research

B.Y. Yan, Chaofan Li et al.

arXiv 2025

General Agentic Memory (GAM) combines a **Memorizer**, **Researcher**, **page-store**, and **memory** to keep full trajectories while constructing lightweight guidance for deep research. On RULER 128K retrieval, GAM achieves 97.70% accuracy compared to 94.25% for RAG using GPT-4o-mini, while also reaching 64.07 F1 on HotpotQA-56K.