MemOS: A Memory OS for AI System

Authors: Zhiyu Li, Chenyang Xi, Chunyu Li, Ding Chen

arXiv 2025

TL;DR

MemOS uses MemScheduler, MemOperator, MemLifecycle, and MemCube to unify plaintext, activation, and parameter memories into a schedulable hierarchy, enabling memory-centric continual evolution without relying on static context windows.

THE PROBLEM

LLM agents stay stateless and forget long-term context

MemOS notes that existing systems rely on static parameters and short-lived contextual states, limiting long-context reasoning, personalization, and knowledge consistency.

This breaks multi-turn dialogue, planning, and personalization tasks, where agents lack durable memory traces, causing inconsistent behavior and forcing users to repeatedly rebuild context and preferences.

HOW IT WORKS

MemOS: A memory operating system for LLMs

MemOS centers on MemCube, MemScheduler, MemOperator, and MemLifecycle to unify plaintext, activation, and parameter memories under a single Memory API and governance layer.

You can think of MemOS as an OS that manages RAM, disk, and caches, except that here it manages heterogeneous memories across agents, users, and tasks.

This unified memory hierarchy lets MemOS schedule, transform, and evolve memories in ways that a plain context window or ad hoc RAG pipeline cannot achieve.
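To make the MemCube idea concrete, here is a minimal Python sketch of a container that pairs a memory payload with governance metadata. The names `MemoryType`, `MemCube`, `readers`, and `access_count` are hypothetical illustrations, not the MemOS codebase's API; the permission check stands in for the governance layer, and `access_count` for a behavioral indicator.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class MemoryType(Enum):
    """The three memory forms MemOS unifies (labels are illustrative)."""
    PLAINTEXT = "plaintext"    # explicit text, e.g. stored facts or notes
    ACTIVATION = "activation"  # inference-time state, e.g. cached attention states
    PARAMETER = "parameter"    # knowledge held in model weights


@dataclass
class MemCube:
    """Hypothetical container: a memory payload plus governance metadata."""
    cube_id: str
    mem_type: MemoryType
    payload: Any
    provenance: str                                   # where the memory came from
    readers: set[str] = field(default_factory=set)    # principals allowed to read it
    access_count: int = 0                             # simple behavioral indicator

    def read(self, principal: str) -> Any:
        """Return the payload if `principal` is permitted, counting the access."""
        if principal not in self.readers:
            raise PermissionError(f"{principal} may not read {self.cube_id}")
        self.access_count += 1
        return self.payload


# Usage: a plaintext preference memory readable only by the assistant.
cube = MemCube(
    cube_id="pref-001",
    mem_type=MemoryType.PLAINTEXT,
    payload="User prefers vegetarian recipes.",
    provenance="session-42",
    readers={"assistant"},
)
text = cube.read("assistant")  # succeeds and raises access_count to 1
```

Keeping the memory type as a field, rather than defining three unrelated classes, is what lets a single API read, govern, and later transform any kind of memory.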

DIAGRAM

MemOS memory retrieval and injection flow

This diagram shows how MemOS routes a single query through MemReader, MemOperator, MemScheduler, and MemCube into the LLM during inference.
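The retrieval and injection flow can be approximated in a few functions. This is a sketch under stated assumptions: `read_query` stands in for MemReader, keyword-to-tag overlap stands in for MemScheduler's relevance ranking, and only plaintext memories are injected (activation and parameter memories are not modeled). All names here are hypothetical and do not come from MemOS itself.

```python
from dataclasses import dataclass


@dataclass
class Cube:
    """Hypothetical stored memory with tags used for matching."""
    cube_id: str
    text: str
    tags: set[str]


def read_query(query: str) -> set[str]:
    """Stand-in for MemReader: turn a query into lowercase keywords."""
    return {w.strip(".,?!").lower() for w in query.split() if len(w) > 3}


def schedule(keywords: set[str], cubes: list[Cube], k: int = 2) -> list[Cube]:
    """Stand-in for MemScheduler: rank cubes by tag overlap, keep the top k."""
    scored = [(len(keywords & c.tags), c) for c in cubes]
    scored = [(s, c) for s, c in scored if s > 0]    # drop irrelevant memories
    scored.sort(key=lambda sc: sc[0], reverse=True)  # stable sort keeps ties in order
    return [c for _, c in scored[:k]]


def inject(query: str, selected: list[Cube]) -> str:
    """Build the prompt the LLM receives, with selected memories prepended."""
    memory_block = "\n".join(f"- {c.text}" for c in selected)
    return f"Relevant memories:\n{memory_block}\n\nUser: {query}"


# Usage: route one query through the hypothetical pipeline.
store = [
    Cube("c1", "User is vegetarian.", {"vegetarian", "diet"}),
    Cube("c2", "User lives in Berlin.", {"location", "berlin"}),
    Cube("c3", "User likes quick dinner recipes.", {"dinner", "cook", "recipes"}),
]
query = "What vegetarian dinner should I cook tonight?"
selected = schedule(read_query(query), store)
prompt = inject(query, selected)
```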

DIAGRAM

MemOS evaluation and benchmark coverage

This diagram shows how MemOS is evaluated across PreFEval, PersonaMem, LongMemEval, and LoCoMo against multiple memory baselines.

PROCESS

How MemOS Handles a Memory Lifecycle

  1. Memory Generation

    MemOS uses MemReader and MemOperator to extract new semantic fragments from interactions and wrap them into MemCube instances with rich metadata.

  2. Memory Activation

    MemScheduler selects relevant MemCubes and activates them as plaintext, activation, or parameter memories, injecting them into the LLM Core for current tasks.

  3. Memory Fusion

    MemLifecycle and MemOperator merge overlapping MemCubes, restructure tags and graphs, and support transitions between plaintext, activation, and parameter memories.

  4. Memory Archiving and Expiration

    MemLifecycle and MemVault archive cold MemCubes, enforce expiry policies, and coordinate MemGovernance for access control and provenance tracking.
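One way to read the four stages above is as a state machine over each memory's status. The sketch below assumes the states, allowed transitions, and idle-time thresholds shown, all of which are hypothetical; it is not MemLifecycle's actual policy.

```python
from enum import Enum, auto


class State(Enum):
    GENERATED = auto()
    ACTIVATED = auto()
    MERGED = auto()
    ARCHIVED = auto()
    EXPIRED = auto()


# Allowed transitions: an assumed reading of the four lifecycle stages.
TRANSITIONS: dict[State, set[State]] = {
    State.GENERATED: {State.ACTIVATED, State.ARCHIVED},
    State.ACTIVATED: {State.MERGED, State.ARCHIVED},
    State.MERGED: {State.ACTIVATED, State.ARCHIVED},
    State.ARCHIVED: {State.EXPIRED},
    State.EXPIRED: set(),  # terminal state
}


class LifecycleError(ValueError):
    """Raised when a transition is not permitted."""


def advance(current: State, target: State) -> State:
    """Move to `target` only if the transition table allows it."""
    if target not in TRANSITIONS[current]:
        raise LifecycleError(f"cannot move from {current.name} to {target.name}")
    return target


def apply_policy(state: State, idle_seconds: float,
                 archive_after: float = 3600.0,
                 expire_after: float = 86400.0) -> State:
    """Hypothetical cold-memory policy based on time since last use."""
    live = (State.GENERATED, State.ACTIVATED, State.MERGED)
    if state in live and idle_seconds >= archive_after:
        return advance(state, State.ARCHIVED)   # cold memory moves to the vault
    if state is State.ARCHIVED and idle_seconds >= expire_after:
        return advance(state, State.EXPIRED)    # expiry policy removes it
    return state


# Usage: walk one memory through its lifecycle.
state = State.GENERATED
state = advance(state, State.ACTIVATED)            # scheduler activates it
state = advance(state, State.MERGED)               # fusion merges it with an overlap
state = apply_policy(state, idle_seconds=7_200)    # idle two hours -> ARCHIVED
after_archive = state
state = apply_policy(state, idle_seconds=90_000)   # idle 25 hours -> EXPIRED
```

An explicit transition table makes illegal moves (for example, reviving an expired memory) fail loudly instead of silently corrupting state.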

KEY CONTRIBUTIONS

Key Contributions

  • MemOS: A memory operating system

    MemOS formalizes memory as a schedulable system resource, with MemCube, MemScheduler, MemOperator, and MemLifecycle enabling unified lifecycle control across plaintext, activation, and parameter memories.

  • MemCube unified memory abstraction

    MemOS introduces MemCube as a universal container that encapsulates memory payloads plus metadata for provenance, permissions, and behavioral indicators, enabling cross-type transformation and governance.

  • Memory-centric Mem-training paradigm

    MemOS proposes Mem-training, where models evolve via structured memory units instead of only parameter updates, enabling distributed agents to exchange memories rather than gradients or full weights.
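The idea of exchanging memories rather than gradients can be illustrated with two agents that share structured memory units. The `Agent` and `MemoryUnit` classes and the content-hash deduplication are hypothetical choices for this sketch; no model is trained here, and the paper's actual Mem-training procedure is not reproduced.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryUnit:
    """Hypothetical structured memory unit with its originating agent."""
    content: str
    source_agent: str

    @property
    def key(self) -> str:
        """Content hash used to detect duplicates across agents."""
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()


class Agent:
    def __init__(self, name: str) -> None:
        self.name = name
        self.memory: dict[str, MemoryUnit] = {}

    def learn(self, content: str) -> None:
        """Record a new memory unit unless identical content is already held."""
        unit = MemoryUnit(content, self.name)
        self.memory.setdefault(unit.key, unit)

    def export_units(self) -> list[MemoryUnit]:
        return list(self.memory.values())

    def import_units(self, units: list[MemoryUnit]) -> int:
        """Adopt units not already held; return how many were new."""
        added = 0
        for unit in units:
            if unit.key not in self.memory:
                self.memory[unit.key] = unit  # provenance stays with the original agent
                added += 1
        return added


# Usage: the coder agent adopts the planner's knowledge without any weight updates.
planner = Agent("planner")
coder = Agent("coder")
planner.learn("The repository targets Python 3.11.")
planner.learn("Tests run with pytest.")
coder.learn("Tests run with pytest.")
new_count = coder.import_units(planner.export_units())  # only one unit is new
```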

RESULTS

By the Numbers

Personalized response rate

MemOS ranks 1st

ahead of MIRIX, Mem0, Zep, Memobase, MemU, and Supermemory on PreFEval at 0 and 10 turns

PersonaMem precision

MemOS ranks 1st

vs MIRIX, Mem0, Zep, Memobase, MemU, Supermemory

LongMemEval mean score

MemOS ranks 1st

higher overall mean score than all listed memory baselines

LoCoMo LLM judge score

MemOS ranks 1st

top overall mean LLM judge score among all compared systems

MemOS is evaluated on PreFEval, PersonaMem, LongMemEval, and LoCoMo, which test personalization, persona precision, long-context reasoning, and conversational coherence. The consistent first-place rankings across all four benchmarks show that MemOS's memory operating system design translates into strong empirical performance over MIRIX, Mem0, Zep, Memobase, MemU, and Supermemory.

BENCHMARK

Overall benchmark ranking summary from Figure 1

Relative ranking of MemOS and baselines across PreFEval, PersonaMem, LongMemEval, and LoCoMo (qualitative first-place vs others).

KEY INSIGHT

The Counterintuitive Finding

MemOS shows that treating memory as a schedulable, OS-level resource, rather than simply adding RAG, can take first place on all four evaluated memory benchmarks simultaneously.

This challenges the common assumption that better retrievers or longer context windows alone are enough, highlighting the importance of lifecycle and governance for effective LLM memory.

WHY IT MATTERS

What this unlocks for the field

MemOS unlocks agents that maintain coherent identities, preferences, and knowledge over long horizons through MemCube-based memory evolution and MemScheduler-driven activation.

Builders can now design cross-session, cross-platform, and multi-agent systems where memories migrate, fuse, and expire under explicit policies, instead of being trapped in opaque context windows or siloed RAG stores.
