MemOS: A Memory OS for AI System

Authors: Zhiyu Li, Chenyang Xi, Chunyu Li, Ding Chen

arXiv 2025

TL;DR

MemOS treats memory as an operating-system resource, using unified MemCube-based scheduling across plaintext, activation, and parameter memories, and achieves first-place scores on PreFEval, PersonaMem, LongMemEval, and LoCoMo against MIRIX, Mem0, Zep, Memobase, MemU, and Supermemory.

THE PROBLEM

LLM agents lack schedulable memory and remain stateless over long horizons

MemOS targets LLMs that rely on static parameters and short-lived context, where RAG is only a stateless workaround without lifecycle control or persistent representations.

This breaks long-context reasoning, continual personalization, and knowledge consistency, so agents cannot maintain behavioral continuity or manage evolving knowledge across tasks, users, and platforms.

HOW IT WORKS

MemOS — a memory operating system with MemCubes and hierarchical scheduling

MemOS introduces MemReader, MemScheduler, MemLifecycle, MemOperator, and MemGovernance around MemCube units to unify plaintext, activation, and parameter memories as schedulable resources.

You can think of MemOS as an operating system with RAM, disk, and caches, where MemScheduler and MemVault play the roles of a CPU scheduler and a file system for memory.

This OS-style design lets MemOS perform lifecycle-aware retrieval, fusion, and evolution that a plain context window or vanilla RAG pipeline cannot express or control.
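To make the OS analogy concrete, a MemCube can be pictured as a record carrying its payload plus the scheduling metadata the system would consult. This is a minimal sketch, assuming hypothetical class and field names; it is not MemOS's actual API.

```python
from dataclasses import dataclass, field
import time

# Hypothetical sketch of the MemCube abstraction; field names are
# illustrative assumptions, not the paper's concrete schema.
@dataclass
class MemCube:
    content: str                    # payload: plaintext here, but could be a KV-cache or adapter weights
    mem_type: str                   # "plaintext" | "activation" | "parameter"
    provenance: str                 # where this memory came from (user, agent, tool)
    semantic_tags: list = field(default_factory=list)
    last_access: float = field(default_factory=time.time)
    access_count: int = 0

    def touch(self) -> None:
        """Record an access so a scheduler can track hot vs. cooled cubes."""
        self.access_count += 1
        self.last_access = time.time()

cube = MemCube("User prefers concise answers", "plaintext", "user:alice",
               semantic_tags=["preference"])
cube.touch()
print(cube.access_count)  # 1
```

A scheduler in this sketch would rank cubes by recency, access count, and tag relevance, mirroring how a CPU scheduler prioritizes runnable tasks.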

DIAGRAM

Memory taxonomy in MemOS across implicit and explicit hierarchies

This diagram shows how MemOS structures plaintext memory, activation memory, and parameter memory within a hierarchical memory space.

DIAGRAM

Evaluation pipeline for MemOS across long-term memory benchmarks

This diagram shows how MemOS is evaluated from benchmark selection through memory configuration to metric aggregation.

PROCESS

How MemOS Handles a Memory Lifecycle — generation, activation, fusion, archiving

  1. Generation

     MemOS uses MemReader and MemOperator to extract new MemCubes from interactions, tagging them with provenance and semantic type for later scheduling.

  2. Activation

     MemScheduler selects relevant MemCubes, loading them as plaintext memory, activation memory, or parameter memory depending on task context and access patterns.

  3. Fusion

     MemLifecycle coordinates MemOperator to merge overlapping MemCubes, performing memory slicing, tagging, and hierarchical mapping to restructure knowledge.

  4. Archiving

     MemLifecycle and MemGovernance move cooled MemCubes into MemVault, enforcing expiry policy, permission control, and versioning for long-term storage.
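The four stages above can be sketched as a small state machine. The state names, legal transitions, and cooling threshold below are hypothetical illustrations of the lifecycle idea, not the paper's implementation.

```python
import time

# Hypothetical lifecycle states, loosely following the
# generation -> activation -> fusion -> archiving stages described above.
VALID_TRANSITIONS = {
    "generated": {"activated"},
    "activated": {"merged", "archived"},
    "merged": {"activated", "archived"},
    "archived": set(),  # archived cubes are read-only until re-imported
}

class MemoryLifecycle:
    def __init__(self):
        self.state = "generated"
        self.last_access = time.time()

    def transition(self, new_state: str) -> None:
        if new_state not in VALID_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

    def should_archive(self, now: float, cool_after_s: float = 3600.0) -> bool:
        # Illustrative "cooling" rule: archive after an hour without access.
        return self.state in {"activated", "merged"} and now - self.last_access > cool_after_s

lc = MemoryLifecycle()
lc.transition("activated")
lc.transition("merged")
print(lc.state)  # merged
```

Encoding the lifecycle as explicit transitions is what lets a governance layer veto or audit moves (e.g., blocking archiving of a cube still under an active permission grant).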

KEY CONTRIBUTIONS

Key Contributions

  • MemOS as a memory operating system

    MemOS treats memory as a first-class system resource, with MemScheduler, MemLifecycle, and MemOperator orchestrating generation, activation, fusion, archiving, and expiration of MemCubes.

  • MemCube unified memory abstraction

    MemOS introduces MemCube to encapsulate plaintext memory, activation memory, and parameter memory together with metadata for provenance, permissions, and behavioral indicators.

  • Mem training paradigm and governance

    MemOS enables a Mem training paradigm where heterogeneous agents exchange MemCubes instead of parameters, with MemGovernance enforcing access control, auditing, and lifecycle policies.
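The governance contribution can be illustrated with a minimal access-control check. The policy fields and rules here are assumptions made for illustration; the paper describes access control, auditing, and lifecycle policies without prescribing this schema.

```python
from dataclasses import dataclass, field

# Hypothetical governance policy attached to a shared MemCube.
@dataclass
class GovernancePolicy:
    owner: str
    readers: set = field(default_factory=set)
    expires_at: float = float("inf")   # illustrative expiry policy
    audit_log: list = field(default_factory=list)

    def can_read(self, agent_id: str, now: float) -> bool:
        allowed = now < self.expires_at and (agent_id == self.owner or agent_id in self.readers)
        self.audit_log.append((now, agent_id, "read", allowed))  # every attempt is audited
        return allowed

policy = GovernancePolicy(owner="agent-a", readers={"agent-b"}, expires_at=1000.0)
print(policy.can_read("agent-b", now=10.0))    # True
print(policy.can_read("agent-c", now=10.0))    # False
print(policy.can_read("agent-b", now=2000.0))  # False (expired)
```

Auditing every attempt, allowed or not, is what makes cross-agent MemCube exchange accountable rather than ad hoc.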

RESULTS

By the Numbers

  • Personalized response rate: MemOS-1031 ranks 1st on PreFEval (0 turns), ahead of MIRIX, Mem0, Zep, Memobase, MemU, and Supermemory.

  • Personalized response rate: MemOS-1031 ranks 1st on PreFEval (10 turns), against the same baselines.

  • Precision score: MemOS-1031 ranks 1st on PersonaMem, with higher precision than MIRIX, Mem0, Zep, Memobase, MemU, and Supermemory.

  • Overall mean score: MemOS-1031 ranks 1st on LongMemEval and LoCoMo, with the top LLM-judge score among the same baselines.

Across PreFEval, PersonaMem, LongMemEval, and LoCoMo, which test personalization and long-term memory, MemOS-1031 consistently ranks first, demonstrating that MemOS’s OS-style memory governance yields superior long-horizon behavior.

BENCHMARK

Benchmark: MemOS achieves state-of-the-art performance across all benchmarks

Personalized response rate and mean scores on PreFEval, PersonaMem, LongMemEval, and LoCoMo.

KEY INSIGHT

The Counterintuitive Finding

MemOS shows that adding a full memory operating system, not just a retrieval module, can improve both efficiency and long-term reasoning quality simultaneously.

This is surprising because many assume extra memory layers only add overhead, yet MemOS’s MemScheduler and MemLifecycle demonstrate that structured governance can reduce cost while improving behavior.

WHY IT MATTERS

What this unlocks for the field

MemOS unlocks agents that can carry structured memories across sessions, roles, and platforms while keeping provenance, permissions, and evolution under explicit control.

Builders can now design ecosystems where many heterogeneous agents share MemCubes via MemStore and MemVault, enabling continual learning and cross-platform personalization that was previously ad hoc or impossible.


Related papers

Memory Architecture · Survey

Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead

Zhongming Yu, Naicheng Yu et al.

arXiv 2026

Multi-Agent Memory Architecture organizes **Agent IO Layer**, **Agent Cache Layer**, and **Agent Memory Layer** plus **Agent Cache Sharing** and **Agent Memory Access** protocols into a unified architectural framing for multi-agent systems. As a position paper, it reports no benchmark results or numeric comparisons against baselines.

RAG · Memory Architecture · Long-Term Memory

From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

Bernal Jiménez Gutiérrez, Yiheng Shu et al.

ICML 2025

HippoRAG 2 combines **Offline Indexing**, a schema-less **Knowledge Graph**, **Dense-Sparse Integration**, **Deeper Contextualization**, and **Recognition Memory** into a neuro-inspired non-parametric memory system for LLMs. On the joint RAG benchmark suite, HippoRAG 2 achieves 59.8 average F1 versus 57.0 for NV-Embed-v2, including 71.0 F1 on 2Wiki compared to 61.5 for NV-Embed-v2.

Agent Memory · Memory Architecture

General Agentic Memory Via Deep Research

B.Y. Yan, Chaofan Li et al.

arXiv 2025

General Agentic Memory (GAM) combines a **Memorizer**, **Researcher**, **page-store**, and **memory** to keep full trajectories while constructing lightweight guidance for deep research. On RULER 128K retrieval, GAM achieves 97.70% accuracy compared to 94.25% for RAG using GPT-4o-mini, while also reaching 64.07 F1 on HotpotQA-56K.