MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models

Authors: Zhiyu Li, Shichao Song, Hanyu Wang et al.

2025

TL;DR

MemOS uses the MemCube abstraction, together with MemScheduler and MemGovernance, to treat LLM memory as a first-class OS resource, unifying parametric, activation, and plaintext memory. It is an architecture proposal and reports no new benchmarks.

THE PROBLEM

LLMs lack a unified memory architecture and create long-term “memory silos”

MemOS highlights that current LLMs “fundamentally lack a unified and structured architecture for handling memory,” causing fragmented parametric, activation, and plaintext memory.

The paper argues that this fragmentation breaks long-term conversational state, user-preference persistence, and multi-agent workflows, leading to cross-platform memory silos and poor knowledge evolution.

HOW IT WORKS

MemOS: MemCube plus OS-style scheduling and governance

MemOS is built around MemCube, MemScheduler, MemLifecycle, MemOperator, MemVault, and MemGovernance, which together treat memory as a schedulable, governed resource across parametric, activation, and plaintext forms.

You can think of MemOS like an operating system where MemCube is a process, MemVault is disk, and MemScheduler is the CPU scheduler coordinating memory access.

This OS-style design lets MemOS perform cross-type memory transformations and lifecycle control that a plain context window or ad hoc RAG pipeline cannot achieve.
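
To make the scheduler analogy concrete, here is a minimal Python sketch. It is not the MemOS implementation: the `MemoryType` enum, the `MemUnit` class, and the `schedule` function are hypothetical names, and the selection rule (highest priority first, under a fixed budget) is an assumption chosen purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    # The three memory forms named in the summary.
    PARAMETRIC = "parametric"
    ACTIVATION = "activation"
    PLAINTEXT = "plaintext"


@dataclass
class MemUnit:
    # Hypothetical stand-in for a MemCube: a named payload plus scheduling hints.
    name: str
    kind: MemoryType
    priority: int  # higher means more important (assumed convention)
    cost: int      # assumed abstract load cost, e.g. tokens or bytes


def schedule(units: list[MemUnit], budget: int) -> list[MemUnit]:
    """Greedily pick the highest-priority units that still fit in the budget."""
    chosen = []
    remaining = budget
    for unit in sorted(units, key=lambda u: u.priority, reverse=True):
        if unit.cost <= remaining:
            chosen.append(unit)
            remaining -= unit.cost
    return chosen


units = [
    MemUnit("user_prefs", MemoryType.PLAINTEXT, priority=9, cost=40),
    MemUnit("kv_cache_session", MemoryType.ACTIVATION, priority=7, cost=70),
    MemUnit("domain_adapter", MemoryType.PARAMETRIC, priority=5, cost=30),
]
selected = schedule(units, budget=100)
print([u.name for u in selected])
```

With a budget of 100, the sketch loads `user_prefs` and `domain_adapter` and skips `kv_cache_session`, because the session cache no longer fits once the higher-priority unit is loaded. A real scheduler would weigh far more than a single priority number; the point is only that memory of different types competes for one managed budget.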

DIAGRAM

MemOS memory I/O path from user prompt to governed storage

This diagram shows how MemOS routes a prompt through MemReader, MemScheduler, and MemOperator, then persists MemCube units via MemVault and MemGovernance.

DIAGRAM

Three-layer MemOS architecture for memory governance

This diagram shows how MemOS organizes Interface, Operation, and Infrastructure layers around MemCube to manage the full memory lifecycle.

PROCESS

How MemOS Handles a Memory I/O Path

01
Interface Layer: Memory API and Pipeline
MemOS uses MemReader and the Memory API to parse natural language into structured, MemCube-based operation chains that define provenance, updates, and log queries.
02
Operation Layer: Memory Scheduling and Lifecycle Management
MemScheduler, MemOperator, and MemLifecycle coordinate which MemCube units to load, how to organize them, and when to version, roll back, or freeze them.
03
Infrastructure Layer: Governance and Memory Store
MemVault, MemGovernance, and MemStore enforce access control, expiry policies, and cross-platform circulation for MemCube units across agents and applications.
04
Closed-loop Memory I/O Path
MemOS routes prompts through scheduling, injection into LLM inference, and back into MemVault, forming a reusable, traceable MemCube lifecycle across sessions.
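
The four steps above can be sketched as a small closed loop. This is an illustrative sketch under stated assumptions, not MemOS code: the `Vault` class and the functions `read`, `select`, `infer`, and `handle` are hypothetical names, the parsing is a trivial placeholder, memory lookup is a naive substring match, and the "model" is a stub that echoes its inputs.

```python
from dataclasses import dataclass, field


@dataclass
class Vault:
    # Hypothetical stand-in for MemVault: versioned storage keyed by memory id.
    history: dict[str, list[str]] = field(default_factory=dict)
    frozen: set[str] = field(default_factory=set)

    def write(self, key: str, content: str) -> int:
        """Append a new version unless the key is frozen; return the version count."""
        if key in self.frozen:
            raise PermissionError(f"{key!r} is frozen")
        self.history.setdefault(key, []).append(content)
        return len(self.history[key])

    def latest(self, key: str) -> str:
        return self.history[key][-1]

    def rollback(self, key: str) -> str:
        """Discard the newest version and return the one before it."""
        if key in self.frozen:
            raise PermissionError(f"{key!r} is frozen")
        versions = self.history[key]
        if len(versions) < 2:
            raise ValueError("no earlier version to roll back to")
        versions.pop()
        return versions[-1]

    def freeze(self, key: str) -> None:
        self.frozen.add(key)


def read(prompt: str) -> dict:
    # Step 01 (interface): turn a prompt into a structured request (placeholder parse).
    return {"query": prompt.strip().lower()}


def select(request: dict, vault: Vault) -> list[str]:
    # Step 02 (operation): load the latest version of each memory whose key
    # appears in the query (assumed, deliberately naive matching rule).
    return [vault.latest(k) for k in sorted(vault.history) if k in request["query"]]


def infer(request: dict, memories: list[str]) -> str:
    # Stub for LLM inference with injected memory.
    return f"answer({request['query']} | {'; '.join(memories)})"


def handle(prompt: str, vault: Vault) -> str:
    request = read(prompt)
    memories = select(request, vault)
    answer = infer(request, memories)
    # Steps 03 and 04: persist the result; the vault enforces the freeze policy.
    vault.write("last_answer", answer)
    return answer


vault = Vault()
vault.write("coffee", "user prefers espresso")
vault.write("coffee", "user prefers decaf")
out = handle("What coffee should I order?", vault)
print(out)
```

Calling `vault.rollback("coffee")` afterwards would restore the espresso preference, and `vault.freeze("coffee")` would make later writes to that key raise `PermissionError`. Versioning, rollback, and freezing are the lifecycle operations the summary names; how MemOS actually stores and enforces them is not specified here.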

KEY CONTRIBUTIONS

Key Contributions

01
Memory Cube as unified abstraction
MemOS introduces MemCube to encapsulate parametric, activation, and plaintext memory with descriptive, governance, and behavioral metadata for cross-type scheduling and evolution.
02
Three-layer MemOS architecture
MemOS defines Interface, Operation, and Infrastructure layers with MemScheduler, MemLifecycle, MemOperator, MemVault, MemGovernance, and MemStore to manage full memory lifecycles.
03
Memory Interchange Protocol and marketplace
MemOS proposes a Memory Interchange Protocol and MemStore to support cross-LLM memory sharing, self-evolving MemBlocks, and a decentralized memory marketplace.

RESULTS

By the Numbers

Memory types unified

3 types

Parametric, activation, and plaintext unified in MemOS

MemCube metadata fields

11 fields

Created, last used, source, model, usage, priority, expires, access, tags, embedding fp, storage mode

MemOS architecture layers

3 layers

Interface, Operation, and Infrastructure layers in MemOS design

Future directions

3 goals

Cross-LLM sharing, self-evolving MemBlocks, scalable memory marketplace

MemOS is a systems and architecture paper without benchmark tables, so the key quantitative facts describe how many memory types, metadata fields, and architectural layers MemOS unifies. These design numbers show that MemOS systematically covers the full memory lifecycle rather than optimizing a single metric.
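
The eleven metadata fields listed above map naturally onto a record type. The sketch below is an assumption-laden illustration rather than the MemCube schema: the field names follow the list in this summary, but every type, the `MemCubeMeta` class name, and the `can_read` governance check are hypothetical.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timedelta, timezone


@dataclass
class MemCubeMeta:
    # Field names mirror the summary's list; all types are assumptions.
    created: datetime
    last_used: datetime
    source: str
    model: str
    usage: int          # assumed: access counter
    priority: int       # assumed: higher means more important
    expires: datetime
    access: set[str]    # assumed: principals allowed to read
    tags: list[str]
    embedding_fp: str   # assumed: fingerprint of the embedding
    storage_mode: str   # assumed: e.g. "plaintext", "activation", "parametric"


def can_read(meta: MemCubeMeta, principal: str, now: datetime) -> bool:
    """Hypothetical governance check: caller is allowed and the unit has not expired."""
    return principal in meta.access and now < meta.expires


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
meta = MemCubeMeta(
    created=now,
    last_used=now,
    source="chat",
    model="example-llm",
    usage=0,
    priority=5,
    expires=now + timedelta(days=30),
    access={"alice"},
    tags=["preference"],
    embedding_fp="fp-0001",
    storage_mode="plaintext",
)
print(len(fields(meta)))
print(can_read(meta, "alice", now + timedelta(days=1)))
print(can_read(meta, "bob", now + timedelta(days=1)))
print(can_read(meta, "alice", now + timedelta(days=31)))
```

The check combines two of the governance ideas mentioned in this summary, access control and expiry, into a single predicate. A real policy engine would likely also log the decision for traceability.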

BENCHMARK

Memory types integrated by MemOS

Relative emphasis of parametric, activation, and plaintext memory in the MemOS design.

KEY INSIGHT

The Counterintuitive Finding

MemOS argues that simply scaling pre-training and post-training hits diminishing returns, and that a new “mem-training” scaling law is needed.

This is counterintuitive because most builders assume larger models and better fine-tuning are enough, but MemOS claims structured memory governance is the real next frontier.

WHY IT MATTERS

What this unlocks for the field

MemOS aims to enable LLMs that accumulate experience, retain user-specific preferences, and evolve their behavior over time through MemCube-based lifecycle control.

With MemOS, builders can design agents that share governed memory across users, platforms, and models instead of relying on brittle prompts or siloed RAG stores.
