
Forgetful but Faithful: A Cognitive Memory Architecture and Benchmark for Privacy-Aware Generative Agents

Authors: Saad Alqithami

2025

TL;DR

MaRS couples typed, provenance-aware memory with forgetting policies; its Hybrid policy reaches a composite FiFA score of ≈0.911 under tight budgets.


THE PROBLEM

Unbounded agent memory hurts coherence and privacy

Generative agents accumulate long interaction histories. Naive retention inflates context, degrades retrieval, and increases privacy risk. The ≈0.911 composite FiFA score reached by the Hybrid policy indicates that budgeted forgetting does not have to cost quality.

Without structured forgetting, agents either overspend tokens or lose narrative coherence and social recall, which undermines long-horizon goals and privacy expectations.

HOW IT WORKS

Memory-Aware Retention Schema (MaRS)

MaRS introduces episodic, semantic, social, and task memory types plus a privacy engine and forgetting policies like FIFO, LRU, Priority Decay, Reflection-Summary, Random-Drop, and Hybrid.

Think of MaRS as a governed RAM plus disk for agents, where typed memories are files, indices are the card catalog, and forgetting policies are an OS eviction scheduler.

This design lets MaRS keep high-utility, low-sensitivity memories while compressing or deleting others, something a plain context window cannot express or audit.
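To make the idea of typed, provenance-aware memory concrete, here is a minimal Python sketch. It is not the paper's implementation: the class names `MemoryItem` and `MemoryStore`, the `utility` and `sensitivity` fields scaled to [0, 1], and the string-valued `source` field are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class MemoryType(Enum):
    # The four memory types named in the summary.
    EPISODIC = "episodic"
    SEMANTIC = "semantic"
    SOCIAL = "social"
    TASK = "task"


@dataclass
class MemoryItem:
    item_id: str
    kind: MemoryType
    text: str
    tokens: int         # cost counted against the memory budget
    utility: float      # assumed score in [0, 1]; higher means more useful
    sensitivity: float  # assumed score in [0, 1]; higher means more private
    source: str         # provenance: where this memory came from (assumed format)
    created_at: int     # logical timestep of creation
    last_access: int    # logical timestep of most recent retrieval


@dataclass
class MemoryStore:
    budget_tokens: int
    items: Dict[str, MemoryItem] = field(default_factory=dict)

    def add(self, item: MemoryItem) -> None:
        self.items[item.item_id] = item

    def used_tokens(self) -> int:
        return sum(m.tokens for m in self.items.values())

    def over_budget(self) -> bool:
        # A forgetting policy would be triggered when this returns True.
        return self.used_tokens() > self.budget_tokens

    def by_type(self, kind: MemoryType) -> List[MemoryItem]:
        # Stand-in for a per-type index (the "card catalog" in the analogy).
        return [m for m in self.items.values() if m.kind is kind]
```

Keeping token cost, sensitivity, and provenance on every record is what lets a policy reason about what to evict and lets an auditor see why, which a flat context window cannot offer.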

DIAGRAM

FiFA Interaction and Memory Flow

This diagram shows how MaRS handles a FiFA episode from user interaction through memory storage, retrieval, and response generation.

DIAGRAM

FiFA Evaluation Pipeline for MaRS

This diagram shows how the FiFA benchmark runs 300 simulations across five budgets to compare MaRS forgetting policies.

PROCESS

How MaRS Handles a FiFA Simulation Episode

01
Memory-Aware Retention Schema (MaRS)
MaRS initializes typed episodic, semantic, social, and task stores plus indices and a privacy engine before FiFA simulations begin.
02
Forgetting Policy Framework
MaRS activates FIFO, LRU, Priority Decay, Reflection-Summary, Random-Drop, or Hybrid policies when token budgets are exceeded during interaction.
03
Privacy-Aware Policies
MaRS applies sensitivity scores and optional (ε, δ)-differential privacy at the retention boundary to govern which memories are summarized or deleted.
04
FiFA Benchmark Evaluation
MaRS runs 300 FiFA simulations across five memory budgets, logging narrative coherence, goal completion, social recall, privacy leakage, and cost efficiency.
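Step 02 above can be illustrated with a small Python sketch of budget-triggered eviction. It covers only three of the six policies (FIFO, LRU, Priority Decay) and is written under assumptions: the `Item` fields, the exponential half-life form of the decay, and the `evict` helper are hypothetical and not taken from the paper. Reflection-Summary, Random-Drop, and Hybrid are omitted.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    tokens: int
    created_at: int
    last_access: int
    priority: float  # assumed base importance in [0, 1]


# Each policy is a key function: the item with the LOWEST key is evicted first.
def fifo_key(item, now):
    return item.created_at  # oldest created goes first


def lru_key(item, now):
    return item.last_access  # least recently used goes first


def priority_decay_key(item, now, half_life=10.0):
    # Assumed decay form: priority halves every `half_life` steps since last access.
    age = now - item.last_access
    return item.priority * 0.5 ** (age / half_life)


def evict(items, budget_tokens, key, now):
    """Drop lowest-key items until the total token count fits the budget."""
    kept = sorted(items, key=lambda m: key(m, now))
    used = sum(m.tokens for m in kept)
    evicted = []
    while kept and used > budget_tokens:
        victim = kept.pop(0)
        used -= victim.tokens
        evicted.append(victim)
    return kept, evicted


items = [
    Item("a", tokens=40, created_at=0, last_access=9, priority=0.2),
    Item("b", tokens=40, created_at=1, last_access=2, priority=0.9),
    Item("c", tokens=40, created_at=2, last_access=5, priority=0.1),
]
for policy in (fifo_key, lru_key, priority_decay_key):
    kept, evicted = evict(items, budget_tokens=80, key=policy, now=10)
    print(policy.__name__, "evicts", [m.name for m in evicted])
```

On this toy input the three policies each pick a different victim, which shows why the choice of policy matters once the budget is tight.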

KEY CONTRIBUTIONS

Key Contributions

01
Memory-Aware Retention Schema (MaRS)
MaRS defines a typed, provenance-aware memory graph with episodic, semantic, social, and task nodes plus indices and budgets, turning retention into a policy-addressable decision surface.
02
Forgetting Policy Framework
MaRS formalizes six forgetting policies (FIFO, LRU, Priority Decay, Reflection-Summary, Random-Drop, and Hybrid) with complexity analysis and sensitivity-aware retention under explicit budgets.
03
Forgetful but Faithful Agent (FiFA)
MaRS is evaluated on the FiFA benchmark, where the Hybrid policy achieves a composite score of ≈0.911 across 300 runs and five memory budgets.
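This summary does not spell out how the Hybrid policy combines its signals, so the following Python sketch is only one plausible reading of "sensitivity-aware retention". The weights, the thresholds, the `Candidate` fields, and the three-way keep/summarize/delete outcome are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    utility: float      # assumed in [0, 1]
    recency: float      # assumed in [0, 1]; 1.0 means just accessed
    sensitivity: float  # assumed in [0, 1]; 1.0 means highly private


def hybrid_score(c, w_utility=0.5, w_recency=0.3, w_sensitivity=0.2):
    # Hypothetical linear blend: utility and recency argue for keeping,
    # sensitivity argues against.
    return w_utility * c.utility + w_recency * c.recency - w_sensitivity * c.sensitivity


def decide(c, keep_above=0.4, sensitive_above=0.7):
    """Return 'keep', 'summarize', or 'delete' for one memory candidate."""
    if hybrid_score(c) >= keep_above:
        return "keep"
    if c.sensitivity >= sensitive_above:
        return "delete"     # low value and private: remove outright
    return "summarize"      # low value but harmless: compress instead


for c in (
    Candidate("current goal", utility=0.9, recency=0.8, sensitivity=0.1),
    Candidate("home address", utility=0.2, recency=0.3, sensitivity=0.9),
    Candidate("small talk", utility=0.3, recency=0.4, sensitivity=0.1),
):
    print(c.name, "->", decide(c))
```

Separating the score from the decision keeps the retention rule auditable: a reviewer can inspect the thresholds without reading the scoring code.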

RESULTS

By the Numbers

Composite FiFA score: ≈0.911 for the Hybrid policy (scores for the simpler policies are not numerically specified)

Simulation runs: 300, across five memory budgets in FiFA

Memory budgets: 5, spanning low to high token limits

Forgetting policies: 6 (FIFO, LRU, Priority Decay, Reflection-Summary, Random-Drop, Hybrid)

FiFA is a multi-agent simulation benchmark measuring narrative coherence, goal completion, social recall, privacy leakage, and cost efficiency under explicit token budgets. The ≈0.911 composite FiFA score shows that Hybrid retention in MaRS can balance coherence, efficiency, and privacy under constrained memory.

BENCHMARK

FiFA Composite Performance Across Policies

Composite FiFA score combining narrative coherence, goal completion, social recall, privacy, and cost efficiency.
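The exact weighting behind the composite is not given in this summary. As a rough sketch of how five metrics could be folded into one number, the Python below uses an equal-weight mean and converts privacy leakage into a "higher is better" privacy term as `1 - leakage`. Both choices, along with the function name and metric keys, are assumptions, and the sample values are made up rather than taken from the paper.

```python
def composite_score(metrics, weights=None):
    """Weighted mean of five components, each assumed to lie in [0, 1]."""
    components = {
        "narrative_coherence": metrics["narrative_coherence"],
        "goal_completion": metrics["goal_completion"],
        "social_recall": metrics["social_recall"],
        # Leakage is "lower is better", so flip it before averaging (assumption).
        "privacy": 1.0 - metrics["privacy_leakage"],
        "cost_efficiency": metrics["cost_efficiency"],
    }
    if weights is None:
        weights = {name: 1.0 for name in components}  # equal weights (assumption)
    total_weight = sum(weights[name] for name in components)
    return sum(weights[name] * value for name, value in components.items()) / total_weight


# Made-up metric values for one hypothetical run.
example = {
    "narrative_coherence": 0.8,
    "goal_completion": 0.7,
    "social_recall": 0.6,
    "privacy_leakage": 0.1,
    "cost_efficiency": 0.5,
}
print(round(composite_score(example), 3))
```

Passing a custom `weights` dictionary shows how sensitive a ranking of policies could be to the weighting, which is worth checking before comparing composite scores across setups.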

KEY INSIGHT

The Counterintuitive Finding

Hybrid forgetting in MaRS achieves a composite FiFA score of ≈0.911 while still enforcing strict memory budgets and privacy-aware retention.

This is surprising because many assume aggressive forgetting inevitably harms coherence, yet MaRS shows principled forgetting-by-design can improve both quality and governance.

WHY IT MATTERS

What this unlocks for the field

MaRS enables agents to treat memory as a governed resource, balancing episodic detail, semantic consolidation, social recall, and task context under explicit budgets.

Builders can now design agents that remain coherent over days, respect privacy norms, and keep costs tractable without relying on ever-growing context windows.


