Category

Security & Privacy

Security and privacy of AI memory systems — memory attacks, data extraction, privacy-aware agents, and memory governance.

7 papers

SurveyBenchmarkAgent MemoryLong-Term MemoryMemory Architecture

A Survey on the Security of Long-Term Memory in LLM Agents: Toward Mnemonic Sovereignty

Zehao Lin, Chunyu Li, Kai Chen

· 2026

Mnemonic Sovereignty analyzes long term Write, Store, Retrieve, Execute, Share, and Forget Rollback phases against integrity, confidentiality, availability, and governance objectives for agent memory. Mnemonic Sovereignty’s lifecycle matrix shows most of the ~70 works cluster on write and retrieve integrity, leaving store, availability, and governance primitives like write gate validation and post deletion verification almost entirely unexplored.

BenchmarkBenchmarkAgent MemoryLong-Term Memory

Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework

Chingkwun Lam, Jiaxin Li et al.

· 2026

SSGM interposes a Governance Middleware, Read Filtering Gate, Write Validation Gate, and a dual substrate of Mutable Active Graph plus Immutable Episodic Log between agents and memory. SSGM unifies evolving-memory systems into a four-dimensional failure taxonomy and proves that periodic reconciliation can bound semantic drift over infinite horizons.

BenchmarkBenchmarkBenchmarkAgent MemoryLong-Term Memory

MemEvoBench: Benchmarking Memory MisEvolution in LLM Agents

Weiwei Xie, Shaoxiong Guo et al.

· 2026

MemEvoBench combines Misleading Memory Injection, Noisy Tool Returns, Biased User Feedback, and a Memory Modification Tool (+ModTool) to stress-test long-term memory safety in LLM agents across 7 domains and 36 risk types. On the QA Style benchmark, MemEvoBench shows Gemini-2.5-Pro’s ASR drops from 67.0% (Vanilla) to 19.0% with +ModTool in Round 1, while biased feedback can push GPT-5’s QA ASR from 59.0% to 78.0% by Round 3.

BenchmarkBenchmarkAgent MemoryMemory Architecture

Forgetful but Faithful: A Cognitive Memory Architecture and Benchmark for Privacy-Aware Generative Agents

Saad Alqithami

· 2025

MaRS organizes agent memory into episodic, semantic, social, and task nodes with provenance, scored by a privacy-aware retention controller and governed by FIFO, LRU, Priority Decay, Reflection-Summary, Random-Drop, and Hybrid policies. On the FiFA benchmark, the Hybrid policy in MaRS achieves a composite score of ≈0.911 across 300 runs and five memory budgets, outperforming simpler policies while preserving privacy and cost efficiency.

RAGBenchmarkBenchmarkMemory Architecture

Memory-Augmented Log Analysis with Phi-4-mini: Enhancing Threat Detection in Structured Security Logs

Anbi Guo, Mahfuza Farooque

· 2025

DM-RAG augments Phi-4-mini with a Short-Term Memory (STM) buffer, Long-Term Memory (LTM) FAISS store, Bayesian fusion, and a logistic regression confidence model for structured log analysis. On UNSW-NB15, DM-RAG reaches 98.70% recall and 69.59% F1, beating the Phi-4 + RAG (MITRE) baseline in F1 by 17.89 points.