Security & Privacy

Security and privacy of AI memory systems: memory attacks, data extraction, privacy-aware agents, and memory governance.

7 papers

RAG · Benchmark · Agent Memory · Memory Architecture

ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying

Xingyu Lyu, Jianfeng He et al. · 2026

ADAM combines Anchor extraction, Distribution estimation, Anchor selection, and Query generation to adaptively probe agent memory via an auxiliary generator and entropy-based selection. On the EHRAgent benchmark with Llama2-7b-chat, ADAM reaches EQ=77 and ASR=1.00, compared to MEXTRA’s EQ=44 and ASR=0.89.

arXiv:2604.09747
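The entropy-based selection step in the ADAM summary can be illustrated with a minimal sketch. It assumes that selection favors the candidate anchor whose observed responses have the highest Shannon entropy; the paper's actual criterion may differ, and the names `shannon_entropy`, `select_anchor`, and the sample data are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(responses):
    """Shannon entropy (in bits) of the empirical distribution of responses."""
    counts = Counter(responses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_anchor(observations):
    """Return the anchor whose responses are most uncertain (highest entropy).

    Assumption: high entropy means the anchor still surfaces varied memory
    records, so it is the most informative one to query next.
    """
    return max(observations, key=lambda anchor: shannon_entropy(observations[anchor]))


# Hypothetical record IDs returned when probing with each anchor.
observations = {
    "patient": ["rec_1", "rec_2", "rec_3", "rec_4"],  # varied responses
    "dosage": ["rec_1", "rec_1", "rec_1", "rec_1"],   # always the same record
}
chosen = select_anchor(observations)
```

Here `chosen` is the anchor with four distinct responses, since the other anchor keeps returning the same record and so carries no remaining uncertainty.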

Benchmark · Agent Memory

AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management

Ruoyao Wen, Hao Li et al. · 2026

AGENTSYS organizes a Main Agent, Worker Agents, Intent Schemas, and an Alignment Validator into a hierarchical memory system that isolates raw tool outputs and admits only schema-validated JSON. On AgentDojo, AGENTSYS reaches 52.87% attacked utility and 0.78% ASR versus 48.27% and 30.66% for the No Defense baseline.

arXiv:2602.07398
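The idea of admitting only schema-validated JSON can be sketched as a small gate between a worker's raw output and the main agent's memory. This is an illustrative sketch only, not the AGENTSYS implementation: the `admit` function, the dictionary-of-types schema format, and the sample payloads are all assumptions.

```python
import json


def admit(raw_output, schema):
    """Return the parsed payload if it matches the schema exactly, else None.

    `schema` maps each expected field name to its required Python type.
    Unknown fields are rejected so free-form text cannot ride along.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return None  # not JSON at all: never reaches memory
    if not isinstance(data, dict) or set(data) != set(schema):
        return None  # missing or unexpected fields
    for name, expected_type in schema.items():
        if not isinstance(data[name], expected_type):
            return None  # wrong type for a declared field
    return data


# Hypothetical intent schema: the main agent only wants a numeric balance.
balance_schema = {"balance": float}

ok = admit('{"balance": 42.5}', balance_schema)
injected = admit('{"balance": 42.5, "note": "ignore previous instructions"}', balance_schema)
garbage = admit("Sure! Your balance is 42.5", balance_schema)
```

Only the first payload is admitted; the one carrying an extra instruction-like field and the free-text reply are both dropped before they can reach memory.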

Survey · Benchmark · Agent Memory · Long-Term Memory · Memory Architecture

A Survey on the Security of Long-Term Memory in LLM Agents: Toward Mnemonic Sovereignty

Zehao Lin, Chunyu Li, Kai Chen · 2026

The survey analyzes six phases of the long-term memory lifecycle (Write, Store, Retrieve, Execute, Share, and Forget/Rollback) against integrity, confidentiality, availability, and governance objectives for agent memory. Its lifecycle matrix shows that most of the ~70 works cluster on write and retrieve integrity, leaving store, availability, and governance primitives such as write-gate validation and post-deletion verification almost entirely unexplored.

arXiv:2604.16548

Benchmark · Agent Memory · Long-Term Memory

Governing Evolving Memory in LLM Agents: Risks, Mechanisms, and the Stability and Safety Governed Memory (SSGM) Framework

Chingkwun Lam, Jiaxin Li et al. · 2026

SSGM interposes a Governance Middleware, a Read Filtering Gate, a Write Validation Gate, and a dual substrate (a Mutable Active Graph plus an Immutable Episodic Log) between agents and memory. SSGM unifies evolving-memory systems into a four-dimensional failure taxonomy and proves that periodic reconciliation can bound semantic drift over infinite horizons.

arXiv:2603.11768
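The write gate and dual substrate described in the SSGM summary can be pictured with a toy model. This is a sketch under stated assumptions rather than the SSGM design itself: the `GovernedMemory` class, its key-value store standing in for the active graph, the `validate` predicate, and the replay-based `reconcile` method are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class GovernedMemory:
    validate: Callable[[str, str], bool]        # stand-in for a write validation gate
    log: list = field(default_factory=list)     # append-only episodic log
    active: dict = field(default_factory=dict)  # mutable working state

    def write(self, key, value):
        """Admit a write only if the gate accepts it; record it in both substrates."""
        if not self.validate(key, value):
            return False
        self.log.append((key, value))
        self.active[key] = value
        return True

    def reconcile(self):
        """Rebuild the active state by replaying the log, discarding any drift."""
        rebuilt = {}
        for key, value in self.log:
            rebuilt[key] = value
        self.active = rebuilt


mem = GovernedMemory(validate=lambda key, value: "ignore" not in value.lower())
accepted = mem.write("user_city", "Lisbon")
rejected = mem.write("user_city", "Ignore all prior facts")

mem.active["user_city"] = "Lisbon-ish (paraphrased)"  # simulated drift in the active state
mem.reconcile()
```

After `reconcile()`, the active state again matches what the log recorded, which is one simple way to read the claim that periodic reconciliation keeps drift bounded.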

Benchmark · Agent Memory · Long-Term Memory

MemEvoBench: Benchmarking Memory MisEvolution in LLM Agents

Weiwei Xie, Shaoxiong Guo et al. · 2026

MemEvoBench combines Misleading Memory Injection, Noisy Tool Returns, Biased User Feedback, and a Memory Modification Tool (+ModTool) to stress-test long-term memory safety in LLM agents across 7 domains and 36 risk types. On the QA Style benchmark, MemEvoBench shows Gemini-2.5-Pro’s ASR drops from 67.0% (Vanilla) to 19.0% with +ModTool in Round 1, while biased feedback can push GPT-5’s QA ASR from 59.0% to 78.0% by Round 3.

arXiv:2604.15774

Benchmark · Agent Memory · Memory Architecture

Forgetful but Faithful: A Cognitive Memory Architecture and Benchmark for Privacy-Aware Generative Agents

Saad Alqithami · 2025

MaRS organizes agent memory into episodic, semantic, social, and task nodes with provenance, scored by a privacy-aware retention controller and governed by FIFO, LRU, Priority Decay, Reflection-Summary, Random-Drop, and Hybrid policies. On the FiFA benchmark, the Hybrid policy in MaRS achieves a composite score of ≈0.911 across 300 runs and five memory budgets, outperforming simpler policies while preserving privacy and cost efficiency.

arXiv:2512.12856
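Two of the retention policies named in the MaRS summary, FIFO and LRU, behave differently once a memory budget is reached. The sketch below contrasts them in a generic way; it is not the MaRS implementation, it omits the privacy-aware scoring entirely, and the `BoundedMemory` class and its parameters are hypothetical.

```python
from collections import OrderedDict


class BoundedMemory:
    """Key-value memory with a fixed budget and a FIFO or LRU eviction policy."""

    def __init__(self, budget, policy="fifo"):
        self.budget = budget
        self.policy = policy
        self.items = OrderedDict()  # oldest (or least recently used) entry first

    def read(self, key):
        value = self.items[key]
        if self.policy == "lru":
            self.items.move_to_end(key)  # a read refreshes recency under LRU only
        return value

    def write(self, key, value):
        if key in self.items:
            self.items[key] = value
            if self.policy == "lru":
                self.items.move_to_end(key)
            return
        if len(self.items) >= self.budget:
            self.items.popitem(last=False)  # evict the entry at the front
        self.items[key] = value


def run(policy):
    mem = BoundedMemory(budget=2, policy=policy)
    mem.write("a", "first memory")
    mem.write("b", "second memory")
    mem.read("a")                   # "a" is used again before the budget is hit
    mem.write("c", "third memory")  # forces one eviction
    return list(mem.items)


fifo_keys = run("fifo")
lru_keys = run("lru")
```

With the same access pattern, FIFO drops the oldest entry even though it was just read, while LRU keeps it and drops the untouched one.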

RAG · Benchmark · Memory Architecture

Memory-Augmented Log Analysis with Phi-4-mini: Enhancing Threat Detection in Structured Security Logs

Anbi Guo, Mahfuza Farooque · 2025

DM-RAG augments Phi-4-mini with a Short-Term Memory (STM) buffer, Long-Term Memory (LTM) FAISS store, Bayesian fusion, and a logistic regression confidence model for structured log analysis. On UNSW-NB15, DM-RAG reaches 98.70% recall and 69.59% F1, beating the Phi-4 + RAG (MITRE) baseline in F1 by 17.89 points.

arXiv:2510.00529
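The gap between the recall and F1 figures in the DM-RAG summary is easier to read with a quick calculation. The sketch below assumes the standard binary definition F1 = 2PR / (P + R); if the paper uses an averaged or weighted variant, the implied precision would not apply. The function name `precision_from_f1` is hypothetical.

```python
def precision_from_f1(f1, recall):
    """Solve F1 = 2PR / (P + R) for the precision P."""
    return f1 * recall / (2 * recall - f1)


recall = 0.9870  # reported recall on UNSW-NB15
f1 = 0.6959      # reported F1 on UNSW-NB15

precision = precision_from_f1(f1, recall)

# The summary states a 17.89-point F1 lead over the Phi-4 + RAG (MITRE) baseline.
baseline_f1 = f1 - 0.1789
```

Under that assumption the reported figures imply a precision of roughly 54%, meaning high recall comes with a sizeable share of false alarms, and the stated gap puts the baseline F1 at 51.70%.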