A-MemGuard: A Proactive Defense Framework for LLM-Based Agent Memory

Authors: Qianshan Wei, Tengchao Yang, Yaochen Wang et al.

2025

TL;DR

A-MemGuard uses consensus-based validation and a dual-memory lesson store to cut memory attack success rates by over 95% with minimal utility loss.

THE PROBLEM

Context-dependent memory poisoning and 66% missed poisoned entries

Agent Security Bench shows that even advanced LLM-based detectors miss 66% of poisoned memory entries, because each entry looks harmless when inspected in isolation.

In knowledge-intensive QA and healthcare agents, these context-dependent records corrupt reasoning and create self-reinforcing error cycles, where each wrong action becomes a trusted precedent.

HOW IT WORKS

A-MemGuard — consensus validation plus dual-memory lessons

A-MemGuard combines consensus-based validation, a dual-memory structure with a dedicated lesson memory, and path divergence scoring to detect and neutralize poisoned memories before actions are executed.

You can think of consensus-based validation as multiple witnesses cross-checking a story, while the lesson memory acts like a blacklist of past reasoning mistakes.

This design lets A-MemGuard catch context-triggered anomalies and reuse structured “negative lessons,” something a plain context window or isolated content filter cannot achieve.
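The dual-memory idea can be pictured with a small data-structure sketch. This is an illustration only, not the authors' implementation: the names `Lesson`, `DualMemory`, `add_lesson`, and `relevant_lessons` are hypothetical, and word-overlap (Jaccard) similarity is an assumed stand-in for whatever retrieval method a real system would use.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Lesson:
    """A 'negative lesson': a reasoning pattern previously flagged as anomalous."""
    trigger: str      # context in which the flawed reasoning appeared
    flawed_step: str  # the reasoning step that diverged from consensus


@dataclass
class DualMemory:
    """Hypothetical container that keeps task records and lessons separate."""
    records: List[str] = field(default_factory=list)
    lessons: List[Lesson] = field(default_factory=list)

    def add_lesson(self, trigger: str, flawed_step: str) -> None:
        self.lessons.append(Lesson(trigger, flawed_step))

    def relevant_lessons(self, plan: str, min_overlap: float = 0.2) -> List[Lesson]:
        """Return lessons whose trigger shares enough words with the plan.

        Jaccard word overlap is an assumption made for this sketch.
        """
        plan_words = set(plan.lower().split())
        hits = []
        for lesson in self.lessons:
            trigger_words = set(lesson.trigger.lower().split())
            union = plan_words | trigger_words
            if union and len(plan_words & trigger_words) / len(union) >= min_overlap:
                hits.append(lesson)
        return hits
```

Keeping lessons in their own store means a candidate plan can be checked against past mistakes without those mistakes being retrieved as ordinary task precedents.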

DIAGRAM

Query-time consensus validation and lesson-guided revision

This diagram shows how A-MemGuard processes a single query through parallel reasoning paths, consensus validation, lesson distillation, and action revision.

DIAGRAM

Evaluation pipeline across tasks and attacks

This diagram shows how A-MemGuard is evaluated on direct and indirect memory attacks plus multi-agent misinformation.

PROCESS

How A-MemGuard Handles a Memory-Augmented Agent Query

01
Parallel Reasoning Path Generation
A-MemGuard uses consensus-based validation to call $\Lambda$ and build a structured reasoning path $\hat{\rho}_i$ from each retrieved memory in $M_r$ for the current query.
02
Path Divergence Scoring and Validation
A-MemGuard applies path divergence scoring $S_{div}$ over the set of paths $\hat{P}_t$, filtering memories into the validated subset $M_{val}$ based on a threshold $\tau$.
03
Structured Lesson Distillation
For any anomalous path $\hat{\rho}_j$, A-MemGuard defines a lesson $l_t$ and appends it to the dedicated lesson memory $M_{les}$ as a reusable negative example.
04
Proactive Deliberation and Action Revision
A-MemGuard structures the candidate plan $\hat{p}_{final}$, retrieves similar lessons $L_{rel}$ from $M_{les}$, and revises the final action using the defended policy $\pi'$.
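The first three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm: the summary does not give the divergence formula, so the score here is assumed to be one minus a path's mean word-overlap similarity to the other paths. The function names, the `make_path` callback (standing in for the model call that builds a reasoning path), and the threshold value are all hypothetical.

```python
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 1.0


def divergence_scores(paths: List[str]) -> List[float]:
    """Assumed score: 1 - mean similarity of each path to all other paths."""
    scores = []
    for i, path in enumerate(paths):
        others = [q for j, q in enumerate(paths) if j != i]
        if not others:
            scores.append(0.0)  # a single path has nothing to diverge from
            continue
        mean_sim = sum(jaccard(path, q) for q in others) / len(others)
        scores.append(1.0 - mean_sim)
    return scores


def validate_memories(
    query: str,
    retrieved: List[str],
    make_path: Callable[[str, str], str],
    tau: float = 0.7,
) -> Tuple[List[str], List[str]]:
    """Split retrieved memories into a validated subset and anomalous paths."""
    # Step 01: one reasoning path per retrieved memory.
    paths = [make_path(query, memory) for memory in retrieved]
    # Step 02: score each path against the others and apply the threshold.
    scores = divergence_scores(paths)
    validated, anomalous_paths = [], []
    for memory, path, score in zip(retrieved, paths, scores):
        if score <= tau:
            validated.append(memory)
        else:
            # Step 03: anomalous paths become candidate lessons.
            anomalous_paths.append(path)
    return validated, anomalous_paths
```

With three retrieved memories where two lead to similar reasoning and one leads somewhere unrelated, the outlier's score rises above the others and it is the one routed to the lesson store. Step 04 would then consult those stored lessons before the final action is chosen.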

KEY CONTRIBUTIONS

01
Proactive defense for agent memory
A-MemGuard is the first framework explicitly securing agent memory against context-dependent attacks and self-reinforcing error cycles using consensus-based validation and a dual-memory structure.
02
Consensus-based validation and dual-memory structure
A-MemGuard introduces consensus-based validation over structured reasoning paths and a lesson memory that stores anomalous paths as negative lessons for future correction.
03
Extensive experiments on diverse attacks
A-MemGuard cuts ASR-r from 100.0% to 2.13% on EHRAgent and reduces indirect attack ASR on MMLU from 0.667 to 0.256 while maintaining top benign accuracy.

RESULTS

By the Numbers

ASR-r: 2.13% (-97.87 pp vs No Defense; EHRAgent, GPT-4o-mini + DPR)

ASR-t: 6.38% (-93.62 pp vs No Defense; EHRAgent, GPT-4o-mini + REALM)

Benign ACC: 77.3% (+6.2 pp over No Defense; ReAct-StrategyQA, GPT-4o-mini + REALM)

Indirect ASR: 0.256 (-0.411 vs No Defense; MMLU, GPT-4o-mini)

These metrics come from AgentPoison on ReAct-StrategyQA and EHRAgent plus indirect injection on MMLU, showing that A-MemGuard sharply lowers attack success while preserving or improving benign accuracy.
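The deltas quoted above mix two kinds of change: absolute differences (percentage points, or raw rate differences) and the relative reduction they imply. The short check below recomputes both from the figures in this summary. The helper names are hypothetical, and the relative reduction is a derived number, not one reported in the source.

```python
def absolute_change(before: float, after: float) -> float:
    """Signed difference, in the same units as the inputs."""
    return after - before


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original value that was removed."""
    return (before - after) / before


# ASR-r on EHRAgent: 100.0% without defense, 2.13% with A-MemGuard.
asr_r_delta = round(absolute_change(100.0, 2.13), 2)        # -97.87 pp

# Indirect ASR on MMLU: 0.667 without defense, 0.256 with A-MemGuard.
indirect_delta = round(absolute_change(0.667, 0.256), 3)    # -0.411
indirect_rel = round(relative_reduction(0.667, 0.256), 3)   # 0.616 (derived)

print(asr_r_delta, indirect_delta, indirect_rel)
```

Read this way, the MMLU result is a drop of 0.411 in absolute terms, which corresponds to removing roughly 62% of the undefended attack success rate.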

BENCHMARK

Defensive performance against AgentPoison on EHRAgent (GPT-4o-mini + DPR, ASR-r)

Attack Success Rate at retrieval (ASR-r) on EHRAgent under AgentPoison.

BENCHMARK

Indirect memory injection on MMLU (average ASR)

Average Attack Success Rate under indirect memory injection on MMLU.

KEY INSIGHT

The Counterintuitive Finding

A-MemGuard reduces EHRAgent ASR-r from 100.0% to 2.13% while still achieving benign accuracy of up to 77.3% on ReAct-StrategyQA.

This is surprising because stronger defenses often over-filter useful memories, yet A-MemGuard both hardens security and improves task accuracy in several settings.

WHY IT MATTERS

What this unlocks for the field

A-MemGuard enables LLM agents to treat memory as a self-checking, self-correcting component that learns from its own failures over time.

Builders can now deploy memory-augmented agents in high-stakes domains like healthcare and finance without accepting runaway error cycles from subtle memory poisoning.
