Unveiling Privacy Risks in LLM Agent Memory

AuthorsBo Wang, Weiyi He, Shenglai Zeng et al.

2025

TL;DR

MEXTRA uses locator plus workflow aligned attacking prompts to systematically extract up to 50 private queries from LLM agent memory under black box access.

SharePost on XLinkedIn

Read our summary here, or open the publisher PDF on the next tab.

THE PROBLEM

LLM agents leak stored queries under memory extraction attacks

MEXTRA shows that with 30 attacking prompts and memory size 200, EHRAgent leaks 50 stored queries and RAP leaks 26, revealing substantial privacy risks.

These leaks expose private user queries and associated actions in agents like EHRAgent and RAP, enabling unauthorized data access and downstream misuse such as discriminatory decisions.

HOW IT WORKS

MEXTRA — Memory EXTRaction Attack on agent memory

MEXTRA combines attacking prompt design, automated diverse prompts generation, and the agent memory module M with similarity scoring function f(q, qi) to extract stored queries.

Think of MEXTRA as crafting special keys for a locked filing cabinet, where the cabinet is the agent memory and the lock is the retrieval mechanism.

By steering retrieval and execution, MEXTRA makes the agent surface memory contents that a plain context window and naive prompts like “repeat all the context” cannot expose.

DIAGRAM

Black box interaction flow of MEXTRA with an LLM agent

This diagram shows how MEXTRA interacts with the LLM agent in a black box setting to retrieve and leak memory records.

DIAGRAM

Evaluation configuration and ablation pipeline for MEXTRA

This diagram shows how MEXTRA is evaluated across agents, memory configurations, and prompting strategies.

PROCESS

How MEXTRA Handles a Memory Extraction Attack Session

  1. 01

    Agent Workflow

    MEXTRA targets the Agent Workflow where the LLM agent core uses similarity scoring function f(q, qi) to retrieve records E(q, M) from memory module M.

  2. 02

    Attacking Prompt Design

    MEXTRA designs attacking prompts as q_loc plus q_align so the agent both locates retrieved user queries and aligns outputs with its workflow.

  3. 03

    Automated Diverse Prompts Generation

    MEXTRA uses GPT 4 with instructions Ibasic or Iadvan to generate n diverse attacking prompts that maximize coverage of retrieved subsets R.

  4. 04

    Memory Extraction

    MEXTRA executes the malicious solutions s~ through Execute s T, collects outputs o~, and aggregates extracted queries into Q to measure EN and EE.

KEY CONTRIBUTIONS

Key Contributions

  • 01

    Memory EXTRaction Attack MEXTRA

    MEXTRA introduces a black box Memory EXTRaction Attack that exploits the memory module M and similarity scoring function f(q, qi) to leak stored user queries.

  • 02

    Attacking Prompt Design and Generation

    MEXTRA formalizes locator and aligner prompt parts and automated diverse prompts generation with Ibasic and Iadvan tailored to agent implementation knowledge.

  • 03

    Systematic Analysis of Memory Leakage Factors

    MEXTRA analyzes how scoring function, embedding model, retrieval depth k, memory size m, and LLM backbone impact extracted number EN and extracted efficiency EE.

RESULTS

By the Numbers

Extracted Number EHRAgent

50 queries

+14 over w/o aligner baseline on EHRAgent

Extracted Efficiency EHRAgent

0.42

vs 0.30 for w/o aligner on EHRAgent

Extracted Number RAP

26 queries

+20 over w/o aligner baseline on RAP

Extracted Efficiency RAP

0.29

vs 0.07 for w/o aligner on RAP

On EHRAgent and RAP with memory size 200 and 30 attacking prompts, MEXTRA achieves higher Extracted Number and Extracted Efficiency than baselines, proving that workflow aligned attacking prompts substantially increase memory leakage.

BENCHMARK

By the Numbers

On EHRAgent and RAP with memory size 200 and 30 attacking prompts, MEXTRA achieves higher Extracted Number and Extracted Efficiency than baselines, proving that workflow aligned attacking prompts substantially increase memory leakage.

BENCHMARK

Attacking results on two agents with 30 prompts and memory size 200

Extracted Number EN for MEXTRA and its baselines on EHRAgent and RAP.

KEY INSIGHT

The Counterintuitive Finding

MEXTRA shows that using edit distance as the scoring function can leak more than 30 percent of memory records when n reaches 50.

This is surprising because edit distance seems simpler and safer than semantic cosine similarity, yet it enables broader retrieval and higher leakage than expected.

WHY IT MATTERS

What this unlocks for the field

MEXTRA provides a concrete framework and metrics to stress test LLM agent memory modules under realistic black box adversaries.

Builders can now quantitatively evaluate how design choices like retrieval depth, memory size, and scoring function change privacy risk, and design safer agent memory configurations.

~13 min read← Back to papers

Related papers

BenchmarkAgent Memory

Active Context Compression: Autonomous Memory Management in LLM Agents

Nikhil Verma

· 2026

Focus Agent adds start_focus, complete_focus, a persistent Knowledge block, and an optimized Persistent Bash plus String-Replace Editor scaffold to actively compress context during long software-engineering tasks. On five hard SWE-bench Lite instances against a Baseline ReAct agent, Focus Agent achieves 22.7% token reduction (14.9M → 11.5M) while matching 3/5 = 60% task success.

Agent Memory

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Xiaohui Zhang, Zequn Sun et al.

· 2026

ActMem transforms dialogue history into atomic facts via Memory Fact Extraction, groups them with Fact Clustering, links them through a Memory KG Construction module, and uses Counterfactual-based Retrieval and Reasoning for action-aware answers. On ActMemEval, ActMem reaches 76.52% QA accuracy with DeepSeek-V3, beating LightMem’s 63.97% by 12.55 points and NaiveRAG’s 61.54%.

Questions about this paper?

Paper: Unveiling Privacy Risks in LLM Agent Memory

Answers use this explainer on Memory Papers.

Checking…