AMV-L: Lifecycle-Managed Agent Memory for Tail-Latency Control in Long-Running LLM Systems

Author: Emmanuel Bamidele

2026

TL;DR

AMV-L uses value-driven lifecycle tiers with bounded retrieval eligibility to cut p99 latency from 5398.167 ms (TTL) to 1233.430 ms while boosting throughput 3.1×.


THE PROBLEM

Tail latency from unbounded retrieval working sets (13.8% of TTL requests exceed 2s)

TTL-based memory keeps many items eligible for retrieval, so candidate sets and vector scans grow, causing heavy-tailed latency and unstable throughput.

Under TTL, 13.813% of requests exceed 2 seconds and p99 latency reaches 5398.167 ms, making long-running LLM agents violate tail SLOs and waste capacity.

HOW IT WORKS

AMV-L: Adaptive Memory Value Lifecycle

AMV-L combines a Memory Value Model, Tiered Lifecycle, Bounded Retrieval Path, and Lifecycle Manager to control which items stay on the retrieval path.

You can think of AMV-L like an OS memory hierarchy: the Hot tier is RAM, Warm/Cold tiers are slower storage, and eligibility is a budgeted working set.

This value-driven lifecycle lets AMV-L bound retrieval cost independently of total retained memory, something a plain context window or TTL retention cannot provide.
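The core idea above can be sketched in a few lines: retention is unbounded, but the retrieval-eligible set is budgeted, so the scan cost stays fixed no matter how much memory accumulates. This is a minimal illustration, not the paper's implementation; the budget constants and the FIFO demotion rule are assumptions.

```python
import random

# Illustrative budgets (assumed, not from the paper).
HOT_BUDGET = 4       # max items eligible for full similarity search
WARM_SAMPLE_K = 2    # warm items sampled per request

memory = []          # total retained items (can grow without bound)
hot, warm = [], []

def retain(item):
    """Retention never deletes; new items start Hot, overflow demotes to Warm."""
    memory.append(item)
    hot.append(item)
    if len(hot) > HOT_BUDGET:
        warm.append(hot.pop(0))  # demote oldest Hot item

def eligible():
    """Retrieval candidate set: all of Hot plus a bounded Warm sample."""
    k = min(WARM_SAMPLE_K, len(warm))
    return hot + random.sample(warm, k)

for i in range(1000):
    retain(f"item-{i}")

# Retained memory grew to 1000 items, but the scan set stays bounded.
print(len(memory), len(eligible()))  # → 1000 6
```

A plain TTL policy has no such separation: everything not yet expired is eligible, so the candidate set grows with retention and the tail latency grows with it.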

DIAGRAM

Request-time Retrieval and Prompt Construction Flow

This diagram shows how AMV-L processes a single request, from query embedding through tier-aware retrieval, value updates, and prompt construction.

DIAGRAM

Evaluation Pipeline for TTL, LRU, and AMV-L

This diagram shows how the workload is replayed three times with different memory policies to compare TTL, LRU, and AMV-L under identical conditions.

PROCESS

How AMV-L Handles a Request — Adaptive Memory Value Lifecycle

  1. Memory items and value state

    AMV-L represents each memory item with content, metadata, an embedding, and a scalar value V(m) updated online by the Memory Value Model.

  2. Tiered lifecycle organization

    AMV-L assigns items to Hot, Warm, or Cold tiers using the Tiered Lifecycle, separating long-term retention from retrieval eligibility.

  3. Lifecycle transitions

    The Lifecycle Manager promotes, demotes, and evicts items based on value thresholds and hysteresis, keeping the Hot tier as the effective working set.

  4. Retrieval and prompt construction

    The Bounded Retrieval Path builds R = T_H ∪ Sample_k(T_W), runs similarity search, applies the prompt-injection cap n, and feeds the selected chunks into the LLM Answer Pipeline.
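The four steps above can be sketched as one request handler: build the bounded candidate set R = T_H ∪ Sample_k(T_W), rank it by similarity, cap the injected chunks at n, and reinforce the value of whatever was used. The dictionary layout, cosine scoring, and the additive reinforcement are illustrative assumptions.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def handle_request(query_emb, hot, warm, k=8, n=4, reinforce=0.1):
    # Step 4: bounded candidate set R = T_H ∪ Sample_k(T_W)
    candidates = hot + random.sample(warm, min(k, len(warm)))
    # Similarity search runs over R only, never over the full store.
    scored = sorted(candidates,
                    key=lambda m: cosine(query_emb, m["emb"]),
                    reverse=True)
    selected = scored[:n]          # prompt-injection cap n
    for m in selected:             # Step 1: online value reinforcement
        m["value"] += reinforce
    return selected

hot = [{"emb": [1.0, 0.0], "value": 0.5, "text": "hot-A"},
       {"emb": [0.9, 0.1], "value": 0.5, "text": "hot-B"}]
warm = [{"emb": [0.0, 1.0], "value": 0.2, "text": f"warm-{i}"} for i in range(20)]

chunks = handle_request([1.0, 0.0], hot, warm, k=5, n=3)
print([m["text"] for m in chunks][0])  # → hot-A (the most similar item)
```

Note that the per-request cost is governed by len(hot) + k, not by the total number of retained items, which is exactly the property the lifecycle manager is protecting.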

KEY CONTRIBUTIONS

Key Contributions

  • Persistent agent memory as a request-path systems resource

    AMV-L formalizes memory as a working-set resource and shows that bounding retrieval eligibility cuts p99 latency from 5398.167 ms (TTL) to 1233.430 ms.

  • Adaptive Memory Value Lifecycle

    AMV-L introduces a value-driven Tiered Lifecycle with Hot, Warm, and Cold tiers plus a Memory Value Model that uses decay and reinforcement to manage eligibility.

  • Tail latency and throughput tradeoff frontier

    AMV-L demonstrates a tradeoff versus LRU, with +26% median latency but −15% p99 and −98% requests over 2s, while using ≈6% fewer tokens per request.
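The decay-and-reinforcement value model with hysteresis can be made concrete with a short sketch. The exponential decay form, the specific thresholds, and the two-band hysteresis rule are all assumptions for illustration; the paper only states that values decay, are reinforced on use, and drive tier transitions with hysteresis.

```python
import math

DECAY_RATE = 0.1   # per-time-unit exponential decay (assumed)
PROMOTE_AT = 0.7   # value threshold to enter Hot (assumed)
DEMOTE_AT  = 0.4   # lower threshold to leave Hot (hysteresis gap, assumed)

def decayed(value, dt):
    """V(m) decays toward zero while an item goes unused."""
    return value * math.exp(-DECAY_RATE * dt)

def next_tier(tier, value):
    """Hysteresis: promotion and demotion use different thresholds,
    so items hovering near a single cutoff do not thrash between tiers."""
    if tier != "hot" and value >= PROMOTE_AT:
        return "hot"
    if tier == "hot" and value < DEMOTE_AT:
        return "warm"
    return tier

v, tier = 1.0, "hot"
for step in range(12):
    v = decayed(v, dt=1.0)
    tier = next_tier(tier, v)

# After enough idle steps the value falls below DEMOTE_AT and the item demotes.
print(round(v, 3), tier)  # → 0.301 warm
```

The gap between PROMOTE_AT and DEMOTE_AT is what keeps the Hot tier stable: an item must decay well below its promotion level before it loses eligibility, and must be re-reinforced well above its demotion level before it regains it.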

RESULTS

By the Numbers

  • Throughput: 36.977 req/s (+27.950 req/s over TTL)

  • Latency p99: 1233.430 ms (−4164.737 ms vs TTL)

  • Latency > 2 s: 0.007% (−13.806 percentage points vs TTL)

  • Tokens/request (mean): 675.388 (≈6% fewer than LRU at similar retrieval quality)

On a synthetic long-running workload with 50,000 writes and 20,000 retrieval and Ask requests, AMV-L is compared against TTL and LRU. The results show that AMV-L restores predictable latency tails and high throughput while keeping token and retrieval quality metrics comparable to LRU.


BENCHMARK

Reliability and end-to-end performance (TTL vs LRU vs AMV-L)

Throughput (req/s) comparison for TTL, LRU, and AMV-L under the shared long-running workload.

KEY INSIGHT

The Counterintuitive Finding

AMV-L scans more vectors at p95 than LRU (690 vs 261) yet still achieves lower p99 latency (1233.430 ms vs 1452.706 ms).

This is surprising because we usually assume fewer scanned vectors always mean better tails, but AMV-L’s value-driven eligibility shows that which items are eligible matters more than raw scan count.

WHY IT MATTERS

What this unlocks for the field

AMV-L gives LLM agents a controllable memory working set, so tail latency depends on Hot tier size, not total retained items.

Builders can now run long-lived agents with large persistent memories while still meeting strict p95 and p99 SLOs, without aggressive TTL deletion or manual memory pruning.


