Category

Procedural Memory

Procedural memory in LLM agents — learning skills, rules, and how-to knowledge from experience.

3 papers

Benchmark

APEX-EM: Non-Parametric Online Learning for Autonomous Agents via Structured Procedural-Episodic Experience Replay

Pratyay Banerjee, Masud Moshtaghi, Ankit Chadha

· 2026

APEX-EM combines a Procedural Knowledge Graph, Experience Memory store, PRGII workflow, Task Verifiers, and StructuralSignatureExtractor to store and reuse full procedural-episodic traces without changing model weights. On KGQAGen-10k, APEX-EM reaches 89.6% accuracy (95.3% CSR) versus 41.3% without memory and surpasses the GPT-4o w/ SP oracle at 84.9%.

RAG · Benchmark · Agent Memory · Long-Term Memory · Memory Architecture

Evaluating Long-Term Memory for Long-Context Question Answering

Alessandra Terranova, Björn Ross, Alexandra Birch

· 2025

This study compares Full Context, RAG, A-Mem, RAG+PromptOpt, and RAG+EpMem as memory approaches spanning semantic, episodic, and procedural memory for long conversational QA. On LoCoMo, RAG+EpMem reaches an average F1 ranking of 1.83 with Llama 3.2-3B Instruct and 1.80 with GPT-4o mini, while using around 1,000 tokens per query versus over 23,000 for Full Context.

Benchmark

Memp: Exploring Agent Procedural Memory

Runnan Fang, Yuan Liang et al.

· 2025

Memp constructs agent skills via Build, Retrieve, and Update modules that distill past trajectories into stored trajectories, scripts, and combined proceduralizations held in a procedural memory library. On ALFWorld, Memp’s proceduralization with GPT-4o reaches 77.86% test success versus 42.14% with no memory, while reducing the average number of steps from 23.76 to 15.01.