Agent Memory Below the Prompt: Persistent Q4 KV Cache for Multi-Agent LLM Inference on Edge Devices
Yakov Pyotr Shkolnikov
· 2026
Agent Memory Below the Prompt stores each agent’s KV state in a block pool, quantizes it through a Q4 pipeline, reloads it with BatchQuantizedKVCache, and reuses it across phases via cross-phase context injection. On Gemma 3 12B at 32K context, it reduces cold time-to-first-token (TTFT) from 172,096 ms to 1,264 ms (a 136× speedup) relative to FP16 prefix-caching baselines such as vllm-mlx.
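To make the Q4 step concrete, here is a minimal, self-contained sketch of group-wise 4-bit KV quantization in NumPy. This is illustrative only and not the paper’s actual pipeline: the function names (`q4_quantize`, `q4_dequantize`), the group size of 64, and the min/max affine scheme are assumptions; a production path would pack on-device tensors and fuse dequantization into attention.

```python
import numpy as np

def q4_quantize(kv: np.ndarray, group: int = 64):
    """Group-wise affine 4-bit quantization (illustrative).

    Each group of `group` consecutive values gets an fp16 scale and
    zero-point; values are rounded to 16 levels and packed two per byte.
    """
    flat = kv.astype(np.float32).reshape(-1, group)
    lo = flat.min(axis=1, keepdims=True)
    hi = flat.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0                      # 4 bits -> 16 levels
    scale = np.where(scale == 0, 1.0, scale)      # avoid div-by-zero
    q = np.clip(np.round((flat - lo) / scale), 0, 15).astype(np.uint8)
    packed = (q[:, ::2] << 4) | q[:, 1::2]        # two nibbles per byte
    return packed, scale.astype(np.float16), lo.astype(np.float16)

def q4_dequantize(packed, scale, lo, shape):
    """Unpack nibbles and apply the per-group affine transform."""
    q = np.empty((packed.shape[0], packed.shape[1] * 2), dtype=np.float32)
    q[:, ::2] = (packed >> 4).astype(np.float32)
    q[:, 1::2] = (packed & 0x0F).astype(np.float32)
    return (q * scale.astype(np.float32) + lo.astype(np.float32)).reshape(shape)

# Example: a toy KV tensor (heads x seq x head_dim) round-trips with
# bounded error and a ~4x smaller payload than fp16.
kv = np.random.rand(2, 8, 128).astype(np.float32)
packed, scale, zero = q4_quantize(kv)
recon = q4_dequantize(packed, scale, zero, kv.shape)
```

Rounding error is bounded by half a quantization step per element (at most `scale / 2` within each group), which is what makes 4-bit KV reuse viable across agent phases.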