GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent
Yuri Kuratov, Matvey Kairov et al. · 2026
GradMem writes context into prefix memory tokens via test-time gradient descent while keeping the model weights frozen. It combines a WRITE phase, a READ phase, a context encoder E_θ, a self-supervised WRITE objective L_write, and a meta-learned initialization M_0. On associative KV-retrieval with 96 key–value pairs, GradMem with 5 gradient WRITE steps reaches 88.4% exact match, versus 12.9% for forward-only RMT with the same 8-vector memory.
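The core idea, updating only a memory state by gradient descent at test time while the model's weights stay fixed, can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the paper's method: the "model" is a single frozen linear map `W`, the WRITE objective is a plain squared-error reconstruction of a context vector (a stand-in for L_write), `memory_init` is a zero vector rather than a meta-learned M_0, and the names `write`, `read`, `write_loss`, and `write_grad` are hypothetical.

```python
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def write_loss(W, memory, context):
    """Toy stand-in for a WRITE objective: squared reconstruction error."""
    pred = matvec(W, memory)
    return sum((p - c) ** 2 for p, c in zip(pred, context))


def write_grad(W, memory, context):
    """Gradient of write_loss with respect to memory only: 2 * W^T (W m - c)."""
    residual = [p - c for p, c in zip(matvec(W, memory), context)]
    return [
        2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
        for j in range(len(memory))
    ]


def write(W, context, memory_init, steps=5, lr=0.02):
    """WRITE phase: a few gradient steps on the memory; W is never modified."""
    memory = list(memory_init)  # copy so the initialization stays untouched
    for _ in range(steps):
        grad = write_grad(W, memory, context)
        memory = [m - lr * g for m, g in zip(memory, grad)]
    return memory


def read(W, memory):
    """READ phase: the frozen model produces an output from memory alone."""
    return matvec(W, memory)


rng = random.Random(0)
d_ctx, d_mem = 6, 4  # assumed toy sizes, unrelated to the paper's setup

# Frozen "model weights" and a context vector to be written into memory.
W = [[rng.uniform(-1, 1) for _ in range(d_mem)] for _ in range(d_ctx)]
context = [rng.uniform(-1, 1) for _ in range(d_ctx)]
memory_init = [0.0] * d_mem  # stand-in for a learned initialization

loss_before = write_loss(W, memory_init, context)
memory = write(W, context, memory_init, steps=5, lr=0.02)
loss_after = write_loss(W, memory, context)
reconstruction = read(W, memory)
```

Comparing `loss_before` with `loss_after` shows how much of the context the five WRITE steps have stored in `memory`. The step size is kept small so that each step lowers this quadratic loss. A forward-only writer would instead have to produce `memory` in a single pass, without this iterative correction.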