Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory

Authors: Lei Liu, Xiaoyan Yang, Yue Shen et al.

2023

TL;DR

Think-in-Memory (TiM) stores and retrieves LLM-generated inductive thoughts via LSH-based hashing, lifting contextual coherence on the Chinese part of GVD from SiliconFriend's 0.428 to 0.665 (+0.237).

THE PROBLEM

Repeated recall causes biased thoughts and incoherent long-term dialogue

Memory-augmented LLMs rely on iterative recalling and reasoning over stored history, which can easily produce biased thoughts, i.e., inconsistent reasoning results over the same history.

In long-term settings such as a medical AI assistant, repeatedly reasoning over raw dialogue text can miss earlier symptoms and reduce diagnosis accuracy across multi-turn conversations.

HOW IT WORKS

Think-in-Memory — recalling and post-thinking with inductive thoughts

Think-in-Memory (TiM) combines an Agent A, a Memory Cache M, a Hash-based Mapping F(·), and three organization operations (Insert, Forget, and Merge) over inductive thoughts.

You can picture TiM as a brain that stores distilled thoughts instead of raw events, with LSH buckets as shelves and Merge as a way of compressing similar memories.

By recalling stored inductive thoughts instead of re-parsing full histories, TiM enables consistent reasoning and efficient long-term retrieval beyond a plain context window.
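
To make this concrete, here is a minimal Python sketch of such a hash-organized cache. The Thought and MemoryCache names, the random-hyperplane hash, and all parameters are illustrative assumptions for this explainer, not the paper's released code.

```python
from collections import defaultdict
from dataclasses import dataclass

import numpy as np


@dataclass
class Thought:
    text: str              # a distilled statement, not raw dialogue
    embedding: np.ndarray  # vector used for hashing and ranking


class MemoryCache:
    """Hash-organized store of inductive thoughts (illustrative)."""

    def __init__(self, dim: int, n_bits: int = 8, seed: int = 0):
        # Random hyperplanes realize a locality-sensitive hash F(.):
        # embeddings on the same side of every plane share a bucket,
        # so similar thoughts tend to land in the same group.
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.groups: dict[int, list[Thought]] = defaultdict(list)

    def bucket(self, embedding: np.ndarray) -> int:
        bits = (self.planes @ embedding) > 0
        return sum(1 << i for i, b in enumerate(bits) if b)

    def insert(self, thought: Thought) -> None:
        # The Insert operation: file the thought under its hash group.
        self.groups[self.bucket(thought.embedding)].append(thought)
```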

DIAGRAM

Two-stage recalling and post-thinking flow in TiM

This diagram shows how TiM performs Stage-1 (Recall and Generation) and Stage-2 (Post-think and Update) for each new query.

DIAGRAM

Evaluation pipeline across GVD, KdConv, and RMD

This diagram shows how TiM is combined with ChatGLM and Baichuan2 and evaluated on GVD, KdConv, and RMD with three human-judged metrics.

PROCESS

How Think-in-Memory Handles a Conversation Turn

  1. 01

    Stage-1 Recall and Generation

    Given a new query Q, Think-in-Memory uses Agent A, Hash-based Mapping F(·), and Memory Cache M to recall top-k relevant inductive thoughts and generate a response.

  2. 02

    Stage-2 Post-think and Update

    After producing the response R, Think-in-Memory lets Agent A post-think on the Q-R pair and generate new inductive thoughts describing entities and relations.

  3. 03

    Insert operation

    Think-in-Memory applies the Insert operation by hashing each new inductive thought with F(·) and storing it in the corresponding group inside Memory Cache M.

  4. 04

    Forget and Merge operations

Think-in-Memory periodically uses Forget to remove counterfactual thoughts and Merge to combine similar thoughts about the same entity within each hash group; a minimal sketch of the full turn loop follows this list.
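
These four steps compose into a single turn loop. The hedged sketch below continues the MemoryCache example from the HOW IT WORKS section; llm and embed are placeholders for any chat model and sentence encoder, and the prompt wording is invented for illustration.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))


def handle_turn(query: str, cache: "MemoryCache", llm, embed, k: int = 5) -> str:
    # Stage 1 (Recall and Generation): hash the query, rank the thoughts
    # in its group by similarity, and answer using the top-k of them.
    q = embed(query)
    group = cache.groups.get(cache.bucket(q), [])
    recalled = sorted(group, key=lambda t: -cosine(q, t.embedding))[:k]
    memory = "\n".join(t.text for t in recalled)
    response = llm(f"Relevant memories:\n{memory}\n\nUser: {query}")

    # Stage 2 (Post-think and Update): distill the Q-R pair into new
    # inductive thoughts about entities and relations, then Insert them.
    post_think = llm(
        "State the entities and relations in this exchange as short "
        f"standalone facts, one per line:\nQ: {query}\nA: {response}"
    )
    for line in filter(None, post_think.splitlines()):
        cache.insert(Thought(line, embed(line)))
    return response
```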

KEY CONTRIBUTIONS

Key Contributions

  • 01

    Think-in-Memory long-term mechanism

    Think-in-Memory introduces an LLM-agnostic Memory Cache M of inductive thoughts with Hash-based Mapping F(·), enabling recall without repeated reasoning over raw Q-R histories.

  • 02

    Insert Forget Merge organization

Think-in-Memory formalizes Insert, Forget, and Merge operations over thoughts within hash groups, mirroring human cognitive processes for dynamic memory evolution; a hedged sketch of Forget and Merge follows this list.

  • 03

    Hash-based retrieval for long-term conversations

    Think-in-Memory integrates Locality-Sensitive Hashing into F(·), reducing retrieval time from 0.6287 ms to 0.5305 ms and improving contextual coherence on GVD and KdConv.
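
This summary does not specify how Forget and Merge decide what to drop or fuse, so the sketch below makes two loud assumptions: Forget receives an externally supplied set of contradicted statements, and Merge keeps one representative of each cluster of near-identical thoughts (reusing cosine from the turn-loop sketch above).

```python
def forget(group: list, contradicted: set) -> list:
    # Forget: drop counterfactual thoughts, assumed here to be those
    # whose text appears in an externally supplied contradiction set.
    return [t for t in group if t.text not in contradicted]


def merge(group: list, threshold: float = 0.9) -> list:
    # Merge: collapse near-duplicate thoughts about the same entity.
    # Keeping the first representative is one simple realization; the
    # paper could equally re-summarize each cluster with the LLM.
    kept = []
    for t in group:
        if not any(cosine(t.embedding, k.embedding) > threshold for k in kept):
            kept.append(t)
    return kept
```

Run per hash group, e.g. cache.groups[b] = merge(forget(cache.groups[b], contradicted)); keeping each group small is part of why bucket-local retrieval stays fast.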

RESULTS

By the Numbers

Contextual Coherence (GVD Chinese)

0.665

+0.237 over SiliconFriend

Response Correctness (GVD Chinese)

0.605

+0.187 over SiliconFriend

Retrieval Accuracy (KdConv Film ChatGLM)

0.920

top-5 recall within TiM memory cache

Retrieval Time per query

0.5305 ms

0.0982 ms faster than baseline pairwise similarity

On the Chinese part of GVD, which tests long-term open-domain dialogue, Think-in-Memory with ChatGLM raises contextual coherence from 0.428 to 0.665 and response correctness from 0.418 to 0.605 over SiliconFriend. On KdConv Film with ChatGLM, Think-in-Memory achieves 0.920 retrieval accuracy and improves response correctness from 0.657 to 0.827.

BENCHMARK

Comparison on GVD Chinese with ChatGLM

Contextual Coherence on the Chinese part of the Generated Virtual Dataset (GVD).

KEY INSIGHT

The Counterintuitive Finding

Despite using only distilled inductive thoughts, Think-in-Memory reaches 0.920 retrieval accuracy on KdConv Film with ChatGLM using top-5 recall.

This is surprising because one might expect discarding raw dialogue text to hurt retrieval, yet structured thoughts plus LSH improve both efficiency and accuracy.

WHY IT MATTERS

What this unlocks for the field

Think-in-Memory unlocks LLM agents that maintain consistent reasoning paths over long-term conversations by recalling inductive thoughts instead of re-reading entire histories.

Builders can now bolt TiM onto existing LLMs like ChatGLM or Baichuan2 to get efficient, evolvable long-term memory without modifying model architectures.
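
As a hedged illustration of that bolt-on claim, the snippet below wires the earlier sketches around a stub chat model. my_llm and my_embed are invented stand-ins, not real ChatGLM or Baichuan2 bindings.

```python
import numpy as np

cache = MemoryCache(dim=384)


def my_llm(prompt: str) -> str:
    return "stub reply"  # swap in a real ChatGLM/Baichuan2 API call


def my_embed(text: str) -> np.ndarray:
    # Cheap stand-in for a 384-dimensional sentence encoder.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(384)


reply = handle_turn("How have my headaches changed since May?",
                    cache, my_llm, my_embed, k=5)
```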


