Memoria: A Scalable Agentic Memory Framework for Personalized Conversational AI

Authors: Samarth Sarin, Lovepreet Singh, Bhaskarjit Sarmah, Dhagash Mehta

2025

TL;DR

Memoria combines weighted knowledge-graph triplets with session-level summarization to reach 87.1% accuracy on LongMemEvals single-session-user questions, beating A-Mem (OpenAI) by 2.9 points.


THE PROBLEM

Stateless LLM chats lose continuity and personalization across sessions

Memoria targets LLM systems where each interaction is treated in isolation, discarding previous context and failing to adapt over time.

This breaks personalized conversational assistants and agentic memory use cases, causing repetitive questions, weak long-term context, and lower user trust.

HOW IT WORKS

Memoria — structured logging plus weighted knowledge graphs

Memoria’s core mechanism combines structured conversation logging, a dynamic user persona via a KG, session-level memory for real-time context, and seamless retrieval for context-aware responses.

You can think of Memoria like RAM plus disk: session summaries act as fast working context, while the weighted KG behaves like a long-term card catalog of user facts.

This design lets Memoria inject only the most relevant, recency-weighted triplets and summaries into the prompt, enabling continuity that a plain context window cannot provide.
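The recency weighting described here can be sketched as a relevance score damped by exponential decay. The decay rate, field names, and `top_k` helper below are illustrative assumptions, not the paper's implementation:

```python
import math
from dataclasses import dataclass

@dataclass
class Triplet:
    subject: str
    predicate: str
    obj: str
    similarity: float   # cosine similarity to the current query (precomputed)
    age_days: float     # time since the triplet was last observed

def decayed_score(t: Triplet, decay_rate: float = 0.1) -> float:
    """Recency-weighted relevance: similarity damped by exponential decay."""
    return t.similarity * math.exp(-decay_rate * t.age_days)

def top_k(triplets: list[Triplet], k: int = 5, decay_rate: float = 0.1) -> list[Triplet]:
    """Select the k most relevant, recency-weighted triplets for the prompt."""
    return sorted(triplets, key=lambda t: decayed_score(t, decay_rate), reverse=True)[:k]
```

Under this scoring, a fresh triplet with moderate similarity can outrank a stale one with higher raw similarity, which is the behavior a "plain context window" cannot express.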

DIAGRAM

Memoria interaction flow across user and session types

This diagram shows how Memoria routes new and repeat users through session summaries and KG retrieval to build prompts.

DIAGRAM

Memoria evaluation and comparison pipeline

This diagram shows how Memoria and A-Mem are evaluated on LongMemEvals with shared LLM and different embedding setups.

PROCESS

How Memoria Handles a User Interaction Lifecycle

  1. Structured Conversation History with a Database

    Memoria logs each user query and LLM reply with timestamps, session identifiers, KG triplets, summaries, and token usage into the structured conversation logging module.

  2. Dynamic User Persona via KG

    Memoria extracts subject–predicate–object triplets from user messages, embeds them, and grows a weighted knowledge graph capturing recurring topics, preferences, and entities.

  3. Session-Level Memory for Real-Time Context

    Memoria incrementally builds session-level memory by summarizing dialogue turns and storing the summaries keyed by session identifier, giving real-time context as a conversation unfolds.

  4. Seamless Retrieval for Context-Aware Responses

    Memoria fetches the relevant session summaries and the top-K KG triplets, applies exponential decay weights, and constructs a distilled, context-aware prompt.
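The four lifecycle steps above can be sketched as a minimal in-memory loop. The paper persists this state in SQLite3 and ChromaDB and uses an LLM for triplet extraction and summarization; the class and method names here are hypothetical stand-ins:

```python
import time

class MemoriaSketch:
    """Illustrative stand-in for Memoria's lifecycle; all names are hypothetical."""

    def __init__(self):
        self.log = []          # structured conversation history
        self.kg = []           # weighted knowledge-graph triplets
        self.summaries = {}    # session_id -> running summary

    def log_turn(self, session_id, user_msg, llm_reply):
        # Step 1: structured logging with timestamps and session identifiers
        self.log.append({"session": session_id, "user": user_msg,
                         "assistant": llm_reply, "ts": time.time()})

    def add_triplet(self, subject, predicate, obj):
        # Step 2: grow the user-persona KG (LLM extraction and embedding omitted)
        self.kg.append({"s": subject, "p": predicate, "o": obj, "ts": time.time()})

    def update_summary(self, session_id, summary):
        # Step 3: session-level memory keyed by session identifier
        self.summaries[session_id] = summary

    def build_prompt(self, session_id, query, k=3):
        # Step 4: distilled prompt = session summary + k most recent triplets
        # (similarity ranking and decay weighting omitted for brevity)
        facts = "; ".join(f"{t['s']} {t['p']} {t['o']}" for t in self.kg[-k:])
        return (f"Summary: {self.summaries.get(session_id, '')}\n"
                f"Facts: {facts}\nUser: {query}")
```

The point of the sketch is the data flow: every turn lands in the structured log, persona facts accumulate separately in the KG, and only a small distilled slice of both reaches the prompt.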

KEY CONTRIBUTIONS

Key Contributions

  • Structured Conversation History with a Database

    Memoria introduces a structured conversation logging layer that stores raw messages, KG triplets, summaries, and token statistics in SQLite3 plus ChromaDB, turning chats into a queryable memory bank.

  • Dynamic User Persona via KG

    Memoria builds a dynamic user persona by embedding and linking user-specific triplets, then weighting them with an exponential decay function to resolve conflicting preferences over time.

  • Session-Level Memory and Seamless Retrieval

    Memoria combines session-level memory with seamless, context-aware retrieval, achieving 87.1% single-session accuracy and up to 38.7% latency reduction versus full-context prompting.
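As a toy illustration of how exponential decay can resolve conflicting preferences, consider two contradictory persona facts recorded at different times; the decay constant and helper names below are assumptions, not the paper's values:

```python
import math
import time

DAY = 86400.0  # seconds per day

def weight(observed_at: float, now: float, decay_rate: float = 0.05) -> float:
    """Exponential recency weight: older observations count for less."""
    age_days = (now - observed_at) / DAY
    return math.exp(-decay_rate * age_days)

def resolve(preferences: list[tuple[str, float]], now: float) -> str:
    """Given conflicting (value, timestamp) pairs, keep the highest-weighted one."""
    return max(preferences, key=lambda p: weight(p[1], now))[0]

now = time.time()
conflicting = [
    ("prefers tea", now - 60 * DAY),    # stated two months ago
    ("prefers coffee", now - 2 * DAY),  # stated two days ago
]
# Under exponential decay, the more recent preference wins.
assert resolve(conflicting, now) == "prefers coffee"
```

Because decay is continuous rather than a hard cutoff, an old preference is never deleted outright; it simply loses influence unless the user reaffirms it.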

RESULTS

By the Numbers

Single-session accuracy

87.1%

+2.9 points over A-Mem (OA)

Knowledge-update accuracy

80.8%

+1.4 points over A-Mem (OA)

Inference time (single-session-user)

260 s

131 s faster than full context

Avg. token length (single-session-user)

398 tokens

vs ~115,000 tokens for full context

On LongMemEvals single-session-user and knowledge-update subsets, which stress user-specific recall and knowledge updates, Memoria shows that recency-weighted KG memory can match or exceed full-context accuracy with far shorter prompts.


BENCHMARK

Accuracy Comparison Across Memory Strategies

Accuracy on LongMemEvals single-session-user questions.

KEY INSIGHT

The Counterintuitive Finding

Memoria, with its weighted KG and summaries, reaches 87.1% single-session accuracy, slightly above the 85.7% full-conversation baseline, while using far smaller prompts.

This is surprising because many assume feeding the entire ~115,000-token history is always best, yet Memoria’s curated memory beats it with only about 398 tokens.

WHY IT MATTERS

What this unlocks for the field

Memoria unlocks practical agentic memory by combining structured logging, KG-based personas, and recency-aware retrieval into a plug-and-play Python framework.

Builders can now ship conversational agents that remember users across long horizons without exploding token budgets, enabling persistent assistants, advisors, and support bots on modest infrastructure.


