LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics

Authors: Marc Glocker, Peter Hönig, Matthias Hirschmanner, Markus Vincze

2025

TL;DR

LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics uses a three-agent LLM orchestration plus RAG-based memory to reach 84.3% lenient task planning accuracy with Qwen2.5-32B.


Read our summary here, or open the publisher PDF in a new tab.

THE PROBLEM

Household robots lack long-term memory for dynamic tasks

LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics targets long-horizon household tasks where existing planners do not track object positions over time in dynamic environments.

Without robust memory and knowledge base support, embodied systems cannot reliably answer follow-up questions about object locations or task completion, limiting trustworthy household assistance.

HOW IT WORKS

Agent orchestration with RAG-based memory

LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics combines a routing agent, task planning agent, knowledge base agent, Grounded SAM, and ChromaDB-based RAG to plan and recall household actions.

Conceptually, the system treats RAG like a searchable notebook, with agents as librarians deciding which notes to write, retrieve, and execute on the robot.

This architecture lets the agent answer temporal questions about past actions and object movements that a plain context window would miss or forget.
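As a control-flow sketch of the three-agent orchestration: the stand-in functions below use simple keyword heuristics in place of the LLM-backed agents the paper uses (coordinated via OpenAI Swarm), and all function names are hypothetical.

```python
# Minimal control-flow sketch of the routing / planning / knowledge-base
# orchestration. Keyword heuristics stand in for LLM calls.

def routing_agent(request: str) -> str:
    """Classify a request as 'action', 'history', or 'unclear' (toy heuristic)."""
    text = request.lower()
    if any(w in text for w in ("where", "did you", "when")):
        return "history"
    if any(w in text for w in ("put", "bring", "move", "clean")):
        return "action"
    return "unclear"

def task_planning_agent(request: str) -> dict:
    """Stand-in for the LLM planner that emits a JSON task plan."""
    return {"task": request, "steps": []}

def knowledge_base_agent(request: str, memory: list) -> str:
    """Stand-in for RAG retrieval over stored dialogue history."""
    hits = [m for m in memory if any(w in m for w in request.lower().split())]
    return hits[0] if hits else "No matching memory found."

def handle(request: str, memory: list):
    route = routing_agent(request)
    if route == "action":
        return task_planning_agent(request)
    if route == "history":
        return knowledge_base_agent(request, memory)
    return "Could you rephrase that?"

memory = ["moved the mug to the kitchen shelf"]
print(handle("Put the mug on the shelf", memory))  # routed to the planner
print(handle("Where is the mug now?", memory))     # routed to the knowledge base
```

The point of the router is that action commands and memory queries need different machinery: one produces a plan to execute, the other retrieves past dialogue.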

DIAGRAM

RAG workflow for long term household memory

This diagram shows how the system ingests dialogue history into ChromaDB and retrieves relevant past actions to answer follow-up questions.

DIAGRAM

Evaluation pipeline for household scenarios and agents

This diagram shows how the system is evaluated across three scenarios, knowledge base questions, and routing tasks.

PROCESS

How LLM-Empowered Embodied Agent for Memory-Augmented Task Planning in Household Robotics Handles a Household Command Session

  1. Routing agent

    The routing agent classifies each user request as an action command, history query, or unclear request and forwards it to the appropriate agent.

  2. Task planning agent

    The task planning agent receives object lists from Grounded SAM and LLaMa3.2-Vision, then generates JSON task plans, including objects and destinations, using chain-of-thought prompting.

  3. Knowledge base agent

    The knowledge base agent uses RAG over ChromaDB with BGE-M3 embeddings to answer questions about past actions and object locations.

  4. Document ingestion

    During document ingestion, dialogue-history question-answer pairs are chunked, embedded, timestamped, and stored as vectors to support long-term reasoning.
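The ingestion-and-retrieval loop above can be sketched end to end. Here a toy bag-of-words embedding and a plain Python list stand in for BGE-M3 and ChromaDB, which the paper actually uses; chunking is reduced to one QA pair per entry.

```python
# Toy ingestion/retrieval loop: timestamped QA pairs are embedded and
# stored, then ranked by cosine similarity at query time.
import math
import re
import time
from collections import Counter

store = []  # each entry: (timestamp, text, embedding)

def embed(text):
    """Bag-of-words 'embedding' (stand-in for BGE-M3)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def ingest(qa_pair):
    """Embed, timestamp, and store one dialogue chunk."""
    store.append((time.time(), qa_pair, embed(qa_pair)))

def retrieve(query, k=1):
    """Return the k most similar stored chunks with their timestamps."""
    q = embed(query)
    ranked = sorted(store, key=lambda entry: cosine(q, entry[2]), reverse=True)
    return [(ts, text) for ts, text, _ in ranked[:k]]

ingest("User: clean the table. Robot: moved the cup to the kitchen counter.")
ingest("User: tidy up. Robot: placed the book on the shelf.")
print(retrieve("where is the cup?")[0][1])
```

Keeping a timestamp alongside each chunk is what enables temporal follow-up questions ("where did you put it last?") rather than just similarity lookup.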

KEY CONTRIBUTIONS

Key Contributions

  • Long-horizon task planner for household tasks

    The paper introduces a long-horizon task planning agent that uses in-context learning and offline LLMs, reaching up to 84.3% total lenient accuracy with Qwen2.5-32B.

  • RAG for efficient memory retrieval and object tracking

    A knowledge base agent performs RAG over ChromaDB with BGE-M3 embeddings to answer temporal questions about object movements.

  • Modular agent orchestration

    OpenAI Swarm coordinates the routing, task planning, and knowledge base agents, improving robustness and modularity in embodied control.
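The planner's output is a JSON task plan with objects and destinations. A hypothetical plan shape is shown below; the field names are our assumption for illustration, not the paper's schema.

```python
# Hypothetical JSON task plan a planning agent might emit.
import json

plan = {
    "task": "clear the dining table",
    "steps": [
        {"action": "pick", "object": "cup", "destination": "kitchen counter"},
        {"action": "pick", "object": "plate", "destination": "dishwasher"},
    ],
}

# Round-trip through JSON, as the plan would be passed between agents.
decoded = json.loads(json.dumps(plan))
print(decoded["steps"][0]["destination"])  # kitchen counter
```

A structured plan like this is what lets a downstream executor iterate over steps, and what the ingestion pipeline can later store for follow-up queries.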

RESULTS

By the Numbers

Total Accuracy Strict (%)

77.2%

+20.8 over LLaMa3.1-8B

Total Accuracy Lenient (%)

84.3%

+15.6 over Gemma2-27B

Knowledge Base Total Validity (%)

91.3%

+37.55 over Qwen2.5-32B without RAG

Routing Total Success Rate (%)

92.5%

+2.5 over Qwen2.5-32B routing

The task planning accuracies come from three household scenarios scored with strict and lenient metrics, while knowledge base validity and routing success are measured over repeated question sets. These results show the system combines high reasoning quality with reliable routing and memory-augmented retrieval in realistic household tasks.
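As a rough illustration of how strict and lenient scoring can diverge, the matching criteria below are our assumption for illustration, not the paper's exact definitions: strict requires the same steps in the same order, lenient accepts the same steps in any order.

```python
# Illustrative strict vs. lenient plan scoring (assumed criteria).

def strict_match(pred, gold):
    """Exact same (object, destination) steps in the same order."""
    return pred == gold

def lenient_match(pred, gold):
    """Same (object, destination) pairs, order ignored."""
    return set(pred) == set(gold)

gold = [("cup", "counter"), ("plate", "dishwasher")]
pred = [("plate", "dishwasher"), ("cup", "counter")]  # right steps, wrong order

print(strict_match(pred, gold))   # False
print(lenient_match(pred, gold))  # True
```

Under criteria like these, a plan can fail strict scoring yet pass lenient scoring, which is why lenient accuracy (84.3%) exceeds strict accuracy (77.2%).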

BENCHMARK

Task Planning Accuracy Across Different LLMs

Total lenient task planning accuracy (%) over three household scenarios.

BENCHMARK

Knowledge Base Response Accuracy With and Without RAG

Total validity (%) of knowledge base answers across four follow-up question types.

KEY INSIGHT

The Counterintuitive Finding

Despite weaker strict planning accuracy in one scenario, LLaMa3.1-8B achieves the system's highest routing success rate at 92.5%.

This is surprising because developers might expect the strongest reasoning model, Qwen2.5-32B with 84.3% lenient planning accuracy, to also dominate routing instead of trailing by 2.5 percentage points.

WHY IT MATTERS

What this unlocks for the field

The system enables embodied agents to both execute long-horizon household tasks and answer nuanced follow-up questions about past actions.

Builders can now prototype privacy-preserving, fully local household robots that combine multi-agent LLM orchestration, RAG-based memory, and vision grounding without any explicit model training.


Related papers

RAG

A Dynamic Retrieval-Augmented Generation System with Selective Memory and Remembrance

Okan Bursa

· 2026

Adaptive RAG Memory (ARM) augments a standard retriever–generator stack with a Dynamic Embedding Layer and Remembrance Engine that track usage statistics and apply selective remembrance and decay to embeddings. On a lightweight retrieval benchmark, ARM achieves NDCG@5 ≈ 0.9401 and Recall@5 = 1.000 with 22M parameters, matching larger baselines like gte-small while providing the best efficiency among ultra-efficient models.

RAGLong-Term Memory

HingeMem: Boundary Guided Long-Term Memory with Query Adaptive Retrieval for Scalable Dialogues

Yijie Zhong, Yunfan Gao, Haofen Wang

· 2026

HingeMem combines Boundary Guided Long-Term Memory, Dialogue Boundary Extraction, Memory Construction, Query Adaptive Retrieval, Hyperedge Rerank, and Adaptive Stop to segment dialogues into element-indexed hyperedges and plan query-specific retrieval. On LOCOMO, HingeMem achieves 63.9 overall F1 and 75.1 LLM-as-a-Judge score, surpassing the best baseline Zep (56.9 F1) by 7.0 F1 without using category-specific QA formats.
