Emotional Chatting Machine: Emotional Conversation Generation with Internal and External Memory

Authors: Hao Zhou, Minlie Huang, Tianyang Zhang et al.

arXiv 2017

TL;DR

Emotional Chatting Machine (ECM) adds emotion category embedding plus internal and external emotion memories to seq2seq, boosting emotion accuracy to 0.773 vs 0.724 for Emb (+0.049).



THE PROBLEM

Neural chatbots ignore emotion and produce vague responses

ECM targets the finding that simply embedding emotion information yields emotional responses only 73.7% of the time, and even those responses tend to be generic and hard to perceive as emotional.

This hurts open-domain dialogue systems, where seq2seq chatbots generate fluent replies but fail to express clear, consistent emotion or empathy aligned with the user's affect.

HOW IT WORKS

Emotional Chatting Machine with internal and external memory

ECM combines Emotion Category Embedding, Internal Memory, and External Memory inside a GRU encoder-decoder to control emotional expression over time.

You can think of Internal Memory as a decaying emotional state like a human mood, and External Memory as a separate shelf of explicit emotion words such as "lovely" or "awesome".

This design lets ECM decide when to prioritize emotional words versus generic words, enabling emotion controllable responses that a plain context window seq2seq cannot reliably produce.
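The "decaying mood" intuition behind Internal Memory can be sketched in a few lines. This is a toy NumPy illustration, not the paper's trained model: the write-gate weights and decoder states below are random stand-ins, but because a sigmoid gate always lies strictly between 0 and 1, elementwise gating guarantees the emotion state shrinks at every step.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

dim = 4                      # toy emotion-state dimension
M = np.ones(dim)             # internal emotion state, "full" at the start
W_write = rng.normal(scale=0.5, size=(dim, dim))  # toy write-gate weights

norms = [np.linalg.norm(M)]
for t in range(6):           # six decoding steps
    s_t = rng.normal(size=dim)       # stand-in for the decoder hidden state
    g_w = sigmoid(W_write @ s_t)     # write gate, each entry strictly in (0, 1)
    M = g_w * M                      # every step erases part of the emotion state
    norms.append(np.linalg.norm(M))

# The norm shrinks monotonically: expressed emotion fades as decoding proceeds.
assert all(a > b for a, b in zip(norms, norms[1:]))
```

In the real model the write gate is learned, so the network decides how much emotion to "spend" at each word rather than decaying at a fixed rate.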

DIAGRAM

Decoding flow with internal and external emotion memories

This diagram shows how ECM decodes each word using the internal emotion state and the external emotion vocabulary at inference time.

DIAGRAM

Data preparation and training pipeline for ECM

This diagram shows how ECM builds an emotion-labelled corpus using a Bi-LSTM classifier and then trains the emotional seq2seq model.

PROCESS

How ECM Handles Emotional Conversation Generation

  1. Encoder-decoder framework

    ECM first uses the GRU-based encoder-decoder framework to map the post sequence X into hidden representations and initialize the decoding states.

  2. Emotion Category Embedding

    ECM embeds the given emotion category e into a vector v_e and concatenates v_e with the context vector c_t and the previous-word embedding e(y_{t−1}) at every decoder step.

  3. Internal Memory

    ECM maintains an internal emotion state M_{e,t}, reading it with gate g^r_t and decaying it with write gate g^w_t so the emotion gradually fades by the final step.

  4. External Memory

    ECM uses a type selector α_t to mix a generic softmax P_g and an emotion softmax P_e over disjoint vocabularies, explicitly choosing between emotion and generic words.
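The four steps above can be sketched as one decoding step. This is a minimal NumPy sketch under simplifying assumptions: all weights are random stand-ins, the GRU is collapsed into a single tanh layer, and the toy sizes (H, E, Vg, Ve) are hypothetical, not the paper's. What it shows faithfully is the wiring: emotion embedding concatenated at every step, gated read/write on the internal state, and a type selector mixing two softmaxes over disjoint vocabularies.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy sizes; real models are far larger (all weights below are random stand-ins).
H, E, Vg, Ve = 8, 4, 10, 5   # hidden size, emotion size, generic/emotion vocab sizes

emotion_emb = rng.normal(size=(6, E))                     # step 2: one vector per category
W_dec   = rng.normal(scale=0.3, size=(H, 2 * H + 2 * E))  # stand-in for the GRU update
W_read  = rng.normal(scale=0.3, size=(E, 2 * H + E))
W_write = rng.normal(scale=0.3, size=(E, H))
W_g     = rng.normal(scale=0.3, size=(Vg, H))
W_e     = rng.normal(scale=0.3, size=(Ve, H))
v_u     = rng.normal(size=H)

def decode_step(y_prev, c_t, e_cat, M):
    """One ECM-style decoding step over disjoint generic/emotion vocabularies."""
    v_e = emotion_emb[e_cat]
    x = np.concatenate([y_prev, c_t, v_e])              # step 2: concat at every step
    g_r = sigmoid(W_read @ x)                           # step 3: read gate
    M_read = g_r * M
    s_t = np.tanh(W_dec @ np.concatenate([x, M_read]))  # simplified GRU update
    g_w = sigmoid(W_write @ s_t)                        # step 3: write gate
    M_next = g_w * M                                    # emotion state decays over time
    alpha = sigmoid(v_u @ s_t)                          # step 4: type selector
    P_g = softmax(W_g @ s_t)                            # generic-word distribution
    P_e = softmax(W_e @ s_t)                            # emotion-word distribution
    P = np.concatenate([(1 - alpha) * P_g, alpha * P_e])
    return s_t, M_next, P

M = np.ones(E)                                          # initial internal emotion state
y_prev, c_t = rng.normal(size=H), rng.normal(size=H)
s_t, M, P = decode_step(y_prev, c_t, e_cat=2, M=M)
assert abs(P.sum() - 1.0) < 1e-9                        # mixed distribution stays normalized
```

Because P_g and P_e cover disjoint word sets, weighting them by (1 − α_t) and α_t still yields a single valid probability distribution over the full vocabulary.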

KEY CONTRIBUTIONS

Key Contributions

  • Emotional Chatting Machine framework

    ECM introduces an end-to-end seq2seq framework with Emotion Category Embedding, Internal Memory, and External Memory to generate emotionally consistent responses from large-scale data.

  • Emotion Category Embedding mechanism

    ECM embeds six emotion categories (Angry, Disgust, Happy, Like, Sad, Other) into 100-dimensional vectors and feeds them into every decoder step to control high-level emotional style.

  • Internal and External Memory modules

    ECM adds an internal emotion-decay mechanism and an external emotion-vocabulary selector, improving emotion accuracy from 0.724 (Emb) to 0.773 while keeping perplexity near 65.9.
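The category-embedding contribution is a simple lookup table. A minimal sketch, assuming a randomly initialized table (in ECM the 6 × 100 table is learned jointly with the decoder), just to make the shapes concrete:

```python
import numpy as np

rng = np.random.default_rng(2)

CATEGORIES = ["Angry", "Disgust", "Happy", "Like", "Sad", "Other"]
EMB_DIM = 100                # dimensionality used in the paper

# Randomly initialized here; in ECM the table is trained with everything else.
emotion_table = rng.normal(size=(len(CATEGORIES), EMB_DIM))

def emotion_vector(category):
    """Look up the category embedding v_e that is fed to every decoder step."""
    return emotion_table[CATEGORIES.index(category)]

v_e = emotion_vector("Happy")
assert v_e.shape == (EMB_DIM,)
```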

RESULTS

By the Numbers

  • Perplexity: 65.9 (−2.1 vs Seq2Seq's 68.0)

  • Emotion accuracy: 0.773 (+0.049 over Emb's 0.724)

  • Seq2Seq emotion accuracy: 0.179 (−0.594 vs ECM's 0.773)

  • Manual content score: 1.299 (+0.044 over Emb's 1.256)

On the Emotional STC dataset, which annotates STC conversations with six emotion categories, ECM improves emotion controllability while keeping language quality close to baselines. These results show ECM can generate responses that are both emotionally aligned and grammatically fluent.

BENCHMARK


Objective evaluation with perplexity and accuracy

Emotion accuracy of ECM and baselines on the Emotional STC dataset.

KEY INSIGHT

The Counterintuitive Finding

ECM without the external memory (w/o EMem) achieves the best perplexity, 61.8, yet the full ECM reaches higher emotion accuracy: 0.773 versus 0.731 for w/o EMem.

This is surprising because adding an extra decision head and a constrained emotion vocabulary slightly hurts perplexity, yet still yields better human-judged content quality and much clearer emotional expression.

WHY IT MATTERS

What this unlocks for the field

ECM shows that explicit internal and external emotion memories let neural chatbots control emotional style without sacrificing grammaticality.

Builders can now design assistants that respond as sympathetic, angry, or cheerful on demand, enabling empathetic agents and controllable affective dialogue in large scale neural systems.


