DialogueRNN: An Attentive RNN for Emotion Detection in Conversations
https://arxiv.org/pdf/1811.00405.pdf
Navonil Majumder, Soujanya Poria, Devamanyu Hazarika, Rada Mihalcea, Alexander Gelbukh, Erik Cambria
1. Raising a Problem
Current emotion detection systems for conversations,
including the SOTA method CMN (Conversational Memory Networks; Devamanyu Hazarika's earlier work),
1) do not distinguish between the different parties in a conversation
2) are not aware of the speaker of a given utterance
2. Proposing their New Solution
1) Assumption
There are 3 major aspects relevant to the emotion in a conversation:
(1) the speaker
(2) the context of the preceding utterances
(3) the emotion of the preceding utterances
2) Model Architecture
DialogueRNN employs 3 GRUs (Gated Recurrent Units):
(1) Global GRU
a. Input: all preceding utterances
b. Output: encodes the context
(2) Party GRU
a. Input: previous party state & global GRU's output (= the context)
b. Output: encodes current party state (= current speaker's state & current listener's state)
(3) Emotion GRU
a. Input: previous emotion representation & current speaker's state
b. Output: decodes current emotion representation
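The three-GRU update above can be sketched per utterance. This is a simplified toy sketch in NumPy, not the paper's full model: the attention over global context states is omitted (the party GRU here just consumes the latest global state), and all sizes, initializations, and the two-party setup are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8  # toy hidden/utterance size (assumption; the paper uses larger dims)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_gru(d_in, d_h):
    """Random parameters for a minimal GRU cell (W* act on input, U* on state)."""
    return {k: rng.standard_normal((d_h, d_in if k[0] == 'W' else d_h)) * 0.1
            for k in ('Wz', 'Uz', 'Wr', 'Ur', 'Wh', 'Uh')}

def gru_step(p, x, h):
    """One GRU update: update gate z, reset gate r, candidate state h_tilde."""
    z = sigmoid(p['Wz'] @ x + p['Uz'] @ h)
    r = sigmoid(p['Wr'] @ x + p['Ur'] @ h)
    h_tilde = np.tanh(p['Wh'] @ x + p['Uh'] @ (r * h))
    return (1 - z) * h + z * h_tilde

global_gru = make_gru(D, D)
party_gru = make_gru(2 * D, D)   # sees [utterance; context]
emotion_gru = make_gru(D, D)

def dialoguernn_step(u_t, speaker, g, q, e):
    """One time step: update context, the speaker's party state, the emotion rep."""
    g_new = gru_step(global_gru, u_t, g)                  # Global GRU: context
    q = dict(q)                                           # listeners keep their state
    q[speaker] = gru_step(party_gru,                      # Party GRU: speaker only
                          np.concatenate([u_t, g_new]), q[speaker])
    e_new = gru_step(emotion_gru, q[speaker], e)          # Emotion GRU
    return g_new, q, e_new

# Two-party toy conversation of three utterances.
g, e = np.zeros(D), np.zeros(D)
q = {'A': np.zeros(D), 'B': np.zeros(D)}
for speaker in ['A', 'B', 'A']:
    u_t = rng.standard_normal(D)
    g, q, e = dialoguernn_step(u_t, speaker, g, q, e)
print(e.shape)  # (8,) -- emotion representation of the last utterance
```

Note how the party state is keyed by speaker: only the current speaker's entry is rewritten at each turn, which is exactly the speaker/listener asymmetry described under Implementation below.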
3) Implementation
(1) get Utterance Representation using CNN (Convolutional Neural Networks)
a. 3 convolution filter sizes (kernel sizes: 3, 4, 5 / 50 feature maps each)
b. max-pooling
c. ReLU (Rectified Linear Unit): activations are concatenated
d. a 100 dimensional dense layer
* trained at utterance level with the emotion labels
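Steps a–d amount to a standard TextCNN feature extractor. The NumPy sketch below follows the numbers in the notes (kernel sizes 3/4/5, 50 feature maps each, 100-dim dense output); the embedding size, random weights, and the exact order of ReLU vs. pooling are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
EMB, N_MAPS = 16, 50  # toy embedding size (assumption); 50 feature maps per kernel size

def conv_feats(tokens, filt):
    """Slide one 1-D conv filter bank over the token sequence, then max-pool over time."""
    k = filt.shape[1] // EMB  # kernel width in tokens
    windows = [tokens[i:i + k].ravel() for i in range(len(tokens) - k + 1)]
    acts = np.stack(windows) @ filt.T      # (n_windows, N_MAPS)
    return acts.max(axis=0)                # max-over-time pooling -> (N_MAPS,)

def utterance_rep(tokens, filters, W_dense):
    pooled = np.concatenate([conv_feats(tokens, f) for f in filters])
    pooled = np.maximum(pooled, 0)         # ReLU on the concatenated activations
    return W_dense @ pooled                # 100-dim dense projection

filters = [rng.standard_normal((N_MAPS, k * EMB)) * 0.05 for k in (3, 4, 5)]
W_dense = rng.standard_normal((100, 3 * N_MAPS)) * 0.05

tokens = rng.standard_normal((9, EMB))     # a 9-token utterance of word embeddings
u = utterance_rep(tokens, filters, W_dense)
print(u.shape)  # (100,)
```

The resulting 100-dim vector plays the role of u_t fed to the Global and Party GRUs.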
(2) Party GRU
a. Speaker Update: as described above
b. Listener Update: keep the state unchanged
(3) Emotion Classification
a. 2-layer perceptron
b. Softmax layer (6 emotion classes)
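The classification head is small enough to write out directly. In this sketch the hidden width (64) and the ReLU nonlinearity are assumptions; only the 100-dim emotion-representation input and the 6-way softmax come from the notes.

```python
import numpy as np

rng = np.random.default_rng(2)
D_E, D_H, N_CLASSES = 100, 64, 6  # hidden width 64 is an assumption

W1 = rng.standard_normal((D_H, D_E)) * 0.05
W2 = rng.standard_normal((N_CLASSES, D_H)) * 0.05

def classify(e_t):
    """2-layer perceptron over the emotion representation, then softmax."""
    h = np.maximum(W1 @ e_t, 0)            # hidden layer (assumed ReLU)
    logits = W2 @ h
    p = np.exp(logits - logits.max())      # shift for numerical stability
    return p / p.sum()                     # distribution over 6 emotion classes

probs = classify(rng.standard_normal(D_E))
print(probs.shape)  # (6,)
```

The argmax of this distribution gives the predicted emotion label for the utterance.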
(4) Training
a. Cost function: categorical cross-entropy + L2 regularization
b. Optimization: Adam (a stochastic-gradient-descent-based optimizer)
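The training objective pairs the per-utterance cross-entropy with an L2 weight penalty. A minimal sketch of that cost function (the regularization strength lam and the toy data are assumptions; the Adam update itself is not shown):

```python
import numpy as np

rng = np.random.default_rng(3)

def train_loss(probs_seq, labels, weights, lam=1e-4):
    """Categorical cross-entropy over all utterances plus an L2 weight penalty."""
    ce = -np.mean([np.log(p[y]) for p, y in zip(probs_seq, labels)])
    l2 = lam * sum(np.sum(w ** 2) for w in weights)
    return ce + l2

# Toy batch: 4 utterances, 6 emotion classes, one dummy weight matrix.
probs_seq = rng.dirichlet(np.ones(6), size=4)   # each row is a valid distribution
labels = [0, 3, 5, 1]
weights = [rng.standard_normal((6, 100))]
print(train_loss(probs_seq, labels, weights) > 0)  # True
```

In practice this scalar would be minimized with Adam over the GRU, CNN, and classifier parameters jointly.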
3. Experiment
1) Datasets
a. IEMOCAP (train: 80%, test: 20%)
b. AVEC (train: 80%, test: 20%)
4. Performance of the New Solution
DialogueRNN outperforms existing SOTA contextual emotion classifiers
Ablations show the Party State contributes the most to performance, followed by the Emotion GRU
5. Error Analysis
A significant share of errors occurs at turns where the emotion changes from the previous turn
Further research on these cases is needed