CEM: Commonsense-Aware Empathetic Response Generation

Authors: Sahand Sabour, Chujie Zheng, Minlie Huang

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate our approach on EMPATHETICDIALOGUES, which is a widely-used benchmark dataset for empathetic response generation. Empirical results demonstrate that our approach outperforms the baseline models in both automatic and human evaluations and can generate more informative and empathetic responses." The paper also includes sections such as 'Baselines' (under 'Experiments'), 'Automatic Evaluation', 'Human Evaluation', and 'Ablation Studies'.
Researcher Affiliation | Academia | The CoAI Group, DCST, Institute for Artificial Intelligence, State Key Lab of Intelligent Technology and Systems, Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
Pseudocode | No | The paper describes the model architecture and processes in text and diagrams, but does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Our code is available at https://github.com/Sahandfer/CEM."
Open Datasets | Yes | "We conduct our experiments on the EMPATHETICDIALOGUES (Rashkin et al. 2019), a large-scale multi-turn dataset containing 25k empathetic conversations between crowdsourcing workers."
Dataset Splits | Yes | "We used the same 8:1:1 train/valid/test split as provided by Rashkin et al. (2019)." (A loading sketch follows the table.)
Hardware Specification | Yes | "All the models were trained on one single TITAN Xp GPU."
Software Dependencies | No | The paper mentions 'PyTorch' and other components such as 'GloVe vectors' and the 'Adam optimizer', but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | "We used 300-dimensional pre-trained GloVe vectors... The hidden dimension for all corresponding components was set to 300. The Adam optimizer (Kingma and Ba 2017) with β1 = 0.9 and β2 = 0.98 was used for training. The initial learning rate was set to 0.0001 and we varied this value during training according to Vaswani et al. (2017). All the models were trained on one single TITAN Xp GPU using a batch size of 16 and early stopping. In our experiments, we set γ1 = 1, γ2 = 1, and γ3 = 1.5." (A training-setup sketch follows the table.)
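
For reference, here is a minimal sketch of loading the dataset splits reported above. It assumes the Hugging Face `datasets` library and its `empathetic_dialogues` dataset card, which mirrors the 8:1:1 train/valid/test partition of Rashkin et al. (2019); the CEM repository may ship its own preprocessed copies, so treat this as one convenient loading path rather than the authors' method.

```python
# Hedged sketch: load EMPATHETICDIALOGUES via the Hugging Face
# `datasets` library (an assumption; not prescribed by the paper).
from datasets import load_dataset

# The dataset card exposes the train/validation/test partition of
# Rashkin et al. (2019), roughly an 8:1:1 split over conversations.
data = load_dataset("empathetic_dialogues")

for split in ("train", "validation", "test"):
    print(split, len(data[split]))  # rows per split (utterance level)
```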
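
To make the reported training setup concrete, below is a hedged PyTorch sketch of the optimizer and learning-rate schedule: Adam with β1 = 0.9 and β2 = 0.98, a base rate of 0.0001, and the Vaswani et al. (2017) warmup/decay. The placeholder model, `warmup_steps`, and the names of the three loss terms are assumptions for illustration; the normalization of the schedule factor is one common choice and may differ from the authors' code.

```python
import torch

# Stand-in for the actual CEM model (see https://github.com/Sahandfer/CEM).
model = torch.nn.Linear(300, 300)   # hidden dimension 300, as reported
warmup_steps = 4000                 # assumption; not stated in the paper

# Adam with the betas reported in the paper and base lr 0.0001.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.98))

def vaswani_factor(step: int) -> float:
    # Warmup-then-inverse-sqrt-decay factor from Vaswani et al. (2017),
    # normalized here so it peaks at 1.0 when step == warmup_steps.
    step = max(step, 1)
    return warmup_steps ** 0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=vaswani_factor)

def total_loss(loss_emo, loss_gen, loss_div):
    # Weighted combination with the reported gamma1 = 1, gamma2 = 1,
    # gamma3 = 1.5; the three term names are assumed, not quoted.
    return 1.0 * loss_emo + 1.0 * loss_gen + 1.5 * loss_div

# Tiny demo step with dummy loss terms and the reported batch size of 16.
optimizer.zero_grad()
out = model(torch.randn(16, 300))
loss = total_loss(out.mean(), out.pow(2).mean(), out.abs().mean())
loss.backward()
optimizer.step()
scheduler.step()
```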