Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue

Authors: Longxiang Liu, Zhuosheng Zhang, Hai Zhao, Xi Zhou, Xiang Zhou

AAAI 2021, pp. 13406-13414

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our method boosts the strong ELECTRA baseline substantially in four public benchmark datasets, and achieves various new state-of-the-art performance over previous methods. A series of ablation studies are conducted to demonstrate the effectiveness of our method.
Researcher Affiliation | Collaboration | 1 Department of Computer Science and Engineering, Shanghai Jiao Tong University; 2 Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China; 3 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China; 4 CloudWalk Technology, Shanghai, China. comprehensive@sjtu.edu.cn, zhangzs@sjtu.edu.cn, zhaohai@cs.sjtu.edu.cn, zhouxi@cloudwalk.cn, zhouxiang@cloudwalk.cn
Pseudocode | No | The paper provides architectural diagrams (Figure 1 and Figure 2) but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The implementation of our MDFN model is available at https://github.com/comprehensiveMap/MDFN
Open Datasets | Yes | We tested our model on two English datasets: Ubuntu Dialogue Corpus (Ubuntu) and Multi-Turn Dialogue Reasoning (MuTual), and two Chinese datasets: Douban Conversation Corpus (Douban) and E-commerce Dialogue Corpus (ECD). ... Ubuntu (Lowe et al. 2015) ... Douban (Wu et al. 2017) ... ECD (Zhang et al. 2018) ... MuTual (Cui et al. 2020)
Dataset Splits | Yes | Ubuntu ... 0.5 million for validation and 0.5 million for testing. ... ECD ... 1 million context-response pairs in training set, 0.5 million in validation set and 0.5 million in test set.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU or GPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper mentions 'Pytorch' and 'Transformer Library' but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | For the sake of computational efficiency, the maximum number of utterances is specialized as 20. The concatenated context, response, [CLS] and [SEP] in one sample is truncated according to the longest first rule or padded to a certain length, which is 256 for MuTual and 384 for the other three datasets. Our model is implemented using Pytorch and based on the Transformer Library. We use ELECTRA (Clark et al. 2019) as our backbone model in this work. AdamW (Loshchilov and Hutter 2018) is used as our optimizer. The batch size is 24 for MuTual, and 64 for others. The initial learning rate is 4 × 10⁻⁶ for MuTual and 3 × 10⁻⁶ for others. We run 3 epochs for MuTual and 2 epochs for others and select the model that achieves the best result in validation. For the English tasks, we use the pre-trained weights electra-large-discriminator for fine-tuning; for the Chinese tasks, the weights are from hfl/chinese-electra-large-discriminator.
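As a reading aid for the setup quoted above, here is a minimal fine-tuning configuration sketch. It assumes the Hugging Face Transformers library with the google/electra-large-discriminator checkpoint and a plain ELECTRA sequence classifier rather than the full MDFN architecture; the hyperparameter values are the ones reported in the row above, while the helper name encode and the example strings are illustrative only.

```python
# Configuration sketch only: mirrors the reported hyperparameters, not the MDFN model itself.
import torch
from torch.optim import AdamW
from transformers import ElectraTokenizerFast, ElectraForSequenceClassification

# Reported values for MuTual; the other three datasets use max_length=384,
# batch size 64, learning rate 3e-6, and 2 epochs.
MODEL_NAME = "google/electra-large-discriminator"  # Chinese tasks: "hfl/chinese-electra-large-discriminator"
MAX_LENGTH = 256
BATCH_SIZE = 24
LEARNING_RATE = 4e-6
EPOCHS = 3

tokenizer = ElectraTokenizerFast.from_pretrained(MODEL_NAME)
model = ElectraForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
optimizer = AdamW(model.parameters(), lr=LEARNING_RATE)

def encode(context: str, response: str):
    """Build a [CLS] context [SEP] response [SEP] input, truncated by the
    longest-first rule or padded to MAX_LENGTH, as described in the paper."""
    return tokenizer(
        context,
        response,
        truncation="longest_first",
        padding="max_length",
        max_length=MAX_LENGTH,
        return_tensors="pt",
    )

# Example: score one (context, response) candidate pair.
batch = encode("hi , is the server down ? no , it responds to ping .",
               "ok , then it must be my connection .")
with torch.no_grad():
    logits = model(**batch).logits  # shape (1, 2): no-match / match scores
```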