Filling the Gap of Utterance-aware and Speaker-aware Representation for Multi-turn Dialogue

Authors: Longxiang Liu, Zhuosheng Zhang, Hai Zhao, Xi Zhou, Xiang Zhou

AAAI 2021, pp. 13406-13414

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our method boosts the strong ELECTRA baseline substantially in four public benchmark datasets, and achieves various new state-of-the-art performance over previous methods. A series of ablation studies are conducted to demonstrate the effectiveness of our method.
Researcher Affiliation | Collaboration | 1 Department of Computer Science and Engineering, Shanghai Jiao Tong University; 2 Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China; 3 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China; 4 CloudWalk Technology, Shanghai, China. comprehensive@sjtu.edu.cn, zhangzs@sjtu.edu.cn, zhaohai@cs.sjtu.edu.cn, zhouxi@cloudwalk.cn, zhouxiang@cloudwalk.cn
Pseudocode | No | The paper provides architectural diagrams (Figure 1 and Figure 2) but no explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The implementation of our MDFN model is available at https://github.com/comprehensiveMap/MDFN
Open Datasets | Yes | We tested our model on two English datasets: Ubuntu Dialogue Corpus (Ubuntu) and Multi-Turn Dialogue Reasoning (MuTual), and two Chinese datasets: Douban Conversation Corpus (Douban) and E-commerce Dialogue Corpus (ECD). ... Ubuntu (Lowe et al. 2015) ... Douban (Wu et al. 2017) ... ECD (Zhang et al. 2018) ... MuTual (Cui et al. 2020)
Dataset Splits | Yes | Ubuntu ... 0.5 million for validation and 0.5 million for testing. ... ECD ... 1 million context-response pairs in training set, 0.5 million in validation set and 0.5 million in test set.
Hardware Specification | No | The paper does not provide specific hardware details such as CPU or GPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper mentions 'Pytorch' and 'Transformer Library' but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | For the sake of computational efficiency, the maximum number of utterances is specialized as 20. The concatenated context, response, [CLS] and [SEP] in one sample is truncated according to the longest first rule or padded to a certain length, which is 256 for MuTual and 384 for the other three datasets. Our model is implemented using Pytorch and based on the Transformer Library. We use ELECTRA (Clark et al. 2019) as our backbone model in this work. AdamW (Loshchilov and Hutter 2018) is used as our optimizer. The batch size is 24 for MuTual, and 64 for others. The initial learning rate is 4 × 10⁻⁶ for MuTual and 3 × 10⁻⁶ for others. We run 3 epochs for MuTual and 2 epochs for others and select the model that achieves the best result in validation. For the English tasks, we use the pre-trained weights electra-large-discriminator for fine-tuning; for the Chinese tasks, the weights are from hfl/chinese-electra-large-discriminator.
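As a reading aid for the setup quoted above, here is a minimal fine-tuning configuration sketch. It assumes the Hugging Face Transformers library with the google/electra-large-discriminator checkpoint and a plain ELECTRA sequence classifier rather than the full MDFN architecture; the hyperparameter values are the ones reported in the row above, while the helper name encode and the example strings are illustrative only.

```python
# Configuration sketch only: mirrors the reported hyperparameters, not the MDFN model itself.
import torch
from torch.optim import AdamW
from transformers import ElectraTokenizerFast, ElectraForSequenceClassification

# Reported values for MuTual; the other three datasets use max_length=384,
# batch size 64, learning rate 3e-6, and 2 epochs.
MODEL_NAME = "google/electra-large-discriminator"  # Chinese tasks: "hfl/chinese-electra-large-discriminator"
MAX_LENGTH = 256
BATCH_SIZE = 24
LEARNING_RATE = 4e-6
EPOCHS = 3

tokenizer = ElectraTokenizerFast.from_pretrained(MODEL_NAME)
model = ElectraForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
optimizer = AdamW(model.parameters(), lr=LEARNING_RATE)

def encode(context: str, response: str):
    """Build a [CLS] context [SEP] response [SEP] input, truncated by the
    longest-first rule or padded to MAX_LENGTH, as described in the paper."""
    return tokenizer(
        context,
        response,
        truncation="longest_first",
        padding="max_length",
        max_length=MAX_LENGTH,
        return_tensors="pt",
    )

# Example: score one (context, response) candidate pair.
batch = encode("hi , is the server down ? no , it responds to ping .",
               "ok , then it must be my connection .")
with torch.no_grad():
    logits = model(**batch).logits  # shape (1, 2): no-match / match scores
```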