Masking Orchestration: Multi-Task Pretraining for Multi-Role Dialogue Representation Learning

Authors: Tianyi Wang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Qiong Zhang (pp. 9217-9224)

Venue: AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed fine-tuned pretraining mechanism is comprehensively evaluated on three different dialogue datasets along with a number of downstream dialogue-mining tasks. Results show that the proposed pretraining mechanism contributes significantly to all downstream tasks, regardless of the encoder used.
Researcher Affiliation | Collaboration | (1) Alibaba Group, Hangzhou, Zhejiang, China; (2) Indiana University Bloomington, Bloomington, Indiana, USA. Emails: {will.wty, ranran.zyt, qz.zhang}@alibaba-inc.com; liu237@indiana.edu; changlong.scl@taobao.com
Pseudocode | No | The paper describes the model architecture and pretraining tasks in prose, but does not include formal pseudocode or algorithm blocks.
Open Source Code | No | To motivate other scholars to investigate this novel but important problem, we make the experiment dataset publicly available. https://github.com/wangtianyiftd/dialogue_pretrain (The provided link is for the dataset, not explicitly for the source code of the methodology.)
Open Datasets | Yes | To motivate other scholars to investigate this novel but important problem, we make the experiment dataset publicly available. https://github.com/wangtianyiftd/dialogue_pretrain. The CSD corpus (footnote 5, https://sites.google.com/view/nlp-ssa) is collected from the customer service center of a top e-commerce platform and contains over 5 million customer service records between two roles (customer and agent) across two product categories, namely Clothes and Makeup. The EMD corpus is a combined dataset (footnote 6) consisting of four open English meeting corpora: the AMI Corpus (Goo and Chen 2018), the Switchboard Corpus (Jurafsky 2000), the MRDA Corpus (Shriberg et al. 2004), and the bAbI-Tasks Corpus (footnote 7).
Dataset Splits | No | The paper mentions 'training data' but does not provide specific details on validation splits, percentages, or sample counts (e.g., an '80/10/10 split' or '40,000 training samples').
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., GPU models, CPU types, or cloud computing specifications).
Software Dependencies | No | The paper mentions using 'Adam Optimization' and an 'LSTM cell', but does not specify exact version numbers for any software dependencies, programming languages, or libraries used in the implementation.
Experiment Setup | Yes | In our experiments, we optimize the tested models using Adam optimization (Kingma and Ba 2014) with a learning rate of 5e-4. The dimensions of the word embedding and role embedding are 300 and 100, respectively. The hidden layer sizes are all set to 256. We use a 2-layer Transformer block, where the feed-forward filter size is 1024 and the number of heads equals 4.
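
To make the reported hyperparameters concrete, below is a minimal sketch (not the authors' released code) of an encoder configured as described: 300-dimensional word embeddings, 100-dimensional role embeddings, 256-dimensional hidden layers, a 2-layer Transformer block with feed-forward size 1024 and 4 attention heads, optimized with Adam at a learning rate of 5e-4. The concatenation-plus-projection of word and role embeddings, the use of PyTorch, and the names DialogueEncoderSketch, vocab_size, and num_roles are illustrative assumptions, not details taken from the paper.

```python
# Hedged sketch of the reported hyperparameter configuration; not the authors' code.
import torch
import torch.nn as nn


class DialogueEncoderSketch(nn.Module):
    def __init__(self, vocab_size, num_roles):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, 300)   # word embedding dim = 300
        self.role_emb = nn.Embedding(num_roles, 100)    # role embedding dim = 100
        # Assumption: concatenated embeddings are projected to the 256-d hidden size.
        self.proj = nn.Linear(300 + 100, 256)
        layer = nn.TransformerEncoderLayer(
            d_model=256, nhead=4, dim_feedforward=1024, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)  # 2 Transformer blocks

    def forward(self, token_ids, role_ids):
        # token_ids, role_ids: (batch, seq_len) integer tensors
        x = torch.cat([self.word_emb(token_ids), self.role_emb(role_ids)], dim=-1)
        return self.encoder(self.proj(x))


model = DialogueEncoderSketch(vocab_size=30000, num_roles=2)  # sizes are placeholders
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)     # Adam, lr = 5e-4 as reported
```

The sketch only fixes the dimensions and optimizer settings quoted above; the actual masking-orchestration pretraining objectives and input pipeline are described in the paper in prose and are not reproduced here.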