Masking Orchestration: Multi-Task Pretraining for Multi-Role Dialogue Representation Learning
Authors: Tianyi Wang, Yating Zhang, Xiaozhong Liu, Changlong Sun, Qiong Zhang
AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed fine-tuned pretraining mechanism is comprehensively evaluated on three different dialogue datasets along with a number of downstream dialogue-mining tasks. Results show that the proposed pretraining mechanism significantly contributes to all the downstream tasks, regardless of the encoder used. |
| Researcher Affiliation | Collaboration | (1) Alibaba Group, Hangzhou, Zhejiang, China; (2) Indiana University Bloomington, Bloomington, Indiana, USA. Emails: {will.wty, ranran.zyt, qz.zhang}@alibaba-inc.com; liu237@indiana.edu; changlong.scl@taobao.com |
| Pseudocode | No | The paper describes the model architecture and pretraining tasks in prose, but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | No | To motivate other scholars to investigate this novel but important problem, we make the experiment dataset publicly available. https://github.com/wangtianyiftd/dialogue_pretrain (The provided link is for the dataset, not explicitly the source code of the methodology.) |
| Open Datasets | Yes | To motivate other scholars to investigate this novel but important problem, we make the experiment dataset publicly available. https://github.com/wangtianyiftd/dialogue_pretrain. The CSD corpus (footnote 5, https://sites.google.com/view/nlp-ssa) is collected from the customer service center of a top E-commerce platform and contains over 5 million customer service records between two roles (customer and agent) related to two product categories, namely Clothes and Makeup. The EMD corpus is a combined dataset (footnote 6) consisting of four open English meeting corpora: AMI-Corpus (Goo and Chen 2018), Switchboard Corpus (Jurafsky 2000), MRDA-Corpus (Shriberg et al. 2004), and bAbI-Tasks-Corpus (footnote 7). Footnotes 6 and 7 also link to resources. |
| Dataset Splits | No | The paper mentions 'training data' but does not provide specific details on validation splits, percentages, or sample counts (e.g., '80/10/10 split' or '40,000 training samples'). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or cloud computing specifications). |
| Software Dependencies | No | The paper mentions using 'Adam Optimization' and 'LSTM cell', but does not specify exact version numbers for any software dependencies, programming languages, or libraries used in the implementation. |
| Experiment Setup | Yes | In our experiments, we optimize the tested models using Adam optimization (Kingma and Ba 2014) with a learning rate of 5e-4. The dimensions of the word embedding and role embedding are 300 and 100, respectively. The sizes of the hidden layers are all set to 256. We use a 2-layer Transformer block, where the feed-forward filter size is 1024 and the number of heads is 4. (See the configuration sketch after the table.) |
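
To make the reported hyperparameters concrete, here is a minimal PyTorch-style sketch that instantiates an encoder and optimizer with those settings. This is an illustrative reconstruction, not the authors' released code; the class name `MultiRoleDialogueEncoder` and the values of `vocab_size` and `num_roles` are assumptions.

```python
import torch
import torch.nn as nn

class MultiRoleDialogueEncoder(nn.Module):
    """Illustrative encoder using the reported settings: word embedding 300,
    role embedding 100, hidden size 256, 2 Transformer blocks,
    feed-forward filter size 1024, 4 attention heads."""

    def __init__(self, vocab_size=30000, num_roles=2):  # vocab_size / num_roles are assumed values
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, 300)   # word embedding dim 300
        self.role_emb = nn.Embedding(num_roles, 100)    # role embedding dim 100
        # Project the concatenated word+role embeddings (300 + 100) down to the hidden size 256.
        self.input_proj = nn.Linear(300 + 100, 256)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=256,            # hidden size 256
            nhead=4,                # 4 attention heads
            dim_feedforward=1024,   # feed-forward filter size 1024
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)  # 2 Transformer blocks

    def forward(self, token_ids, role_ids):
        # token_ids, role_ids: (batch, seq_len) integer tensors
        x = torch.cat([self.word_emb(token_ids), self.role_emb(role_ids)], dim=-1)
        return self.encoder(self.input_proj(x))

model = MultiRoleDialogueEncoder()
# Adam with the reported learning rate of 5e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
```

How the word and role embeddings are actually combined (concatenation vs. summation) is not specified in the quoted setup, so the projection layer above is one plausible choice rather than the paper's exact architecture.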