Multi-View Feature Representation for Dialogue Generation with Bidirectional Distillation
Authors: Shaoxiong Feng, Xuancheng Ren, Kan Li, Xu Sun
AAAI 2021, pp. 12812-12820
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on two high-quality open-domain dialogue datasets, Daily Dialog and Persona Chat, compared with state-of-the-art methods, and provide extensive analysis to examine the effect of the proposed method. |
| Researcher Affiliation | Academia | Shaoxiong Feng (1), Xuancheng Ren (2), Kan Li (1), Xu Sun (2,3). Affiliations: (1) School of Computer Science & Technology, Beijing Institute of Technology; (2) MOE Key Laboratory of Computational Linguistics, School of EECS, Peking University; (3) Center for Data Science, Peking University. Emails: {shaoxiongfeng, likan}@bit.edu.cn, {renxc, xusun}@pku.edu.cn |
| Pseudocode | No | The paper describes its methods using textual explanations and mathematical formulas, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing open-source code for the described methodology, nor does it provide links to any code repositories or mention supplementary materials containing code. |
| Open Datasets | Yes | We adopt two commonly-used dialogue datasets: Daily Dialog (Li et al. 2017b) and Persona Chat (Zhang et al. 2018a). |
| Dataset Splits | Yes | Finally, the processed dataset contains 50K, 4.5K, and 4.3K pairs for training, validation, and testing, respectively. (Daily Dialog) [...] The processed dataset contains 106K, 13K, and 12.5K pairs for training, validation, and testing. (Persona Chat) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, processor types, or memory amounts used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of 'Adam optimizer (Kingma and Ba 2015)' but does not provide specific version numbers for any software dependencies, libraries, or frameworks used for the experiments. |
| Experiment Setup | Yes | We set the embedding size to 500, the vocabulary size for both Daily Dialog and Persona Chat to 20K. The dropout probability and the temperature T are 0.1 and 3, respectively. We use Adam optimizer (Kingma and Ba 2015), with a learning rate of 0.0001, gradient clipping at 5.0, and a mini-batch size of 64. [...] We set the number of students to 6 for DML and MRBD. The imitation probability in MRBD is 0.5. The training set is randomly divided into six non-overlapping subsets with the same number of pairs. |
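The Experiment Setup row quotes enough hyperparameters to reconstruct the training configuration. The sketch below is a minimal PyTorch rendering of that setup: the Adam optimizer with gradient clipping, a temperature-softened distillation term, and the random partition of the training set into six non-overlapping student subsets. Since the paper releases no code, the function names (`distillation_loss`, `train_step`, `split_training_set`), the seq2seq `model`/`peer` placeholders, and the exact way the imitation probability gates peer imitation are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Hyperparameters quoted from the paper's experiment setup.
EMBED_SIZE = 500
VOCAB_SIZE = 20_000
DROPOUT = 0.1
TEMPERATURE = 3.0
LEARNING_RATE = 1e-4
GRAD_CLIP = 5.0
BATCH_SIZE = 64
NUM_STUDENTS = 6
IMITATION_PROB = 0.5  # probability that a student imitates a peer (MRBD)


def distillation_loss(student_logits, teacher_logits, T=TEMPERATURE):
    """Temperature-softened KL divergence between two students' output
    distributions. This is the standard knowledge-distillation form,
    assumed here since the paper states only T = 3 in prose."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # The T^2 rescaling keeps gradient magnitudes comparable to the CE term.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T


def split_training_set(pairs, k=NUM_STUDENTS, seed=0):
    """Randomly partition training pairs into k equal-size, non-overlapping
    subsets, one per student, as described in the quoted setup."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(len(pairs), generator=g).tolist()
    size = len(pairs) // k
    return [[pairs[i] for i in perm[j * size:(j + 1) * size]] for j in range(k)]


def train_step(model, peer, optimizer, src, tgt_in, tgt_out):
    """Hypothetical training step for one student. `model` and `peer` stand
    in for the actual seq2seq networks, which the paper does not release."""
    logits = model(src, tgt_in)  # (batch, length, vocab)
    loss = F.cross_entropy(logits.view(-1, VOCAB_SIZE), tgt_out.view(-1))
    if torch.rand(()) < IMITATION_PROB:  # imitate a randomly chosen peer
        with torch.no_grad():
            peer_logits = peer(src, tgt_in)
        loss = loss + distillation_loss(logits, peer_logits)
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
    optimizer.step()
    return loss.item()


# Optimizer as quoted: Adam with a learning rate of 1e-4.
# optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
```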