Stylized Dialogue Generation with Multi-Pass Dual Learning
Authors: Jinpeng Li, Yingce Xia, Rui Yan, Hongda Sun, Dongyan Zhao, Tie-Yan Liu
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation results indicate that our method obtains significant improvement over previous baselines. |
| Researcher Affiliation | Collaboration | 1Wangxuan Institute of Computer Technology, Peking University, Beijing, China 2Microsoft Research Asia, Beijing, China 3Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China 4Institute for Artificial Intelligence, Peking University, Beijing, China 5Beijing Academy of Artificial Intelligence, Beijing, China |
| Pseudocode | No | The paper describes methods textually but does not include formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | Our code and dataset are publicly available at https://github.com/CodebaseLi/MPDL |
| Open Datasets | Yes | We train and evaluate our method on two benchmarks, TCFC [30], whose responses are of formal style, and the Shakespearean Dialogue Generation Corpus (SDGC), whose responses are of Shakespearean style. ... For Dtra, we sample 105.5k informal-formal parallel sentences from GYAFC [31]. |
| Dataset Splits | No | The paper describes dataset sizes and total samples, but does not specify the explicit percentages or counts for training and validation splits. |
| Hardware Specification | Yes | Our model has a 50,257 vocabulary size and was trained on Nvidia GTX1080Ti machines with a batch size 10. |
| Software Dependencies | No | The paper mentions 'DialoGPT weights' and 'Transformers' but does not provide specific version numbers for software libraries or dependencies. |
| Experiment Setup | Yes | Our model has a 50,257 vocabulary size and was trained on Nvidia GTX1080Ti machines with a batch size 10. The maximum input length and maximum output length are set as 45. We choose the Adam optimizer. The learning rate of generators is 2.25 × 10⁻⁴ with warm-up steps 1 × 10³, while that for discriminators is 3 × 10⁻⁴. We use the grid search to tune the hyper-parameters. |
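
To make the quoted experiment-setup values concrete, the sketch below shows one way the reported hyperparameters could be wired up with PyTorch and Hugging Face Transformers. This is not the authors' released code: the DialoGPT checkpoint name, the placeholder discriminator, and the total number of training steps are assumptions, while the batch size, maximum length, learning rates, and warm-up steps come from the quote above.

```python
# Minimal sketch of the quoted training configuration.
# Assumptions: PyTorch + Hugging Face Transformers; `generator` is a DialoGPT
# checkpoint (the paper mentions DialoGPT weights and a 50,257-token vocabulary),
# and `discriminator` is a placeholder module, not the authors' style discriminator.
import torch
from transformers import GPT2LMHeadModel, get_linear_schedule_with_warmup

generator = GPT2LMHeadModel.from_pretrained("microsoft/DialoGPT-small")
discriminator = torch.nn.Linear(generator.config.n_embd, 2)  # placeholder

# Values quoted in the Experiment Setup row.
BATCH_SIZE = 10
MAX_LENGTH = 45          # maximum input and output length
WARMUP_STEPS = 1_000     # 1 × 10³ warm-up steps for the generator
TOTAL_STEPS = 100_000    # assumption: total steps are not reported in the table

# Adam optimizers with the reported learning rates.
gen_optimizer = torch.optim.Adam(generator.parameters(), lr=2.25e-4)
disc_optimizer = torch.optim.Adam(discriminator.parameters(), lr=3e-4)

# One plausible reading of the warm-up: a linear warm-up schedule on the generator.
gen_scheduler = get_linear_schedule_with_warmup(
    gen_optimizer,
    num_warmup_steps=WARMUP_STEPS,
    num_training_steps=TOTAL_STEPS,
)
```

The paper does not state which warm-up schedule or decay is used, so the linear schedule here is only one reasonable interpretation of "warm-up steps 1 × 10³".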