Stylized Dialogue Generation with Multi-Pass Dual Learning

Authors: Jinpeng Li, Yingce Xia, Rui Yan, Hongda Sun, Dongyan Zhao, Tie-Yan Liu

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Evaluation results indicate that our method obtains significant improvement over previous baselines. |
| Researcher Affiliation | Collaboration | ¹Wangxuan Institute of Computer Technology, Peking University, Beijing, China; ²Microsoft Research Asia, Beijing, China; ³Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; ⁴Institute for Artificial Intelligence, Peking University, Beijing, China; ⁵Beijing Academy of Artificial Intelligence, Beijing, China |
| Pseudocode | No | The paper describes methods textually but does not include formal pseudocode blocks or algorithms. |
| Open Source Code | Yes | Our code and dataset are publicly available at https://github.com/CodebaseLi/MPDL |
| Open Datasets | Yes | We train and evaluate our method on two benchmarks: TCFC [30], whose responses are of formal style, and the Shakespearean Dialogue Generation Corpus (SDGC), whose responses are of Shakespearean style. ... For D_tra, we sample 105.5k informal-formal parallel sentences from GYAFC [31]. |
| Dataset Splits | No | The paper reports dataset sizes and total sample counts but does not specify explicit percentages or counts for the training and validation splits. |
| Hardware Specification | Yes | Our model has a 50,257 vocabulary size and was trained on Nvidia GTX 1080 Ti machines with a batch size of 10. |
| Software Dependencies | No | The paper mentions DialoGPT weights and Transformers but does not provide specific version numbers for software libraries or dependencies. |
| Experiment Setup | Yes | Our model has a 50,257 vocabulary size and was trained on Nvidia GTX 1080 Ti machines with a batch size of 10. The maximum input length and maximum output length are set to 45. We choose the Adam optimizer. The learning rate of the generators is 2.25×10⁻⁴ with 1×10³ warm-up steps, while that of the discriminators is 3×10⁻⁴. We use grid search to tune the hyper-parameters. |
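
The hyper-parameters in the Experiment Setup row can be expressed as a short PyTorch sketch. This is a hedged reconstruction, not the authors' code: the placeholder modules, the linear warm-up schedule, and all variable names are assumptions, and only the numeric values (vocabulary size, batch size, maximum lengths, learning rates, warm-up steps) come from the paper.

```python
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

# Values reported in the paper's experiment setup.
VOCAB_SIZE = 50_257
BATCH_SIZE = 10
MAX_LEN = 45          # maximum input and maximum output length
GEN_LR = 2.25e-4      # generator learning rate
DISC_LR = 3e-4        # discriminator learning rate
WARMUP_STEPS = 1_000  # 1 x 10^3 warm-up steps for the generators

# Placeholder modules: the paper initializes its generators from DialoGPT
# weights; small linear layers stand in here so the snippet runs as-is.
generator = nn.Linear(16, 16)
discriminator = nn.Linear(16, 2)

gen_opt = Adam(generator.parameters(), lr=GEN_LR)
disc_opt = Adam(discriminator.parameters(), lr=DISC_LR)

# The paper gives the number of warm-up steps but not the schedule shape;
# a linear ramp up to the target learning rate is assumed here.
warmup = LambdaLR(gen_opt, lr_lambda=lambda step: min(1.0, (step + 1) / WARMUP_STEPS))
```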