A Pre-Training Based Personalized Dialogue Generation Model with Persona-Sparse Data
Authors: Yinhe Zheng, Rongsheng Zhang, Minlie Huang, Xiaoxi Mao
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Both automatic and manual evaluation demonstrate that the proposed model outperforms state-of-the-art methods in generating more coherent and persona-consistent responses with persona-sparse data. |
| Researcher Affiliation | Collaboration | 1. Institute for Artificial Intelligence, State Key Lab of Intelligent Technology and Systems, Beijing National Research Center for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China. 2. Fuxi AI Lab, NetEase Inc., Hangzhou, China. 3. Samsung Research China - Beijing (SRC-B), Beijing, China. |
| Pseudocode | No | The paper describes the model architecture and processes using text and diagrams, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | Yes | The dialogue data used in this study were sampled from the PersonalDialog dataset (Zheng et al. 2019), which was collected from the Chinese social media platform Weibo. |
| Dataset Splits | Yes | We randomly sampled 10K sessions of dialogues as the validation set, and constructed two test sets, i.e., a random test set and a biased test set, to test the behavior of our model in different contexts. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU models, CPU types, or memory amounts used for the experiments; computation is implied but the machines are never described. |
| Software Dependencies | No | The paper mentions the Transformer architecture and pre-trained models such as BERT and GPT2, but does not specify version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries) required for reproduction. |
| Experiment Setup | Yes | The encoder and decoder contained 12 Transformer blocks, and 12 attention heads were used. Token embeddings had a size of 768 and the context window was of size 512. The dynamic weight predictor was implemented as a multilayer perceptron after an average pooling layer on E_C. The values of λ1 and λ2 in Eq. 9 were set to 0.2 and 0.5, respectively. The pre-training stage lasted for 70 epochs, and the model was fine-tuned for another 30 epochs (a hedged configuration sketch follows the table). |
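The reported hyperparameters can be gathered into a small configuration, and the dynamic weight predictor (an MLP applied to an average-pooled context encoding E_C) can be sketched in PyTorch. The paper does not release code, so the class name `DynamicWeightPredictor`, the hidden-layer size, and the module layout below are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the reported setup; all names and the hidden size
# are assumptions, since the paper does not release code.
import torch
import torch.nn as nn

# Hyperparameters as reported in the "Experiment Setup" row.
CONFIG = {
    "num_layers": 12,        # Transformer blocks in encoder and decoder
    "num_heads": 12,         # attention heads
    "embed_dim": 768,        # token embedding size
    "context_window": 512,   # maximum sequence length
    "lambda_1": 0.2,         # loss weight in Eq. 9
    "lambda_2": 0.5,         # loss weight in Eq. 9
    "pretrain_epochs": 70,
    "finetune_epochs": 30,
}

class DynamicWeightPredictor(nn.Module):
    """MLP over an average-pooled context encoding E_C, producing a scalar
    weight in [0, 1] that controls how much persona information is used."""

    def __init__(self, embed_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        # hidden_dim is an assumption; the paper only says "multilayer perceptron".
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, context_encoding: torch.Tensor) -> torch.Tensor:
        # context_encoding: (batch, seq_len, embed_dim) from the dialogue encoder
        pooled = context_encoding.mean(dim=1)  # average pooling over tokens
        return self.mlp(pooled)                # (batch, 1) persona weight


if __name__ == "__main__":
    predictor = DynamicWeightPredictor(CONFIG["embed_dim"])
    e_c = torch.randn(4, CONFIG["context_window"], CONFIG["embed_dim"])
    alpha = predictor(e_c)
    print(alpha.shape)  # torch.Size([4, 1])
```

The sigmoid output mirrors the paper's idea of a dynamic weight that scales persona usage per dialogue context; how that weight is injected into decoding is not reproduced here.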