A Pre-Training Based Personalized Dialogue Generation Model with Persona-Sparse Data
Authors: Yinhe Zheng, Rongsheng Zhang, Minlie Huang, Xiaoxi Mao
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Both automatic and manual evaluation demonstrate that the proposed model outperforms state-of-the-art methods in generating more coherent and persona-consistent responses with persona-sparse data. |
| Researcher Affiliation | Collaboration | 1. Institute for Artificial Intelligence, State Key Lab of Intelligent Technology and Systems, Beijing National Research Center for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, China. 2. Fuxi AI Lab, NetEase Inc., Hangzhou, China. 3. Samsung Research China - Beijing (SRC-B), Beijing, China. |
| Pseudocode | No | The paper describes the model architecture and processes using text and diagrams, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | Yes | The dialogue data used in this study were sampled from the PersonalDialog dataset (Zheng et al. 2019), which was collected from the Chinese social media platform Weibo. |
| Dataset Splits | Yes | We randomly sampled 10K sessions of dialogues as the validation set, and constructed two test sets, i.e., a random test set and a biased test set, to test the behavior of our model in different contexts. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU models, CPU types, or memory amounts used for the experiments; computation is implied but the machines are never described. |
| Software Dependencies | No | The paper mentions the Transformer architecture and pre-trained models such as BERT and GPT2, but does not specify version numbers for any software dependencies (e.g., Python, PyTorch, TensorFlow, or other libraries) required for reproduction. |
| Experiment Setup | Yes | The encoder and decoder contained 12 Transformer blocks, and 12 attention heads were used. Token embeddings had a size of 768 and the context window was of size 512. The dynamic weight predictor was implemented as a multilayer perceptron after an average pooling layer on E_C. The values of λ1 and λ2 in Eq. 9 were set to 0.2 and 0.5, respectively. The pre-training stage lasted for 70 epochs, and the model was fine-tuned for another 30 epochs (a hedged configuration sketch follows the table). |
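The reported hyperparameters can be gathered into a small configuration, and the dynamic weight predictor (an MLP applied to an average-pooled context encoding E_C) can be sketched in PyTorch. The paper does not release code, so the class name `DynamicWeightPredictor`, the hidden-layer size, and the module layout below are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the reported setup; all names and the hidden size
# are assumptions, since the paper does not release code.
import torch
import torch.nn as nn

# Hyperparameters as reported in the "Experiment Setup" row.
CONFIG = {
    "num_layers": 12,        # Transformer blocks in encoder and decoder
    "num_heads": 12,         # attention heads
    "embed_dim": 768,        # token embedding size
    "context_window": 512,   # maximum sequence length
    "lambda_1": 0.2,         # loss weight in Eq. 9
    "lambda_2": 0.5,         # loss weight in Eq. 9
    "pretrain_epochs": 70,
    "finetune_epochs": 30,
}

class DynamicWeightPredictor(nn.Module):
    """MLP over an average-pooled context encoding E_C, producing a scalar
    weight in [0, 1] that controls how much persona information is used."""

    def __init__(self, embed_dim: int = 768, hidden_dim: int = 256):
        super().__init__()
        # hidden_dim is an assumption; the paper only says "multilayer perceptron".
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),
        )

    def forward(self, context_encoding: torch.Tensor) -> torch.Tensor:
        # context_encoding: (batch, seq_len, embed_dim) from the dialogue encoder
        pooled = context_encoding.mean(dim=1)  # average pooling over tokens
        return self.mlp(pooled)                # (batch, 1) persona weight


if __name__ == "__main__":
    predictor = DynamicWeightPredictor(CONFIG["embed_dim"])
    e_c = torch.randn(4, CONFIG["context_window"], CONFIG["embed_dim"])
    alpha = predictor(e_c)
    print(alpha.shape)  # torch.Size([4, 1])
```

The sigmoid output mirrors the paper's idea of a dynamic weight that scales persona usage per dialogue context; how that weight is injected into decoding is not reproduced here.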