Exploiting Persona Information for Diverse Generation of Conversational Responses
Authors: Haoyu Song, Wei-Nan Zhang, Yiming Cui, Dong Wang, Ting Liu
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed model on a benchmark persona-chat dataset. Both automatic and human evaluations show that our model can deliver more diverse and more engaging persona-based responses than baseline approaches. Experimental results show that our model can generate persona-based responses as well as deliver more diverse and more engaging responses than baselines (Section 4). |
| Researcher Affiliation | Collaboration | (1) Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, China; (2) Peng Cheng Laboratory, Shenzhen, China; (3) Joint Laboratory of HIT and iFLYTEK (HFL), iFLYTEK Research, Beijing, China |
| Pseudocode | No | The paper describes the model components and processes in text and diagrams, but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code available at: https://github.com/vsharecodes/percvae. |
| Open Datasets | Yes | We perform experiments on the recently released ConvAI2 benchmark dataset, which is an extended version (with a new test set) of the persona-chat dataset [Zhang et al., 2018]. |
| Dataset Splits | Yes | We set aside 800 dialogues together with their profile texts from the training set for validation. The final data have 9,181/800/1,016 dialogues for train/validate/test. |
| Hardware Specification | No | The paper specifies model architecture details like 'RNN is two-layer GRU with a 500-dimensional hidden state' and discusses the use of 'latent variable based models' but does not provide specific hardware details (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions optimization techniques like 'Adam optimizer' and model components like 'two-layer GRU' but does not specify any software libraries or their version numbers (e.g., Python, PyTorch/TensorFlow versions) used for implementation. |
| Experiment Setup | Yes | In our experiments, the RNN is two-layer GRU with a 500-dimensional hidden state. The dimension of word embedding is set to 300, and thus the persona memory size is also 300. The vocabulary size is limited to 20,000. The latent variable size is set to 100. KL annealing steps are set to 10,000. We train the model with a minibatch size of 32 and use Adam optimizer with an initial learning rate of 0.001. |
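
To make the reported setup concrete, below is a minimal sketch that instantiates the hyperparameters quoted in the Experiment Setup row. It is not the authors' PerCVAE implementation: the module layout, the recognition network, the linear KL annealing schedule, and all names (e.g., `Seq2SeqCVAESketch`, `kl_weight`) are assumptions for illustration, and PyTorch is assumed as the framework since the paper does not state one.

```python
import torch
import torch.nn as nn

# Hyperparameters as reported in the paper's experiment setup.
VOCAB_SIZE = 20_000       # vocabulary limited to 20,000 words
EMBED_DIM = 300           # word embedding size (persona memory size matches this)
HIDDEN_DIM = 500          # GRU hidden state size
NUM_LAYERS = 2            # two-layer GRU
LATENT_DIM = 100          # latent variable size
KL_ANNEAL_STEPS = 10_000  # KL annealing steps
BATCH_SIZE = 32
LEARNING_RATE = 0.001     # initial Adam learning rate


class Seq2SeqCVAESketch(nn.Module):
    """Encoder/decoder skeleton with the reported sizes.

    NOT the authors' PerCVAE model; the structure (GRU encoder, recognition
    network, latent variable concatenated to decoder inputs) is an assumption
    for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.encoder = nn.GRU(EMBED_DIM, HIDDEN_DIM, NUM_LAYERS, batch_first=True)
        # Recognition network producing mean and log-variance of the latent variable.
        self.to_mu = nn.Linear(HIDDEN_DIM, LATENT_DIM)
        self.to_logvar = nn.Linear(HIDDEN_DIM, LATENT_DIM)
        # Decoder conditioned on the word embedding plus the sampled latent variable.
        self.decoder = nn.GRU(EMBED_DIM + LATENT_DIM, HIDDEN_DIM, NUM_LAYERS, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)


def kl_weight(step: int, anneal_steps: int = KL_ANNEAL_STEPS) -> float:
    """Linear KL annealing: ramp the KL weight from 0 to 1 over `anneal_steps`.

    The paper reports 10,000 annealing steps; the linear schedule itself is an
    assumption (a common choice for KL annealing in CVAE dialogue models).
    """
    return min(1.0, step / anneal_steps)


model = Seq2SeqCVAESketch()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
```

In such a setup, the KL term of the variational loss would be multiplied by `kl_weight(step)` at each update, so the regularizer is phased in over the first 10,000 training steps while the reconstruction term dominates early training.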