Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference
Authors: Yan Xu, Deqian Kong, Dehong Xu, Ziwei Ji, Bo Pang, Pascale Fung, Ying Nian Wu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results on two common dialogue datasets (Wizard of Wikipedia and Holl-E) demonstrate that SPI outperforms previous strong baselines according to both automatic and human evaluation metrics. |
| Researcher Affiliation | Collaboration | 1Center for Artificial Intelligence Research (CAiRE), The Hong Kong University of Science and Technology, Hong Kong 2Department of Statistics, UCLA, CA, USA 3Salesforce Research. |
| Pseudocode | Yes | Algorithm 1 Learning with Sequential Posterior Inference Algorithm 2 Knowledge-Grounded Response Generation |
| Open Source Code | Yes | The code and checkpoints are available at https://github.com/deqiankong/SPI. |
| Open Datasets | Yes | Datasets We conduct our experiments on two KGD datasets, Wizard of Wikipedia (WoW) (Dinan et al., 2019) and Holl-E (Moghe et al., 2018). |
| Dataset Splits | Yes | 22.3k dialogues with 202k turns in the WoW dataset are divided into training, validation, and test subsets. Both validation and test sets consist of seen and unseen sets, where the unseen set consists of dialogues whose initial topics were unseen during training. |
| Hardware Specification | Yes | We train our model on NVIDIA Geforce A6000 GPU with 15 epochs and select the best checkpoint with the lowest loss on the validation set as our final model. |
| Software Dependencies | No | The paper mentions 'pre-trained BART-base (Lewis et al., 2020)' and 'Adam optimizer', but it does not specify version numbers for general software dependencies like Python, PyTorch, or other specific libraries required for full reproducibility. |
| Experiment Setup | Yes | We train our model with Adam optimizer with a learning rate of 1e-7 and a weight decay of 0.005. A linear scheduler is utilized to adjust the learning rate for each step. The batch size is set as 32. We train our model on NVIDIA Geforce A6000 GPU with 15 epochs and select the best checkpoint with the lowest loss on the validation set as our final model. The responses are generated using greedy search. We set S as 5 for knowledge selection initialization when using uniform prior. For Langevin dynamics, the number of Langevin steps and step size are 5 and 0.1, respectively. |
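The quoted setup mentions short-run Langevin dynamics for posterior inference with 5 steps and a step size of 0.1. As a minimal illustration of that component, the sketch below applies the standard Langevin update z ← z + (s²/2)·∇log p(z) + s·ε to a latent vector, using a toy standard-normal target in place of the paper's learned posterior; the function name, the toy gradient, and the seeding are illustrative assumptions, not the authors' implementation.

```python
import random

def langevin_update(z, grad_log_p, n_steps=5, step_size=0.1, seed=0):
    """Short-run Langevin dynamics on a latent vector z (a list of floats).

    Each step applies: z <- z + (s^2 / 2) * grad_log_p(z) + s * eps,
    with eps drawn i.i.d. from N(0, 1). n_steps=5 and step_size=0.1
    match the hyperparameters quoted in the experiment setup.
    """
    rng = random.Random(seed)  # fixed seed here only for reproducibility of the demo
    for _ in range(n_steps):
        g = grad_log_p(z)
        z = [zi + 0.5 * step_size ** 2 * gi + step_size * rng.gauss(0.0, 1.0)
             for zi, gi in zip(z, g)]
    return z

# Toy target: standard normal, log p(z) = -||z||^2 / 2, so grad log p(z) = -z.
grad = lambda z: [-zi for zi in z]
z0 = [3.0, -3.0]
z5 = langevin_update(z0, grad, n_steps=5, step_size=0.1, seed=0)
```

With a small step size and only 5 steps, the samples drift only slightly toward the mode while picking up Gaussian noise, which is the intended behavior of short-run inference dynamics.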