Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference

Authors: Yan Xu, Deqian Kong, Dehong Xu, Ziwei Ji, Bo Pang, Pascale Fung, Ying Nian Wu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results on two common dialogue datasets (Wizard of Wikipedia and Holl-E) demonstrate that SPI outperforms previous strong baselines according to both automatic and human evaluation metrics.
Researcher Affiliation | Collaboration | 1 Center for Artificial Intelligence Research (CAiRE), The Hong Kong University of Science and Technology, Hong Kong; 2 Department of Statistics, UCLA, CA, USA; 3 Salesforce Research.
Pseudocode | Yes | Algorithm 1: Learning with Sequential Posterior Inference; Algorithm 2: Knowledge-Grounded Response Generation.
Open Source Code | Yes | The code and checkpoints are available at https://github.com/deqiankong/SPI.
Open Datasets | Yes | We conduct our experiments on two KGD datasets, Wizard of Wikipedia (WoW) (Dinan et al., 2019) and Holl-E (Moghe et al., 2018).
Dataset Splits | Yes | The 22.3k dialogues with 202k turns in the WoW dataset are divided into training, validation, and test subsets. Both the validation and test sets contain seen and unseen portions, where the unseen portion consists of dialogues whose initial topics were not seen during training.
Hardware Specification | Yes | We train our model on an NVIDIA GeForce A6000 GPU for 15 epochs and select the checkpoint with the lowest loss on the validation set as our final model.
Software Dependencies | No | The paper mentions 'pre-trained BART-base (Lewis et al., 2020)' and the 'Adam optimizer', but it does not give version numbers for general software dependencies such as Python, PyTorch, or other libraries required for full reproducibility.
Experiment Setup | Yes | We train our model with the Adam optimizer, a learning rate of 1e-7, and a weight decay of 0.005. A linear scheduler adjusts the learning rate at each step. The batch size is set to 32. We train on an NVIDIA GeForce A6000 GPU for 15 epochs and select the checkpoint with the lowest loss on the validation set as our final model. Responses are generated using greedy search. We set S = 5 for knowledge selection initialization when using the uniform prior. For Langevin dynamics, the number of Langevin steps and the step size are 5 and 0.1, respectively. (A hedged sketch of this setup follows the table.)
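For readers attempting reproduction, below is a minimal PyTorch sketch of the reported training configuration (Adam with lr 1e-7 and weight decay 0.005, a linear schedule, batch size 32, 15 epochs, greedy decoding), plus a generic Langevin-dynamics update using the reported 5 steps and 0.1 step size. The "facebook/bart-base" checkpoint name, the num_train_examples placeholder, and the langevin_update helper are illustrative assumptions; the authors' actual SPI objective and posterior sampler live in the repository linked above.

```python
import torch
from torch.optim import Adam
from transformers import (
    BartForConditionalGeneration,
    BartTokenizer,
    get_linear_schedule_with_warmup,
)

# Hyperparameters reported in the paper.
LEARNING_RATE = 1e-7
WEIGHT_DECAY = 0.005
BATCH_SIZE = 32
NUM_EPOCHS = 15
LANGEVIN_STEPS = 5
LANGEVIN_STEP_SIZE = 0.1

# The paper fine-tunes pre-trained BART-base; "facebook/bart-base" is the
# standard Hugging Face checkpoint and an assumption on our part.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

optimizer = Adam(model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY)

# "A linear scheduler ... for each step": we assume a warmup-free linear
# decay over all training steps; the exact schedule is not specified.
num_train_examples = 100_000  # hypothetical placeholder; use the real split size
num_training_steps = NUM_EPOCHS * (num_train_examples // BATCH_SIZE)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)

def langevin_update(z, log_posterior_fn,
                    n_steps=LANGEVIN_STEPS, step_size=LANGEVIN_STEP_SIZE):
    """Generic Langevin-dynamics refinement of a latent code z.

    log_posterior_fn is a hypothetical callable returning a scalar
    log-density; the paper's actual posterior over knowledge selection
    is defined by the SPI model and is not reproduced here.
    """
    for _ in range(n_steps):
        z = z.detach().requires_grad_(True)
        grad = torch.autograd.grad(log_posterior_fn(z), z)[0]
        noise = torch.randn_like(z)
        # Standard Langevin step: gradient drift plus Gaussian noise.
        z = z + 0.5 * step_size ** 2 * grad + step_size * noise
    return z.detach()

# Decoding: "responses are generated using greedy search".
inputs = tokenizer("example dialogue context", return_tensors="pt")
response_ids = model.generate(
    inputs["input_ids"], num_beams=1, do_sample=False, max_length=64
)
print(tokenizer.decode(response_ids[0], skip_special_tokens=True))
```

Note that Langevin step-size conventions vary across implementations (some use the step size directly as the drift coefficient rather than half its square), so the reported 0.1 should be matched against the released code rather than this sketch.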