Diverse and Faithful Knowledge-Grounded Dialogue Generation via Sequential Posterior Inference

Authors: Yan Xu, Deqian Kong, Dehong Xu, Ziwei Ji, Bo Pang, Pascale Fung, Ying Nian Wu

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results on two common dialogue datasets (Wizard of Wikipedia and Holl-E) demonstrate that SPI outperforms previous strong baselines according to both automatic and human evaluation metrics.
Researcher Affiliation | Collaboration | 1 Center for Artificial Intelligence Research (CAiRE), The Hong Kong University of Science and Technology, Hong Kong; 2 Department of Statistics, UCLA, CA, USA; 3 Salesforce Research.
Pseudocode | Yes | Algorithm 1: Learning with Sequential Posterior Inference; Algorithm 2: Knowledge-Grounded Response Generation.
Open Source Code | Yes | The code and checkpoints are available at https://github.com/deqiankong/SPI.
Open Datasets | Yes | We conduct our experiments on two KGD datasets, Wizard of Wikipedia (WoW) (Dinan et al., 2019) and Holl-E (Moghe et al., 2018).
Dataset Splits | Yes | The 22.3k dialogues with 202k turns in the WoW dataset are divided into training, validation, and test subsets. Both the validation and test sets contain seen and unseen portions, where the unseen portion consists of dialogues whose initial topics were not seen during training.
Hardware Specification | Yes | We train our model on an NVIDIA GeForce A6000 GPU for 15 epochs and select the checkpoint with the lowest loss on the validation set as our final model.
Software Dependencies | No | The paper mentions 'pre-trained BART-base (Lewis et al., 2020)' and the 'Adam optimizer', but it does not give version numbers for general software dependencies such as Python, PyTorch, or other libraries required for full reproducibility.
Experiment Setup | Yes | We train our model with the Adam optimizer, a learning rate of 1e-7, and a weight decay of 0.005. A linear scheduler adjusts the learning rate at each step. The batch size is set to 32. We train on an NVIDIA GeForce A6000 GPU for 15 epochs and select the checkpoint with the lowest loss on the validation set as our final model. Responses are generated using greedy search. We set S = 5 for knowledge selection initialization when using the uniform prior. For Langevin dynamics, the number of Langevin steps and the step size are 5 and 0.1, respectively. (A hedged sketch of this setup follows the table.)
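For readers attempting reproduction, below is a minimal PyTorch sketch of the reported training configuration (Adam with lr 1e-7 and weight decay 0.005, a linear schedule, batch size 32, 15 epochs, greedy decoding), plus a generic Langevin-dynamics update using the reported 5 steps and 0.1 step size. The "facebook/bart-base" checkpoint name, the num_train_examples placeholder, and the langevin_update helper are illustrative assumptions; the authors' actual SPI objective and posterior sampler live in the repository linked above.

```python
import torch
from torch.optim import Adam
from transformers import (
    BartForConditionalGeneration,
    BartTokenizer,
    get_linear_schedule_with_warmup,
)

# Hyperparameters reported in the paper.
LEARNING_RATE = 1e-7
WEIGHT_DECAY = 0.005
BATCH_SIZE = 32
NUM_EPOCHS = 15
LANGEVIN_STEPS = 5
LANGEVIN_STEP_SIZE = 0.1

# The paper fine-tunes pre-trained BART-base; "facebook/bart-base" is the
# standard Hugging Face checkpoint and an assumption on our part.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

optimizer = Adam(model.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY)

# "A linear scheduler ... for each step": we assume a warmup-free linear
# decay over all training steps; the exact schedule is not specified.
num_train_examples = 100_000  # hypothetical placeholder; use the real split size
num_training_steps = NUM_EPOCHS * (num_train_examples // BATCH_SIZE)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=0, num_training_steps=num_training_steps
)

def langevin_update(z, log_posterior_fn,
                    n_steps=LANGEVIN_STEPS, step_size=LANGEVIN_STEP_SIZE):
    """Generic Langevin-dynamics refinement of a latent code z.

    log_posterior_fn is a hypothetical callable returning a scalar
    log-density; the paper's actual posterior over knowledge selection
    is defined by the SPI model and is not reproduced here.
    """
    for _ in range(n_steps):
        z = z.detach().requires_grad_(True)
        grad = torch.autograd.grad(log_posterior_fn(z), z)[0]
        noise = torch.randn_like(z)
        # Standard Langevin step: gradient drift plus Gaussian noise.
        z = z + 0.5 * step_size ** 2 * grad + step_size * noise
    return z.detach()

# Decoding: "responses are generated using greedy search".
inputs = tokenizer("example dialogue context", return_tensors="pt")
response_ids = model.generate(
    inputs["input_ids"], num_beams=1, do_sample=False, max_length=64
)
print(tokenizer.decode(response_ids[0], skip_special_tokens=True))
```

Note that Langevin step-size conventions vary across implementations (some use the step size directly as the drift coefficient rather than half its square), so the reported 0.1 should be matched against the released code rather than this sketch.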