Variational Learning for Unsupervised Knowledge Grounded Dialogs

Authors: Mayank Mishra, Dhiraj Madan, Gaurav Pandey, Danish Contractor

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using a collection of three publicly available open-conversation datasets, we demonstrate how the posterior distribution, that has information from the ground-truth response, allows for a better approximation of the objective function during training. To overcome the challenges associated with sampling over a large knowledge collection, we develop an efficient approach to approximate the ELBO. To the best of our knowledge we are the first to apply variational training for open-scale unsupervised knowledge grounded dialog systems.
Researcher Affiliation | Industry | Mayank Mishra, Dhiraj Madan, Gaurav Pandey and Danish Contractor, IBM Research AI; mayank.mishra1@ibm.com, {dmadan07, gpandey1}@in.ibm.com, danish.contractor@ibm.com
Pseudocode | No | The paper describes the architecture and training process in text but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | We provide the code and the supplementary material at https://github.com/mayank31398/VRAG and https://arxiv.org/abs/2112.00653 respectively.
Open Datasets | Yes | Using a collection of three publicly available open-conversation datasets, OR-QuAC [Qu et al., 2020], DSTC9 [Kim et al., 2020b], DoQA [Campos et al., 2020]
Dataset Splits | No | The paper mentions 'validation sets' and 'early stopping with patience = 5 on recall of the validation sets' but does not specify the size, percentage, or method of creating the validation split.
Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software components like the 'BERT model', 'GPT2', 'DPR-Multiset model', and 'AdamW optimizer' but does not specify their version numbers for reproducibility.
Experiment Setup | Yes | We initialize our document-prior (for both RAG and VRAG) and document-posterior (for VRAG) networks with the pretrained DPR-Multiset model, pre-trained using data from the Natural Questions [Kwiatkowski et al., 2019], TriviaQA [Joshi et al., 2017], etc. During the training of the model, it can be difficult to rebuild the document index after every change to the document representation parameters in f; therefore, similar to Lewis et al., the parameters in f are kept constant. We used early stopping with patience = 5 on recall of the validation sets to prevent overfitting of models. The loss was optimized using the AdamW optimizer [Loshchilov and Hutter, 2017]. We also found it useful to continue training the response-likelihood for both RAG and VRAG after the joint training is complete.
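
The Research Type row quotes the paper's central technical claim: a response-informed posterior over knowledge documents gives a better training objective, and the ELBO is approximated efficiently rather than summed over the full knowledge collection. As a rough illustration only, and not the authors' released implementation, the sketch below shows a standard way to estimate an ELBO with a discrete document latent by restricting both the posterior and the prior to a small set of k retrieved documents and renormalizing; the function name, tensor names, and the top-k restriction are assumptions.

import torch

def topk_elbo(posterior_scores: torch.Tensor,
              prior_scores: torch.Tensor,
              response_log_likelihood: torch.Tensor) -> torch.Tensor:
    """Approximate ELBO for a discrete document latent, restricted to k retrieved documents.

    posterior_scores:         posterior-network scores for the k documents, shape [k]
    prior_scores:             prior-network scores for the same k documents, shape [k]
    response_log_likelihood:  log p(response | context, document) for each document, shape [k]
    """
    log_q = torch.log_softmax(posterior_scores, dim=-1)  # log q(z | context, response)
    log_p = torch.log_softmax(prior_scores, dim=-1)      # log p(z | context)
    q = log_q.exp()
    # ELBO = E_q[log p(y | x, z)] - KL(q(z | x, y) || p(z | x)), both estimated over the k documents
    return (q * (response_log_likelihood + log_p - log_q)).sum()

# Example with k = 4 retrieved documents (dummy numbers); training would minimize the negative ELBO.
loss = -topk_elbo(torch.randn(4), torch.randn(4), -torch.rand(4) * 10)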
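
The Experiment Setup row states that the prior and posterior retrievers are initialized from the pretrained DPR-Multiset model, that the document encoder f is kept frozen so the index need not be rebuilt, that the loss is optimized with AdamW, and that early stopping uses patience = 5 on validation recall. Below is a minimal sketch of that configuration, assuming the Hugging Face DPR multiset checkpoints and a GPT-2 generator (the paper mentions GPT2 but not specific checkpoints); the learning rate and the caller-supplied training/evaluation callables are placeholders, not values reported in the paper.

import torch
from transformers import DPRContextEncoder, DPRQuestionEncoder, GPT2LMHeadModel

# Prior p(z|x) and posterior q(z|x,y) query encoders, both initialized from DPR-Multiset.
prior_encoder = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-multiset-base")
posterior_encoder = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-multiset-base")

# Document encoder f: frozen, so the document index never has to be rebuilt during training.
doc_encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-multiset-base")
for param in doc_encoder.parameters():
    param.requires_grad = False

# Response-likelihood network (generator); GPT-2 is an assumption about the exact checkpoint.
generator = GPT2LMHeadModel.from_pretrained("gpt2")

trainable = (list(prior_encoder.parameters())
             + list(posterior_encoder.parameters())
             + list(generator.parameters()))
optimizer = torch.optim.AdamW(trainable, lr=3e-5)  # learning rate is an assumption

def fit(train_one_epoch, eval_recall, max_epochs: int = 100, patience: int = 5) -> float:
    """Joint training with early stopping on validation recall (patience = 5, as in the paper)."""
    best_recall, epochs_without_improvement = 0.0, 0
    for _ in range(max_epochs):
        train_one_epoch(optimizer)  # caller-supplied: one pass of joint training
        recall = eval_recall()      # caller-supplied: retrieval recall on the validation set
        if recall > best_recall:
            best_recall, epochs_without_improvement = recall, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_recall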