Variational Learning for Unsupervised Knowledge Grounded Dialogs
Authors: Mayank Mishra, Dhiraj Madan, Gaurav Pandey, Danish Contractor
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using a collection of three publicly available open-conversation datasets, we demonstrate how the posterior distribution, that has information from the ground-truth response, allows for a better approximation of the objective function during training. To overcome the challenges associated with sampling over a large knowledge collection, we develop an efficient approach to approximate the ELBO. To the best of our knowledge we are the first to apply variational training for open-scale unsupervised knowledge grounded dialog systems. |
| Researcher Affiliation | Industry | Mayank Mishra, Dhiraj Madan, Gaurav Pandey and Danish Contractor, IBM Research AI, mayank.mishra1@ibm.com, {dmadan07, gpandey1}@in.ibm.com, danish.contractor@ibm.com |
| Pseudocode | No | The paper describes the architecture and training process in text but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide the code and the supplementary material at https://github.com/mayank31398/VRAG and https://arxiv.org/abs/2112.00653 respectively. |
| Open Datasets | Yes | Using a collection of three publicly available open-conversation datasets, OR-QuAC [Qu et al., 2020], DSTC9 [Kim et al., 2020b], DoQA [Campos et al., 2020] |
| Dataset Splits | No | The paper mentions 'validation sets' and 'early stopping with patience = 5 on recall of the validation sets' but does not specify the size, percentage, or method of creating the validation split. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like 'BERT model', 'GPT2', 'DPR-Multiset model', and 'AdamW Optimizer' but does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | We initialize our document-prior (for both RAG and VRAG) and document-posterior (for VRAG) networks with the pretrained DPR-Multiset model pre-trained using data from the Natural Questions [Kwiatkowski et al., 2019], TriviaQA [Joshi et al., 2017] etc. During the training of the model, it can be difficult to rebuild the document index after every change to the document representation parameters in f, therefore similar to Lewis et al., the parameters in f are kept constant. We used early stopping with patience = 5 on recall of the validation sets to prevent overfitting of models. The loss was optimized using the AdamW optimizer [Loshchilov and Hutter, 2017]. We also found it useful to continue training the response-likelihood for both RAG and VRAG after the joint training is complete. |
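
The experiment-setup evidence above can be approximated in a short PyTorch/HuggingFace sketch: DPR-Multiset initialization of the prior and posterior retrievers, a frozen document encoder f, AdamW optimization, and early stopping with patience = 5 on validation recall. This is not the authors' implementation (see the linked repository for that); the checkpoint names, learning rate, and the `train_one_epoch` / `evaluate_recall` helpers are assumptions introduced purely for illustration.

```python
# Minimal sketch of the reported setup, under assumed details:
# DPR-Multiset checkpoints, lr, and the two placeholder helpers are not from the paper.
import torch
from transformers import DPRContextEncoder, DPRQuestionEncoder

# Document-prior and document-posterior networks initialized from DPR-Multiset
# (the posterior network is only used by VRAG).
prior_encoder = DPRQuestionEncoder.from_pretrained(
    "facebook/dpr-question_encoder-multiset-base")
posterior_encoder = DPRQuestionEncoder.from_pretrained(
    "facebook/dpr-question_encoder-multiset-base")

# Document representation network f: kept frozen so the document index
# does not have to be rebuilt after every parameter update (as in Lewis et al.).
doc_encoder = DPRContextEncoder.from_pretrained(
    "facebook/dpr-ctx_encoder-multiset-base")
for param in doc_encoder.parameters():
    param.requires_grad = False

trainable_params = list(prior_encoder.parameters()) + list(posterior_encoder.parameters())
optimizer = torch.optim.AdamW(trainable_params, lr=3e-5)  # lr is an assumed value


def train_one_epoch(opt: torch.optim.AdamW) -> None:
    """Placeholder for one epoch of joint training (ELBO objective for VRAG)."""
    ...


def evaluate_recall() -> float:
    """Placeholder returning retrieval recall on the validation set."""
    return 0.0


# Early stopping on validation recall with patience = 5.
best_recall, epochs_without_improvement, patience = 0.0, 0, 5
for epoch in range(100):
    train_one_epoch(optimizer)
    recall = evaluate_recall()
    if recall > best_recall:
        best_recall, epochs_without_improvement = recall, 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break
```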