A Knowledge-Grounded Neural Conversation Model
Authors: Marjan Ghazvininejad, Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Wen-tau Yih, Michel Galley
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach yields significant improvements over a competitive SEQ2SEQ baseline. Human judges found that our outputs are significantly more informative. Using this framework, we have trained systems at a large scale using 23M general-domain conversations from Twitter and 1.1M Foursquare tips, showing significant improvements in terms of informativeness (human evaluation) over a competitive large-scale SEQ2SEQ model baseline. |
| Researcher Affiliation | Collaboration | Marjan Ghazvininejad¹, Chris Brockett², Ming-Wei Chang², Bill Dolan², Jianfeng Gao², Wen-tau Yih², Michel Galley² — ¹Information Sciences Institute, USC; ²Microsoft. ghazvini@isi.edu, mgalley@microsoft.com |
| Pseudocode | No | The paper describes its model architecture and components (e.g., Dialog Encoder and Decoder, Facts Encoder) but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any statement or link indicating the release of source code for the described methodology. |
| Open Datasets | No | We collected a 23M general dataset of 3-turn conversations. This serves as a background dataset not associated with facts, and its massive size is key to learning the conversational structure or backbone. We extracted from the web 1.1M tips relating to establishments in North America. The paper describes how the authors collected and processed these datasets from public platforms, but it does not provide concrete access information (e.g., a URL, DOI, or formal citation) for the constructed datasets used in the experiments. |
| Dataset Splits | Yes | Crowdsourced human judges were then presented with these 10K sampled conversations and asked to determine whether the response contained actionable information, i.e., whether it contained information that would permit the respondents to decide, e.g., whether or not they should patronize an establishment. From this, we selected the top-ranked 4K conversations to be held out as validation set and test set; these were removed from our training data. |
| Hardware Specification | No | The paper describes the model architecture and size (e.g., '2-layer GRU models with 512 hidden cells'), but does not provide any specific hardware details such as GPU models, CPU types, or memory used for training or experiments. |
| Software Dependencies | No | The paper mentions using specific algorithms and optimizers ('GRU models', 'Adam optimizer'), but does not provide version numbers for any programming languages, libraries, or frameworks (e.g., Python version, TensorFlow/PyTorch version). |
| Experiment Setup | Yes | More specifically, we used 2-layer GRU models with 512 hidden cells for each layer for encoder and decoder, the dimensionality of word embeddings is set to 512, and the size of input/output memory representation is 1024. We used the Adam optimizer with a fixed learning rate of 0.1. Batch size is set to 128. All parameters are initialized from a uniform distribution in [−√(3/d), √(3/d)], where d is the dimension of the parameter. Gradients are clipped at 5 to avoid gradient explosion. |
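The initialization and gradient-clipping details quoted above can be sketched in a few lines. This is a minimal numpy illustration, not the authors' code: it assumes d is the parameter's fan-in (last axis) and that "clipped at 5" means norm clipping, since the paper does not specify either point.

```python
import numpy as np


def init_uniform_sqrt3(shape, rng=None):
    """Draw a parameter tensor from U[-sqrt(3/d), sqrt(3/d)].

    Assumption: d is taken to be the fan-in (last axis of `shape`);
    the paper only says d is "the dimension of the parameter".
    """
    rng = rng or np.random.default_rng(0)
    d = shape[-1]
    bound = np.sqrt(3.0 / d)
    return rng.uniform(-bound, bound, size=shape)


def clip_gradient(grad, max_norm=5.0):
    """Rescale a gradient so its L2 norm is at most max_norm.

    Assumption: the paper's "clipped at 5" is interpreted as
    global-norm clipping rather than per-element value clipping.
    """
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad


# Example: initialize a 512x512 recurrent weight matrix as described.
W = init_uniform_sqrt3((512, 512))
```

With d = 512, every entry of `W` lies within ±√(3/512) ≈ ±0.0765; a gradient of larger norm is scaled back onto the radius-5 sphere.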