Learning to Select Knowledge for Response Generation in Dialog Systems
Authors: Rongzhong Lian, Min Xie, Fan Wang, Jinhua Peng, Hua Wu
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both automatic and human evaluation verify the superiority of our model over previous baselines. ... We conducted experiments on two recently created datasets, namely the Persona-chat dataset [Zhang et al., 2018] and the Wizard-of-Wikipedia dataset [Dinan et al., 2018]. |
| Researcher Affiliation | Collaboration | Rongzhong Lian¹, Min Xie², Fan Wang¹, Jinhua Peng¹, Hua Wu¹; ¹Baidu Inc., China; ²The Hong Kong University of Science and Technology |
| Pseudocode | No | The paper describes the model architecture and components using text and mathematical equations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our models and datasets are all available online: https://github.com/ifr2/PostKS. |
| Open Datasets | Yes | We conducted experiments on two recently created datasets, namely the Persona-chat dataset [Zhang et al., 2018] and the Wizard-of-Wikipedia dataset [Dinan et al., 2018]. |
| Dataset Splits | Yes | There are 151,157 turns (each turn corresponds to an utterance and a response pair) of conversations in Persona-chat, which we divide into 122,499 for train, 14,602 for validation and 14,056 for test. ... From this dataset, 79,925 turns of conversations are obtained and 68,931/3,686/7,308 of them are used for train/validation/test. |
| Hardware Specification | Yes | We trained our model with at most 20 epochs on a P40 machine. |
| Software Dependencies | No | The paper mentions 'GloVe' for word embeddings and 'Adam optimizer', but it does not specify version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | Our encoders and decoders have 2-layer GRU structures with 800 hidden states for each layer, but they do not share any parameters. We set the word embedding size to be 300 and initialized it using GloVe [Pennington et al., 2014]. The vocabulary size is 20,000. We used the Adam optimizer with a mini-batch size of 128 and the learning rate is 0.0005. We trained our model with at most 20 epochs on a P40 machine. In the first 5 epochs, we minimize the BOW loss only for pre-training the knowledge manager. In the remaining epochs, we minimize over the sum of all losses. |
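
The split counts quoted in the Dataset Splits row can be checked with a few lines of arithmetic. The sketch below simply reproduces those numbers and prints the resulting train/validation/test ratios; the counts come from the paper, while the script itself is only illustrative.

```python
# Sanity-check the dataset split counts quoted in the Dataset Splits row.
# The numbers are taken verbatim from the paper; the script is illustrative.
splits = {
    "Persona-chat": {"train": 122_499, "valid": 14_602, "test": 14_056, "total": 151_157},
    "Wizard-of-Wikipedia": {"train": 68_931, "valid": 3_686, "test": 7_308, "total": 79_925},
}

for name, s in splits.items():
    # The three partitions should sum to the reported number of turns.
    assert s["train"] + s["valid"] + s["test"] == s["total"], name
    print(
        f"{name}: "
        f"{s['train'] / s['total']:.1%} train / "
        f"{s['valid'] / s['total']:.1%} valid / "
        f"{s['test'] / s['total']:.1%} test"
    )
```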
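
The Experiment Setup row reports enough hyperparameters to sketch the training configuration in code. The PyTorch skeleton below uses only those reported values (2-layer GRUs with 800 hidden units, 300-dimensional GloVe-initialized embeddings, a 20,000-word vocabulary, Adam with learning rate 0.0005, batch size 128, 20 epochs with 5 epochs of BOW-only pre-training). All class, variable, and function names are illustrative assumptions and are not taken from the authors' PostKS repository.

```python
# Minimal sketch of the reported training configuration, assuming a standard
# GRU encoder/decoder setup. Names are illustrative, not from the PostKS repo.
import torch
import torch.nn as nn

# Hyperparameters quoted from the paper.
EMBED_SIZE = 300          # word embedding size, initialized from GloVe
HIDDEN_SIZE = 800         # hidden units per GRU layer
NUM_LAYERS = 2            # 2-layer GRU encoders and decoders
VOCAB_SIZE = 20_000
BATCH_SIZE = 128
LEARNING_RATE = 5e-4
MAX_EPOCHS = 20
BOW_PRETRAIN_EPOCHS = 5   # first 5 epochs: BOW loss only


class Seq2SeqSkeleton(nn.Module):
    """Encoder/decoder skeleton with the reported sizes; no parameter sharing."""

    def __init__(self):
        super().__init__()
        # In the paper the embedding is initialized from pretrained GloVe vectors.
        self.embedding = nn.Embedding(VOCAB_SIZE, EMBED_SIZE)
        self.encoder = nn.GRU(EMBED_SIZE, HIDDEN_SIZE,
                              num_layers=NUM_LAYERS, batch_first=True)
        self.decoder = nn.GRU(EMBED_SIZE, HIDDEN_SIZE,
                              num_layers=NUM_LAYERS, batch_first=True)
        self.output = nn.Linear(HIDDEN_SIZE, VOCAB_SIZE)


model = Seq2SeqSkeleton()
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)


def training_loss(epoch, bow_loss, other_losses):
    """First 5 epochs minimize the BOW loss only (knowledge-manager pre-training);
    the remaining epochs minimize the sum of all losses, as reported in the paper."""
    if epoch < BOW_PRETRAIN_EPOCHS:
        return bow_loss
    return bow_loss + sum(other_losses)
```

In a full implementation the GloVe initialization would typically be done by copying pretrained vectors into `model.embedding.weight` before training; that step is omitted here for brevity.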