Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning
Authors: Ziming Li, Julia Kiseleva, Maarten de Rijke
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model can generate more high-quality responses and achieve higher overall performance than the state-of-the-art. |
| Researcher Affiliation | Collaboration | Ziming Li (University of Amsterdam), Julia Kiseleva (University of Amsterdam; Microsoft Research AI), Maarten de Rijke (University of Amsterdam) |
| Pseudocode | No | The paper includes architectural diagrams (Figure 1 and Figure 2) but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code of this work is available at https://bitbucket.org/ZimingLi/dg-irl-aaai2019 |
| Open Datasets | Yes | The Movie Triples dataset (Serban et al. 2016) has been developed by expanding and preprocessing the Movie-Dic corpus (Banchs 2012) of film transcripts and each dialogue consists of 3 turns between two interlocutors. |
| Dataset Splits | Yes | In the final dataset, there are around 157,000 dialogues in the training set, 19,000 in the validation set, and 19,000 in the test set. |
| Hardware Specification | No | The paper mentions implementing models based on TensorFlow and using pre-trained Word2Vec embeddings but does not specify any hardware details like GPU/CPU models or memory. |
| Software Dependencies | No | We implement all models based on TensorFlow except VHRED. ... For all three metrics, we use pre-trained Word2Vec word embeddings trained on the Google News Corpus, which is publicly available. (See the embedding-metric sketch below the table.) |
| Experiment Setup | Yes | We optimize the models using Adam (Kingma and Ba 2014) and the learning rate is initialized as 0.001 except for VHRED. Dropout with probability 0.3 was applied to the GRUs and we apply gradient clipping for both policy models and reward models. We set the beam size to 8 for Monte Carlo search during training and beam search during testing. (The quoted hyperparameters are sketched as a training configuration below the table.) |
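
The Software Dependencies row notes that the embedding-based metrics rely on Word2Vec vectors trained on the Google News corpus. The snippet below is a minimal sketch of one such metric (Embedding Average: cosine similarity of averaged word vectors), assuming the publicly released `GoogleNews-vectors-negative300.bin` file and the gensim library; the paper does not name its three metrics in the quoted excerpt, so this is an illustration rather than the authors' implementation.

```python
import numpy as np
from gensim.models import KeyedVectors

# Path to the public Google News Word2Vec vectors (300-d); the filename is an assumption.
W2V_PATH = "GoogleNews-vectors-negative300.bin"
w2v = KeyedVectors.load_word2vec_format(W2V_PATH, binary=True)

def sentence_vector(tokens):
    """Average the Word2Vec vectors of in-vocabulary tokens (Embedding Average)."""
    vecs = [w2v[t] for t in tokens if t in w2v]
    if not vecs:
        return np.zeros(w2v.vector_size)
    return np.mean(vecs, axis=0)

def embedding_average(response, reference):
    """Cosine similarity between the averaged vectors of a response and a reference."""
    a, b = sentence_vector(response.split()), sentence_vector(reference.split())
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

print(embedding_average("i am doing fine thanks", "i am fine"))
```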
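
The Experiment Setup row quotes the training hyperparameters: Adam with an initial learning rate of 0.001, dropout 0.3 on the GRUs, gradient clipping, and beam size 8. The snippet below is a minimal TensorFlow sketch that wires those values together; the layer sizes, vocabulary size, and clipping threshold are assumptions, not values reported in the paper, and the Monte Carlo / beam search decoding loop is not shown.

```python
import tensorflow as tf

# Hyperparameters quoted in the paper's experiment setup (wiring is hypothetical):
LEARNING_RATE = 0.001   # Adam initial learning rate (all models except VHRED)
DROPOUT_PROB = 0.3      # dropout applied to the GRUs
CLIP_NORM = 5.0         # gradient clipping is used; the exact threshold is not reported
BEAM_SIZE = 8           # beam size for Monte Carlo search (training) and beam search (testing)

# A toy GRU encoder with the quoted dropout, standing in for the policy model.
policy_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=20000, output_dim=300),
    tf.keras.layers.GRU(512, dropout=DROPOUT_PROB),
    tf.keras.layers.Dense(20000),
])

# Adam optimizer with the quoted learning rate and gradient clipping by global norm.
optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE, clipnorm=CLIP_NORM)
policy_model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```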