Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning

Authors: Ziming Li, Julia Kiseleva, Maarten de Rijke (pp. 6722-6729)

AAAI 2019

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model can generate more high-quality responses and achieve higher overall performance than the state-of-the-art.
Researcher Affiliation Collaboration Ziming Li,1 Julia Kiseleva,1,2 Maarten de Rijke1 1University of Amsterdam 2Microsoft Research AI
Pseudocode No The paper includes architectural diagrams (Figure 1 and Figure 2) but no structured pseudocode or algorithm blocks.
Open Source Code Yes The source code of this work is available at https://bitbucket.org/ZimingLi/dg-irl-aaai2019
Open Datasets Yes The Movie Triples dataset (Serban et al. 2016) has been developed by expanding and preprocessing the Movie-Dic corpus (Banchs 2012) of film transcripts and each dialogue consists of 3 turns between two interlocutors.
Dataset Splits Yes In the final dataset, there are around 157,000 dialogues in the training set, 19,000 in the validation set, and 19,000 in the test set.
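The reported counts imply a roughly 80/10/10 split; a quick sanity check using the approximate figures quoted above:

```python
# Approximate Movie Triples split sizes reported in the paper.
splits = {"train": 157_000, "valid": 19_000, "test": 19_000}

total = sum(splits.values())  # 195,000 dialogues in total
ratios = {name: round(n / total, 3) for name, n in splits.items()}
print(ratios)  # train ≈ 0.805, valid and test ≈ 0.097 each
```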
Hardware Specification No The paper mentions implementing models based on TensorFlow and using pre-trained Word2Vec embeddings but does not specify any hardware details like GPU/CPU models or memory.
Software Dependencies No We implement all models based on TensorFlow except VHRED. ... For all three metrics, we use pre-trained Word2Vec word embeddings trained on the Google News Corpus, which is publicly accessible.
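Embedding-based metrics of the kind referenced here (e.g., Embedding Average) score a candidate response by the cosine similarity of averaged word vectors. A minimal sketch with toy 3-d vectors standing in for the pre-trained Google News Word2Vec embeddings (a real evaluation would load those vectors instead):

```python
import numpy as np

# Toy lexicon standing in for pre-trained Word2Vec embeddings.
emb = {
    "good": np.array([1.0, 0.2, 0.0]),
    "movie": np.array([0.1, 1.0, 0.3]),
    "great": np.array([0.9, 0.3, 0.1]),
    "film": np.array([0.2, 0.9, 0.4]),
}

def embedding_average(tokens):
    """Mean of the word vectors for all in-vocabulary tokens."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

score = cosine(embedding_average(["good", "movie"]),
               embedding_average(["great", "film"]))
print(round(score, 3))  # close to 1.0 for near-synonymous responses
```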
Experiment Setup Yes We optimize the models using Adam (Kingma and Ba 2014) and the learning rate is initialized as 0.001 except for VHRED. Dropout with probability 0.3 was applied to the GRUs and we apply gradient clipping for both policy models and reward models. We set the beam size to 8 for Monte Carlo search during training and beam search during testing.
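The optimization setup quoted above (Adam with learning rate 0.001 plus gradient clipping) can be sketched framework-agnostically. A minimal numpy version of one Adam step with clipping by global norm; the learning rate comes from the quote, while the clip threshold is an assumed placeholder since the paper's value is not quoted here:

```python
import numpy as np

LR, BETA1, BETA2, EPS = 1e-3, 0.9, 0.999, 1e-8  # Adam defaults; lr from the paper
CLIP_NORM = 5.0  # assumed placeholder; the quoted text gives no threshold

def clip_by_global_norm(grad, max_norm=CLIP_NORM):
    """Rescale the gradient if its L2 norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

def adam_step(param, grad, m, v, t):
    """One Adam update (Kingma and Ba 2014) with bias correction."""
    grad = clip_by_global_norm(grad)
    m = BETA1 * m + (1 - BETA1) * grad
    v = BETA2 * v + (1 - BETA2) * grad ** 2
    m_hat = m / (1 - BETA1 ** t)
    v_hat = v / (1 - BETA2 ** t)
    param = param - LR * m_hat / (np.sqrt(v_hat) + EPS)
    return param, m, v

# One update on a toy parameter vector: the large gradient is clipped first.
p = np.zeros(3)
m = v = np.zeros(3)
p, m, v = adam_step(p, np.array([10.0, 0.0, 0.0]), m, v, t=1)
print(p)  # the first coordinate moves by roughly -LR
```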