Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning
Authors: Ziming Li, Julia Kiseleva, Maarten de Rijke
AAAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model can generate more high-quality responses and achieve higher overall performance than the state-of-the-art. |
| Researcher Affiliation | Collaboration | Ziming Li (University of Amsterdam), Julia Kiseleva (University of Amsterdam; Microsoft Research AI), Maarten de Rijke (University of Amsterdam) |
| Pseudocode | No | The paper includes architectural diagrams (Figure 1 and Figure 2) but no structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code of this work is available at https://bitbucket.org/ZimingLi/dg-irl-aaai2019 |
| Open Datasets | Yes | The Movie Triples dataset (Serban et al. 2016) has been developed by expanding and preprocessing the Movie-Dic corpus (Banchs 2012) of film transcripts and each dialogue consists of 3 turns between two interlocutors. |
| Dataset Splits | Yes | In the final dataset, there are around 157,000 dialogues in the training set, 19,000 in the validation set, and 19,000 in the test set. |
| Hardware Specification | No | The paper mentions implementing models based on TensorFlow and using pre-trained Word2Vec embeddings but does not specify any hardware details like GPU/CPU models or memory. |
| Software Dependencies | No | We implement all models based on TensorFlow except VHRED. ... For all three metrics, we use pre-trained Word2Vec word embeddings trained on the Google News Corpus, which is publicly available. (See the embedding-metric sketch below the table.) |
| Experiment Setup | Yes | We optimize the models using Adam (Kingma and Ba 2014) and the learning rate is initialized as 0.001 except for VHRED. Dropout with probability 0.3 was applied to the GRUs and we apply gradient clipping for both policy models and reward models. We set the beam size to 8 for Monte Carlo search during training and beam search during testing. (The quoted hyperparameters are sketched as a training configuration below the table.) |
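
The Software Dependencies row notes that the embedding-based metrics rely on Word2Vec vectors trained on the Google News corpus. The snippet below is a minimal sketch of one such metric (Embedding Average: cosine similarity of averaged word vectors), assuming the publicly released `GoogleNews-vectors-negative300.bin` file and the gensim library; the paper does not name its three metrics in the quoted excerpt, so this is an illustration rather than the authors' implementation.

```python
import numpy as np
from gensim.models import KeyedVectors

# Path to the public Google News Word2Vec vectors (300-d); the filename is an assumption.
W2V_PATH = "GoogleNews-vectors-negative300.bin"
w2v = KeyedVectors.load_word2vec_format(W2V_PATH, binary=True)

def sentence_vector(tokens):
    """Average the Word2Vec vectors of in-vocabulary tokens (Embedding Average)."""
    vecs = [w2v[t] for t in tokens if t in w2v]
    if not vecs:
        return np.zeros(w2v.vector_size)
    return np.mean(vecs, axis=0)

def embedding_average(response, reference):
    """Cosine similarity between the averaged vectors of a response and a reference."""
    a, b = sentence_vector(response.split()), sentence_vector(reference.split())
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

print(embedding_average("i am doing fine thanks", "i am fine"))
```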
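
The Experiment Setup row quotes the training hyperparameters: Adam with an initial learning rate of 0.001, dropout 0.3 on the GRUs, gradient clipping, and beam size 8. The snippet below is a minimal TensorFlow sketch that wires those values together; the layer sizes, vocabulary size, and clipping threshold are assumptions, not values reported in the paper, and the Monte Carlo / beam search decoding loop is not shown.

```python
import tensorflow as tf

# Hyperparameters quoted in the paper's experiment setup (wiring is hypothetical):
LEARNING_RATE = 0.001   # Adam initial learning rate (all models except VHRED)
DROPOUT_PROB = 0.3      # dropout applied to the GRUs
CLIP_NORM = 5.0         # gradient clipping is used; the exact threshold is not reported
BEAM_SIZE = 8           # beam size for Monte Carlo search (training) and beam search (testing)

# A toy GRU encoder with the quoted dropout, standing in for the policy model.
policy_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=20000, output_dim=300),
    tf.keras.layers.GRU(512, dropout=DROPOUT_PROB),
    tf.keras.layers.Dense(20000),
])

# Adam optimizer with the quoted learning rate and gradient clipping by global norm.
optimizer = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE, clipnorm=CLIP_NORM)
policy_model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```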