Batch Policy Gradient Methods for Improving Neural Conversation Models
Authors: Kirthevasan Kandasamy, Yoram Bachrach, Ryota Tomioka, Daniel Tarlow, David Carter
ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our method via a series of synthetic experiments and an Amazon Mechanical Turk experiment on a restaurant recommendations dataset. |
| Researcher Affiliation | Collaboration | Kirthevasan Kandasamy Carnegie Mellon University, Pittsburgh, PA, USA kandasamy@cs.cmu.edu Yoram Bachrach Digital Genius Ltd., London, UK yorambac@gmail.com Ryota Tomioka, Daniel Tarlow, David Carter Microsoft Research, Cambridge, UK {ryoto,dtarlow,dacart}@microsoft.com |
| Pseudocode | Yes | Algorithm 1 Batch Policy Gradient (BPG) ... Algorithm 2 GTD(λ) (see the update sketch below the table) |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | To convey the main intuitions of our method, we compare our methods against other baselines on a synthetic task on the European parliament proceedings corpus (Koehn, 2005). |
| Dataset Splits | No | The paper mentions 'cross validation' for hyper-parameter tuning and refers to a 'training set' and a 'test set', but does not explicitly provide percentages or absolute counts for a dedicated validation set. |
| Hardware Specification | No | The paper mentions 'GPU parallelisation' but does not specify any particular GPU models, CPU models, or detailed hardware configurations used for the experiments. |
| Software Dependencies | No | We implement our methods using Chainer (Tokui et al., 2015). The paper mentions Chainer but does not specify a version number or other software dependencies with version numbers. |
| Experiment Setup | Yes | We found it necessary to use a very small step size (10⁻⁵), otherwise the algorithm has a tendency to get stuck at bad parameter values. ... We truncate all output sequences to length 64 and use a maximum batch size of 32. ... We implement our methods using Chainer (Tokui et al., 2015), and group sentences of the same length together in the same batch to make use of GPU parallelisation. ... In both experiments we use deep LSTMs with two layers for the encoder and decoder RNNs. ... The LSTM hidden state size H and word embedding size E for the 4 bots were, (H, E) = (256, 128), (128, 64), (64, 32), (32, 16). ... Bot-1: H = 512, E = 256. BPG: λ = 0.5, GTD(λ) estimator for V̂. Bot-2: H = 400, E = 400. BPG: λ = 0.5, constant estimator for V̂. (see the configuration sketch below the table) |
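To make the Pseudocode row more concrete, the sketch below shows one importance-weighted batch policy-gradient step with a baseline, in the spirit of the paper's BPG (Algorithm 1). It is written in PyTorch purely for illustration; the paper's implementation used Chainer, and the toy single-step policy, the constant baseline, and every name below are assumptions rather than the authors' code.

```python
# Illustrative sketch only: a batch REINFORCE-style update with importance
# weights and a baseline, loosely following the BPG idea (not the paper's code).
import torch
import torch.nn as nn

class ToyPolicy(nn.Module):
    """Tiny categorical policy over a small vocabulary (a stand-in for the
    LSTM decoder used in the paper)."""
    def __init__(self, vocab_size=10, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, vocab_size))

    def log_probs(self, states):                 # states: (batch, hidden)
        return torch.log_softmax(self.net(states), dim=-1)

def bpg_update(policy, optimizer, states, actions, rewards,
               behaviour_logp, baseline):
    """One batch update: importance-weighted policy gradient with a baseline.

    rewards, baseline: (batch,) returns and value estimates per episode.
    behaviour_logp:    (batch,) log-prob of the sampled actions under the
                       behaviour (data-collecting) policy.
    """
    logp = policy.log_probs(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        rho = torch.exp(logp - behaviour_logp)   # importance weights
        advantage = rho * (rewards - baseline)   # weighted advantage
    loss = -(advantage * logp).mean()            # ascend the batch objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random data; the small step size mirrors the 10^-5 quoted above.
policy = ToyPolicy()
opt = torch.optim.SGD(policy.parameters(), lr=1e-5)
B, H = 32, 32
states = torch.randn(B, H)
actions = torch.randint(0, 10, (B,))
rewards = torch.rand(B)
behaviour_logp = policy.log_probs(states).gather(1, actions.unsqueeze(1)).squeeze(1).detach()
baseline = torch.zeros(B)                        # constant baseline, as used for Bot-2
bpg_update(policy, opt, states, actions, rewards, behaviour_logp, baseline)
```

The paper's Algorithm 2 instead estimates the baseline V̂ with GTD(λ); the constant baseline here simply keeps the sketch self-contained.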
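For the Experiment Setup row, the quoted hyper-parameters amount to a two-layer LSTM encoder-decoder per bot. The configuration sketch below instantiates the listed (H, E) pairs; it uses PyTorch modules and a placeholder vocabulary size, so it is an assumed illustration of the stated sizes, not the paper's Chainer setup.

```python
# Illustrative configuration sketch: two-layer LSTM encoder-decoder with the
# (hidden, embedding) sizes quoted in the Experiment Setup row.
import torch.nn as nn

def build_seq2seq(vocab_size, hidden_size, embed_size, num_layers=2):
    embed = nn.Embedding(vocab_size, embed_size)
    encoder = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
    decoder = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
    output = nn.Linear(hidden_size, vocab_size)
    return nn.ModuleDict(dict(embed=embed, encoder=encoder,
                              decoder=decoder, output=output))

# The four synthetic-experiment bots used (H, E) = (256, 128), (128, 64),
# (64, 32), (32, 16); the Mechanical Turk bots used (512, 256) and (400, 400).
# vocab_size here is a placeholder, not a value reported in the paper.
bots = [build_seq2seq(vocab_size=20000, hidden_size=h, embed_size=e)
        for h, e in [(256, 128), (128, 64), (64, 32), (32, 16)]]
```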