Batch Policy Gradient Methods for Improving Neural Conversation Models
Authors: Kirthevasan Kandasamy, Yoram Bachrach, Ryota Tomioka, Daniel Tarlow, David Carter
ICLR 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the efficacy of our method via a series of synthetic experiments and an Amazon Mechanical Turk experiment on a restaurant recommendations dataset. |
| Researcher Affiliation | Collaboration | Kirthevasan Kandasamy Carnegie Mellon University, Pittsburgh, PA, USA kandasamy@cs.cmu.edu Yoram Bachrach Digital Genius Ltd., London, UK yorambac@gmail.com Ryota Tomioka, Daniel Tarlow, David Carter Microsoft Research, Cambridge, UK {ryoto,dtarlow,dacart}@microsoft.com |
| Pseudocode | Yes | Algorithm 1 Batch Policy Gradient (BPG) ... Algorithm 2 GTD(λ) (see the update sketch below the table) |
| Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | To convey the main intuitions of our method, we compare our methods against other baselines on a synthetic task on the European parliament proceedings corpus (Koehn, 2005). |
| Dataset Splits | No | The paper mentions 'cross validation' for hyper-parameter tuning and refers to a 'training set' and a 'test set', but does not explicitly provide percentages or absolute counts for a dedicated validation set. |
| Hardware Specification | No | The paper mentions 'GPU parallelisation' but does not specify any particular GPU models, CPU models, or detailed hardware configurations used for the experiments. |
| Software Dependencies | No | We implement our methods using Chainer (Tokui et al., 2015). The paper mentions Chainer but does not specify a version number or other software dependencies with version numbers. |
| Experiment Setup | Yes | We found it necessary to use a very small step size (10⁻⁵), otherwise the algorithm has a tendency to get stuck at bad parameter values. ... We truncate all output sequences to length 64 and use a maximum batch size of 32. ... We implement our methods using Chainer (Tokui et al., 2015), and group sentences of the same length together in the same batch to make use of GPU parallelisation. ... In both experiments we use deep LSTMs with two layers for the encoder and decoder RNNs. ... The LSTM hidden state size H and word embedding size E for the 4 bots were, (H, E) = (256, 128), (128, 64), (64, 32), (32, 16). ... Bot-1: H = 512, E = 256. BPG: λ = 0.5, GTD(λ) estimator for V̂. Bot-2: H = 400, E = 400. BPG: λ = 0.5, constant estimator for V̂. (see the configuration sketch below the table) |
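To make the Pseudocode row more concrete, the sketch below shows one importance-weighted batch policy-gradient step with a baseline, in the spirit of the paper's BPG (Algorithm 1). It is written in PyTorch purely for illustration; the paper's implementation used Chainer, and the toy single-step policy, the constant baseline, and every name below are assumptions rather than the authors' code.

```python
# Illustrative sketch only: a batch REINFORCE-style update with importance
# weights and a baseline, loosely following the BPG idea (not the paper's code).
import torch
import torch.nn as nn

class ToyPolicy(nn.Module):
    """Tiny categorical policy over a small vocabulary (a stand-in for the
    LSTM decoder used in the paper)."""
    def __init__(self, vocab_size=10, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, vocab_size))

    def log_probs(self, states):                 # states: (batch, hidden)
        return torch.log_softmax(self.net(states), dim=-1)

def bpg_update(policy, optimizer, states, actions, rewards,
               behaviour_logp, baseline):
    """One batch update: importance-weighted policy gradient with a baseline.

    rewards, baseline: (batch,) returns and value estimates per episode.
    behaviour_logp:    (batch,) log-prob of the sampled actions under the
                       behaviour (data-collecting) policy.
    """
    logp = policy.log_probs(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        rho = torch.exp(logp - behaviour_logp)   # importance weights
        advantage = rho * (rewards - baseline)   # weighted advantage
    loss = -(advantage * logp).mean()            # ascend the batch objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random data; the small step size mirrors the 10^-5 quoted above.
policy = ToyPolicy()
opt = torch.optim.SGD(policy.parameters(), lr=1e-5)
B, H = 32, 32
states = torch.randn(B, H)
actions = torch.randint(0, 10, (B,))
rewards = torch.rand(B)
behaviour_logp = policy.log_probs(states).gather(1, actions.unsqueeze(1)).squeeze(1).detach()
baseline = torch.zeros(B)                        # constant baseline, as used for Bot-2
bpg_update(policy, opt, states, actions, rewards, behaviour_logp, baseline)
```

The paper's Algorithm 2 instead estimates the baseline V̂ with GTD(λ); the constant baseline here simply keeps the sketch self-contained.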
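For the Experiment Setup row, the quoted hyper-parameters amount to a two-layer LSTM encoder-decoder per bot. The configuration sketch below instantiates the listed (H, E) pairs; it uses PyTorch modules and a placeholder vocabulary size, so it is an assumed illustration of the stated sizes, not the paper's Chainer setup.

```python
# Illustrative configuration sketch: two-layer LSTM encoder-decoder with the
# (hidden, embedding) sizes quoted in the Experiment Setup row.
import torch.nn as nn

def build_seq2seq(vocab_size, hidden_size, embed_size, num_layers=2):
    embed = nn.Embedding(vocab_size, embed_size)
    encoder = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
    decoder = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
    output = nn.Linear(hidden_size, vocab_size)
    return nn.ModuleDict(dict(embed=embed, encoder=encoder,
                              decoder=decoder, output=output))

# The four synthetic-experiment bots used (H, E) = (256, 128), (128, 64),
# (64, 32), (32, 16); the Mechanical Turk bots used (512, 256) and (400, 400).
# vocab_size here is a placeholder, not a value reported in the paper.
bots = [build_seq2seq(vocab_size=20000, hidden_size=h, embed_size=e)
        for h, e in [(256, 128), (128, 64), (64, 32), (32, 16)]]
```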