An Actor-Critic Algorithm for Sequence Prediction
Authors: Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our method leads to improved performance on both a synthetic task and for German-English machine translation. Our analysis paves the way for such methods to be applied in natural language generation tasks, such as machine translation, caption generation, and dialogue modelling. |
| Researcher Affiliation | Academia | Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal (Université de Montréal); Ryan Lowe, Joelle Pineau (McGill University); Aaron Courville, Yoshua Bengio (Université de Montréal) |
| Pseudocode | Yes | Algorithm 1 Actor-Critic Training for Sequence Prediction |
| Open Source Code | Yes | The source code is available at https://github.com/rizar/actor-critic-public |
| Open Datasets | Yes | We use text from the One Billion Word dataset for the spelling correction task (Chelba et al., 2013), which has pre-defined training and testing sets. [...] For our first translation experiment, we use data from the German-English machine translation track of the IWSLT 2014 evaluation campaign (Cettolo et al., 2014), as used in Ranzato et al. (2015), and closely follow the pre-processing described in that work. [...] In addition, we considered a larger WMT14 English-French dataset (Cho et al., 2014) with more than 12 million examples. |
| Dataset Splits | Yes | For the IWSLT 2014 data, the sizes of the validation and test sets were 6,969 and 6,750, respectively. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | Yes | We thank the developers of Theano (Theano Development Team, 2016) and Blocks (van Merriënboer et al., 2015) for their great work. |
| Experiment Setup | Yes | We use the ADAM optimizer (Kingma & Ba, 2015) to train all the networks with the parameters recommended in the original paper, with the exception of the scale parameter α. The latter is first set to 10^-3 and then annealed to 10^-4 for log-likelihood training. For the pre-training stage of the actor-critic, we use α = 10^-3 and decrease it to 10^-4 for the joint actor-critic training. We used M = 1 sample for both actor-critic and REINFORCE. For exact hyperparameter settings we refer the reader to Appendix A. |
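
The Pseudocode row above points to the paper's Algorithm 1 (actor-critic training for sequence prediction). The following is a minimal runnable sketch of that training loop on a toy task of emitting a fixed target sequence. The tabular actor and critic indexed by (position, previous token), the 0/1 per-step reward, and the learning rates are illustrative assumptions, and the paper's delayed ("target") actor and critic networks, input conditioning, and variance penalty are omitted for brevity; the authors' actual Theano/Blocks implementation is in the repository linked in the Open Source Code row.

```python
# Minimal sketch of actor-critic sequence training (Algorithm 1 in spirit).
# Assumptions, not the paper's implementation: tabular actor/critic, a fixed
# target sequence instead of input-conditioned translation, 0/1 rewards, and
# no delayed target networks.
import numpy as np

rng = np.random.default_rng(0)
VOCAB, LEN = 5, 4                       # vocabulary size, sequence length
TARGET = np.array([1, 3, 0, 2])         # fixed ground truth; the paper's
                                        # conditioning on an input X is dropped

actor_logits = np.zeros((LEN, VOCAB, VOCAB))   # policy parameters theta
critic_q = np.zeros((LEN, VOCAB, VOCAB))       # Q(a; position, previous token)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reward(t, action):
    # Per-step score increment: +1 for emitting the target token (stands in
    # for the paper's per-step task-score increments, e.g. BLEU deltas).
    return 1.0 if action == TARGET[t] else 0.0

lr_actor, lr_critic = 0.1, 0.1
for _ in range(2000):
    prev, traj = 0, []
    # Actor samples a prediction y-hat token by token.
    for t in range(LEN):
        p = softmax(actor_logits[t, prev])
        a = int(rng.choice(VOCAB, p=p))
        traj.append((t, prev, a, reward(t, a)))
        prev = a
    # Critic regresses toward q_t = r_t + sum_a p(a) Q(a; y-hat_{1..t}).
    for t, prev_tok, a, r in traj:
        if t + 1 < LEN:
            p_next = softmax(actor_logits[t + 1, a])
            q_tgt = r + p_next @ critic_q[t + 1, a]
        else:
            q_tgt = r
        critic_q[t, prev_tok, a] += lr_critic * (q_tgt - critic_q[t, prev_tok, a])
    # Actor ascends d/dtheta of sum_a p(a|.) Q(a|.), summed over ALL actions
    # at each step; for a softmax policy this gradient is p * (Q - p.Q).
    for t, prev_tok, a, r in traj:
        p = softmax(actor_logits[t, prev_tok])
        q = critic_q[t, prev_tok]
        actor_logits[t, prev_tok] += lr_actor * p * (q - p @ q)

prev, decoded = 0, []
for t in range(LEN):                    # greedy decode after training
    prev = int(np.argmax(actor_logits[t, prev]))
    decoded.append(prev)
print(decoded)                          # converges toward [1, 3, 0, 2]
```

The final loop is the method's central contrast with REINFORCE: instead of reinforcing only the sampled token, the gradient sums the critic's value estimate over every candidate token at each step, which is exactly what a learned critic makes tractable.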
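The Experiment Setup row reports ADAM with the parameters recommended in the original paper, except for the scale α, which is annealed from 10^-3 to 10^-4. Below is a self-contained sketch of that schedule; the β1/β2/ε values are ADAM's published defaults, while the annealing trigger and the toy objective are hypothetical, since the exact settings are deferred to the paper's Appendix A.

```python
# Sketch of the reported schedule: ADAM with default betas/epsilon and the
# scale alpha annealed 1e-3 -> 1e-4. The anneal point and the toy quadratic
# objective are assumptions for illustration only.
import numpy as np

class Adam:
    def __init__(self, shape, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.alpha, self.beta1, self.beta2, self.eps = alpha, beta1, beta2, eps
        self.m = np.zeros(shape)   # first-moment (mean) estimate
        self.v = np.zeros(shape)   # second-moment (uncentered variance) estimate
        self.t = 0                 # update counter for bias correction

    def step(self, params, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)   # bias-corrected moments
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return params - self.alpha * m_hat / (np.sqrt(v_hat) + self.eps)

params = np.zeros(10)
opt = Adam(params.shape, alpha=1e-3)   # pre-training / log-likelihood phase
ANNEAL_AT = 500                        # hypothetical switch point
for step in range(1000):
    if step == ANNEAL_AT:
        opt.alpha = 1e-4               # annealed scale for joint training
    grad = 2.0 * params - 1.0          # gradient of the toy loss ||p - 0.5||^2
    params = opt.step(params, grad)
```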