Cold-Start Reinforcement Learning with Softmax Policy Gradient
Authors: Nan Ding, Radu Soricut
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evidence validates this method on automatic summarization and image captioning tasks. We numerically evaluate our method on two sequence generation benchmarks, a headline-generation task and an image-caption generation task (Section 5). |
| Researcher Affiliation | Industry | Nan Ding Google Inc. Venice, CA 90291 dingnan@google.com Radu Soricut Google Inc. Venice, CA 90291 rsoricut@google.com |
| Pseudocode | Yes | The details about the gradient evaluation for the bang-bang rewarded softmax value function are described in Algorithm 1 of the Supplementary Material. |
| Open Source Code | No | The paper does not provide an unambiguous statement about releasing its source code or a direct link to a repository for the described methodology. |
| Open Datasets | Yes | In our experiments, the supervised data comes from the English Gigaword [9], and consists of news-articles paired with their headlines. For the image-captioning task, we use the standard MSCOCO dataset [14]. |
| Dataset Splits | Yes | We use a training set of about 6 million article-headline pairs, in addition to two randomly-extracted validation and evaluation sets of 10K examples each. We combine the training and validation datasets for training our model, and hold out a subset of 4K images as our validation set. |
| Hardware Specification | No | The paper mentions '40 workers' and '10 parameter servers' for computing updates but does not specify any concrete hardware details such as GPU models, CPU types, or memory amounts. |
| Software Dependencies | Yes | We implemented all the algorithms using TensorFlow 1.0 [6]. |
| Experiment Setup | Yes | The model is optimized using ADAGRAD with a mini-batch size of 200, a learning rate of 0.01, and gradient clipping with norm equal to 4. We run the training procedure for 10M steps... |
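
For orientation, below is a minimal sketch of the optimizer configuration quoted in the Experiment Setup row, written against the TensorFlow 1.x API named in the Software Dependencies row. It is not the authors' code: the `build_train_op` helper and its `loss` argument are illustrative placeholders, and only the hyperparameter values (ADAGRAD, learning rate 0.01, gradient clipping with norm 4) come from the quoted text; the mini-batch size of 200 and the 10M-step budget would be handled by the input pipeline and training loop, which are omitted here.

```python
# Hedged sketch of the reported optimizer setup (not the authors' implementation).
# Assumes TensorFlow 1.x, matching the "TensorFlow 1.0" dependency quoted above.
import tensorflow as tf

def build_train_op(loss, learning_rate=0.01, clip_norm=4.0):
    """ADAGRAD with gradient clipping by global norm, per the quoted setup."""
    optimizer = tf.train.AdagradOptimizer(learning_rate=learning_rate)
    # Compute per-variable gradients of the training loss.
    grads_and_vars = optimizer.compute_gradients(loss)
    grads, variables = zip(*grads_and_vars)
    # Clip the gradients so their global norm does not exceed clip_norm (4).
    clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm)
    # Apply the clipped gradients as a single training step.
    return optimizer.apply_gradients(zip(clipped_grads, variables))
```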