Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
Authors: Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several sequence prediction tasks show that this approach yields significant improvements. Moreover, it was used successfully in our winning entry to the MSCOCO image captioning challenge, 2015. |
| Researcher Affiliation | Industry | Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer Google Research Mountain View, CA, USA {bengio,vinyals,ndjaitly,noam}@google.com |
| Pseudocode | No | The paper describes the proposed approach verbally and through mathematical equations, and Figure 1 provides an illustration, but no structured pseudocode or algorithm blocks are present. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing its source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We used the MSCOCO dataset from [19] to train our model. ... We generated data for these experiments using the TIMIT corpus and the KALDI toolkit as described in [25]. |
| Dataset Splits | Yes | We trained on 75k images and report results on a separate development set of 5k additional images. ... The training, validation and test sets have 3696, 400 and 192 sequences respectively, and their average length was 304 frames. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU models, CPU types, memory) used for running the experiments. It only mentions model architectures like 'LSTM with one layer of 512 hidden units'. |
| Software Dependencies | No | The paper mentions using the 'KALDI toolkit' but does not provide specific version numbers for it or any other software dependencies like programming languages or libraries used for implementation. |
| Experiment Setup | Yes | The recurrent neural network generating words is an LSTM with one layer of 512 hidden units, and the input words are represented by embedding vectors of size 512. The number of words in the dictionary is 8857. We used an inverse sigmoid decay schedule for ϵ_i for the scheduled sampling approach. ... The trained models had two layers of 250 LSTM cells and a softmax layer, for each of five configurations: a baseline configuration where the ground truth was always fed to the model, a configuration (Always Sampling) where the model was only fed its own predictions from the last time step, and three scheduled sampling configurations (Scheduled Sampling 1-3), where ϵ_i was ramped linearly from a maximum value to a minimum value over ten epochs and then kept constant at the final value. (See the sketch of the ϵ_i schedules below.) |
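
To make the Experiment Setup row concrete, here is a minimal Python sketch of the two ϵ_i decay schedules mentioned (inverse sigmoid decay and a linear ramp) and of the per-step choice between the ground-truth token and the model's own prediction. The constants `k`, `eps_max`, `eps_min`, and `ramp_epochs` are placeholder values for illustration, not the paper's tuned settings.

```python
import numpy as np

def inverse_sigmoid_schedule(i, k=100.0):
    """Probability eps_i of feeding the ground-truth token at step i,
    using the inverse sigmoid decay eps_i = k / (k + exp(i / k)).
    The value of k here is a placeholder; the paper tunes it per task."""
    return k / (k + np.exp(i / k))

def linear_schedule(epoch, eps_max=1.0, eps_min=0.25, ramp_epochs=10):
    """Linear ramp from eps_max down to eps_min over ramp_epochs epochs,
    then constant, as in the speech configurations; the exact min/max
    values are assumptions."""
    return max(eps_min, eps_max - (eps_max - eps_min) * epoch / ramp_epochs)

def choose_input(ground_truth_token, predicted_token, eps_i, rng=np.random):
    """Scheduled sampling step: with probability eps_i feed the true previous
    token, otherwise feed the model's prediction from the previous time step."""
    return ground_truth_token if rng.random() < eps_i else predicted_token
```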