Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
Authors: Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several sequence prediction tasks show that this approach yields significant improvements. Moreover, it was used successfully in our winning entry to the MSCOCO image captioning challenge, 2015. |
| Researcher Affiliation | Industry | Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer Google Research Mountain View, CA, USA {bengio,vinyals,ndjaitly,noam}@google.com |
| Pseudocode | No | The paper describes the proposed approach verbally and through mathematical equations, and Figure 1 provides an illustration, but no structured pseudocode or algorithm blocks are present. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing its source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | We used the MSCOCO dataset from [19] to train our model. ... We generated data for these experiments using the TIMIT corpus and the KALDI toolkit as described in [25]. |
| Dataset Splits | Yes | We trained on 75k images and report results on a separate development set of 5k additional images. ... The training, validation and test sets have 3696, 400 and 192 sequences respectively, and their average length was 304 frames. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU models, CPU types, memory) used for running the experiments. It only mentions model architectures like 'LSTM with one layer of 512 hidden units'. |
| Software Dependencies | No | The paper mentions using the 'KALDI toolkit' but does not provide specific version numbers for it or any other software dependencies like programming languages or libraries used for implementation. |
| Experiment Setup | Yes | The recurrent neural network generating words is an LSTM with one layer of 512 hidden units, and the input words are represented by embedding vectors of size 512. The number of words in the dictionary is 8857. We used an inverse sigmoid decay schedule for ϵ_i for the scheduled sampling approach. ... The trained models had two layers of 250 LSTM cells and a softmax layer, for each of five configurations: a baseline configuration where the ground truth was always fed to the model, a configuration (Always Sampling) where the model was only fed its own predictions from the last time step, and three scheduled sampling configurations (Scheduled Sampling 1-3), where ϵ_i was ramped linearly from a maximum value to a minimum value over ten epochs and then kept constant at the final value. (See the sketch of the ϵ_i schedules below.) |
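
To make the Experiment Setup row concrete, here is a minimal Python sketch of the two ϵ_i decay schedules mentioned (inverse sigmoid decay and a linear ramp) and of the per-step choice between the ground-truth token and the model's own prediction. The constants `k`, `eps_max`, `eps_min`, and `ramp_epochs` are placeholder values for illustration, not the paper's tuned settings.

```python
import numpy as np

def inverse_sigmoid_schedule(i, k=100.0):
    """Probability eps_i of feeding the ground-truth token at step i,
    using the inverse sigmoid decay eps_i = k / (k + exp(i / k)).
    The value of k here is a placeholder; the paper tunes it per task."""
    return k / (k + np.exp(i / k))

def linear_schedule(epoch, eps_max=1.0, eps_min=0.25, ramp_epochs=10):
    """Linear ramp from eps_max down to eps_min over ramp_epochs epochs,
    then constant, as in the speech configurations; the exact min/max
    values are assumptions."""
    return max(eps_min, eps_max - (eps_max - eps_min) * epoch / ramp_epochs)

def choose_input(ground_truth_token, predicted_token, eps_i, rng=np.random):
    """Scheduled sampling step: with probability eps_i feed the true previous
    token, otherwise feed the model's prediction from the previous time step."""
    return ground_truth_token if rng.random() < eps_i else predicted_token
```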