Middle-Out Decoding

Authors: Shikib Mehri, Leonid Sigal

NeurIPS 2018

Reproducibility Assessment
Each entry lists the reproducibility variable, its result, and the supporting LLM response.

Research Type: Experimental
"We evaluate the aforementioned models on two sequence generation tasks. First, we evaluate middle-out decoders on the synthetic problem of de-noising a symmetric sequence. Next, we explore the problem of video captioning on the MSVD dataset (Chen and Dolan, 2011), evaluating our models for quality, diversity, and control."

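For intuition about the synthetic task, here is a minimal sketch of how a (noisy input, clean target) pair might be constructed. The half-length, vocabulary size, and token-replacement corruption model are assumptions for illustration, not the paper's exact setup.

```python
import random

def make_symmetric_denoising_pair(half_len=5, vocab_size=20, noise_p=0.2, seed=None):
    """Build one (noisy input, clean target) pair for the synthetic task.

    The clean target is a palindrome: a random half mirrored onto itself.
    The input corrupts it by independently replacing each token with
    probability `noise_p` (this corruption model is an assumption).
    """
    rng = random.Random(seed)
    half = [rng.randrange(vocab_size) for _ in range(half_len)]
    clean = half + half[::-1]  # symmetric sequence of length 2 * half_len
    noisy = [rng.randrange(vocab_size) if rng.random() < noise_p else tok
             for tok in clean]
    return noisy, clean

print(make_symmetric_denoising_pair(seed=0))
```
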
Researcher Affiliation: Academia
"Shikib Mehri, Department of Computer Science, University of British Columbia, amehri@cs.cmu.edu; Leonid Sigal, Department of Computer Science, University of British Columbia, lsigal@cs.ubc.ca"

Pseudocode: No
The paper describes the model architecture and training procedures in text and with diagrams, but it does not include formal pseudocode or algorithm blocks.

Open Source Code: No
The paper does not contain an explicit statement about releasing source code or a link to a code repository for the described methodology.

Open Datasets: Yes
"We utilize frame-level features provided by Pasunuru and Bansal (2017). The videos were sampled at 3fps and passed through an Inception-v4 model (Szegedy et al., 2017), pretrained on ImageNet (Deng et al., 2009), to obtain a 1536-dim feature vector for each frame." "For this task, we utilize the MSVD (YouTube2Text) dataset (Chen and Dolan, 2011)"

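The paper reuses features released by Pasunuru and Bansal (2017) rather than recomputing them, but the quoted pipeline is straightforward to sketch. Below is a rough illustration using the timm library's ImageNet-pretrained inception_v4; the library choice and preprocessing are assumptions, not the original extraction code.

```python
import torch
import timm
from timm.data import resolve_data_config, create_transform

# ImageNet-pretrained Inception-v4; num_classes=0 returns the global-pooled
# 1536-dim feature vector instead of classification logits.
model = timm.create_model("inception_v4", pretrained=True, num_classes=0).eval()
transform = create_transform(**resolve_data_config({}, model=model))

@torch.no_grad()
def frame_features(frames):
    """frames: list of PIL.Image objects sampled from a video at 3 fps.

    Returns a (num_frames, 1536) tensor, one feature vector per frame.
    """
    batch = torch.stack([transform(f) for f in frames])
    return model(batch)
```
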
Dataset Splits: Yes
"We use the standard splits provided by Venugopalan et al. (2015a) with 1200 training videos, 100 for validation and 670 for testing."

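The split is fully determined once the clips are taken in the standard order; a minimal sketch, assuming MSVD's 1,970 clips are indexed 0–1969 in that order:

```python
video_ids = list(range(1970))  # MSVD has 1,970 clips in total
train, val, test = video_ids[:1200], video_ids[1200:1300], video_ids[1300:]
assert (len(train), len(val), len(test)) == (1200, 100, 670)
```
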
Hardware Specification: No
The paper does not explicitly describe the hardware (e.g., GPU/CPU models, memory) used to run its experiments.

Software Dependencies: No
The paper mentions software components such as LSTMs, the Adam optimizer, word2vec, and an Inception-v4 model, but does not specify version numbers, which are needed for reproducible software dependencies.

Experiment Setup: Yes
"We utilize 100-dimensional LSTMs and the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 1e-4. We train the models for 20,000 steps with a batch size of 32." "For all of our models, we use 1024-dimensional LSTMs, 512-dimensional embeddings... and the Adam optimizer with a learning rate of 1e-4. We utilize a batch size of 32 and train for 15 epochs. We employ a scheduled sampling training strategy (Bengio et al., 2015), which has greatly improved results in image captioning. We begin with a sampling rate of 0 and increase the sampling rate every epoch by 0.05, with a maximum sampling rate of 0.25."

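The scheduled-sampling schedule quoted above is fully specified, so it can be sketched directly. The helper name and loop structure below are illustrative, not the authors' code; the rate starts at 0, grows by 0.05 per epoch, and saturates at 0.25.

```python
def sampling_rate(epoch, step=0.05, cap=0.25):
    """Scheduled-sampling rate for a 0-indexed epoch: starts at 0,
    grows by `step` each epoch, and is capped at `cap`."""
    return min(epoch * step, cap)

for epoch in range(15):  # the captioning models train for 15 epochs
    p = sampling_rate(epoch)  # probability of feeding back the model's own sample
    # the quoted setup would train here with Adam (lr=1e-4) and batch size 32
    print(f"epoch {epoch:2d}: sampling rate = {p:.2f}")
```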