Middle-Out Decoding
Authors: Shikib Mehri, Leonid Sigal
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the aforementioned models on two sequence generation tasks. First, we evaluate middle-out decoders on the synthetic problem of de-noising a symmetric sequence. Next, we explore the problem of video captioning on the MSVD dataset (Chen and Dolan, 2011), evaluating our models for quality, diversity, and control. |
| Researcher Affiliation | Academia | Shikib Mehri, Department of Computer Science, University of British Columbia (amehri@cs.cmu.edu); Leonid Sigal, Department of Computer Science, University of British Columbia (lsigal@cs.ubc.ca) |
| Pseudocode | No | The paper describes the model architecture and training procedures in text and with diagrams, but it does not include formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing the source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We utilize frame-level features provided by Pasunuru and Bansal (2017). The videos were sampled at 3fps and passed through an Inception-v4 model (Szegedy et al., 2017), pretrained on ImageNet (Deng et al., 2009), to obtain a 1536-dim feature vector for each frame. For this task, we utilize the MSVD (YouTube2Text) dataset (Chen and Dolan, 2011). |
| Dataset Splits | Yes | We use the standard splits provided by Venugopalan et al. (2015a) with 1200 training videos, 100 for validation and 670 for testing. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions software components such as LSTMs, the Adam optimizer, word2vec, and an Inception-v4 model, but does not specify their versions or the underlying frameworks, which would be needed to reproduce the software environment. |
| Experiment Setup | Yes | We utilize 100-dimensional LSTMs and the Adam optimizer (Kingma and Ba, 2015) with a learning rate of 1e-4. We train the models for 20,000 steps with a batch size of 32. For all of our models, we use 1024-dimensional LSTMs, 512-dimensional embeddings... and the Adam optimizer with a learning rate of 1e-4. We utilize a batch size of 32 and train for 15 epochs. We employ a scheduled sampling training strategy (Bengio et al., 2015), which has greatly improved results in image captioning. We begin with a sampling rate of 0 and increase the sampling rate every epoch by 0.05, with a maximum sampling rate of 0.25. |
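
The Open Datasets row describes the frame-feature pipeline: videos sampled at 3 fps and passed through an ImageNet-pretrained Inception-v4 to obtain a 1536-dim vector per frame. The paper uses precomputed features from Pasunuru and Bansal (2017), so the sketch below is only a plausible re-creation; the use of OpenCV and the `timm` library is an assumption, not the authors' pipeline.

```python
# Hypothetical re-creation of the frame-feature pipeline quoted above.
# The paper uses precomputed features; OpenCV + timm here are assumptions.
import cv2
import timm
import torch

# Inception-v4 pretrained on ImageNet; num_classes=0 makes timm return the
# pooled 1536-dim feature vector instead of classification logits.
model = timm.create_model("inception_v4", pretrained=True, num_classes=0).eval()

def extract_features(video_path, target_fps=3.0):
    """Sample a video at ~3 fps and return one 1536-dim feature per kept frame."""
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or target_fps
    stride = max(1, round(native_fps / target_fps))  # keep every k-th frame
    feats, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            # BGR -> RGB, resize to Inception-v4's 299x299 input, scale to [0, 1]
            # (proper ImageNet mean/std normalization omitted for brevity).
            rgb = cv2.cvtColor(cv2.resize(frame, (299, 299)), cv2.COLOR_BGR2RGB)
            x = torch.from_numpy(rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            with torch.no_grad():
                feats.append(model(x).squeeze(0))  # shape: (1536,)
        idx += 1
    cap.release()
    return torch.stack(feats)  # (num_sampled_frames, 1536)
```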
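
The Experiment Setup row fully specifies the scheduled sampling schedule (start at 0, increase by 0.05 per epoch, cap at 0.25), so it can be written down directly. A minimal framework-agnostic sketch; the function name is my own:

```python
def scheduled_sampling_rate(epoch, increment=0.05, max_rate=0.25):
    """Probability of feeding the model its own previous prediction
    (rather than the ground-truth token) at a given epoch, following
    the schedule quoted above: start at 0, +0.05 per epoch, cap at 0.25.
    """
    return min(epoch * increment, max_rate)

# Rates over the 15 training epochs: 0.0, 0.05, 0.1, 0.15, 0.2,
# then capped at 0.25 from epoch 5 onward.
print([scheduled_sampling_rate(e) for e in range(15)])
```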