Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks

Authors: Víctor Campos, Brendan Jou, Xavier Giró-i-Nieto, Jordi Torres, Shih-Fu Chang

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed model on various tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models.
Researcher Affiliation | Collaboration | Víctor Campos, Brendan Jou, Xavier Giró-i-Nieto, Jordi Torres, Shih-Fu Chang; Barcelona Supercomputing Center, Google Inc., Universitat Politècnica de Catalunya, Columbia University. {victor.campos, jordi.torres}@bsc.es, bjou@google.com, xavier.giro@upc.edu, shih.fu.chang@columbia.edu
Pseudocode | No | The paper contains mathematical equations and diagrams but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | Source code is publicly available at https://imatge-upc.github.io/skiprnn-2017-telecombcn/.
Open Datasets | Yes | We revisit one of the original LSTM tasks (Hochreiter & Schmidhuber, 1997), where the network is given a sequence of (value, marker) tuples. The MNIST handwritten digits classification benchmark (LeCun et al., 1998)... Charades (Sigurdsson et al., 2016a) is a dataset containing 9,848 videos... UCF-101 (Soomro et al., 2012) is a dataset containing 13,320 trimmed videos...
Dataset Splits | Yes | We follow the standard data split and set aside 5,000 training samples for validation purposes. ... We set aside 15% of training data for validation purposes.
Hardware Specification | Yes | Experiments are implemented with TensorFlow and run on a single NVIDIA K80 GPU.
Software Dependencies | No | The paper mentions TensorFlow but does not specify a version number for it or any other software dependency.
Experiment Setup | Yes | Training is performed with Adam (Kingma & Ba, 2014), learning rate of 10^-4, β1 = 0.9, β2 = 0.999 and ϵ = 10^-8 on batches of 256. Gradient clipping (Pascanu et al., 2013) with a threshold of 1 is applied to all trainable variables. Bias bp in Equation 4 is initialized to 1...
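
To make the quoted training configuration concrete, the following is a minimal sketch using the TensorFlow 2 Keras API (the paper used an earlier TensorFlow release; the per-variable clipping scheme and all function and variable names below are assumptions for illustration, not taken from the paper):

    import tensorflow as tf

    # Sketch of the quoted setup: Adam with lr = 10^-4, beta1 = 0.9,
    # beta2 = 0.999, epsilon = 10^-8, batches of 256, gradient clipping
    # with a threshold of 1 on all trainable variables. The exact clipping
    # variant (per-variable norm shown here) is an assumption.
    BATCH_SIZE = 256
    CLIP_THRESHOLD = 1.0

    optimizer = tf.keras.optimizers.Adam(
        learning_rate=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-8)

    def train_step(model, loss_fn, x, y):
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        # Clip each gradient to the stated threshold (assumed variant).
        grads = [tf.clip_by_norm(g, CLIP_THRESHOLD) for g in grads]
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss

    # Note: the initialization of bias b_p from Equation 4 to 1, mentioned
    # in the quoted setup, is part of the model definition and not shown here.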