Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks

Authors: Víctor Campos, Brendan Jou, Xavier Giró-i-Nieto, Jordi Torres, Shih-Fu Chang

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed model on various tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models.
Researcher Affiliation | Collaboration | Víctor Campos, Brendan Jou, Xavier Giró-i-Nieto, Jordi Torres, Shih-Fu Chang; Barcelona Supercomputing Center, Google Inc., Universitat Politècnica de Catalunya, Columbia University. {victor.campos, jordi.torres}@bsc.es, bjou@google.com, xavier.giro@upc.edu, shih.fu.chang@columbia.edu
Pseudocode | No | The paper contains mathematical equations and diagrams but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | Source code is publicly available at https://imatge-upc.github.io/skiprnn-2017-telecombcn/.
Open Datasets | Yes | We revisit one of the original LSTM tasks (Hochreiter & Schmidhuber, 1997), where the network is given a sequence of (value, marker) tuples. The MNIST handwritten digits classification benchmark (LeCun et al., 1998)... Charades (Sigurdsson et al., 2016a) is a dataset containing 9,848 videos... UCF-101 (Soomro et al., 2012) is a dataset containing 13,320 trimmed videos...
Dataset Splits | Yes | We follow the standard data split and set aside 5,000 training samples for validation purposes. ... We set aside 15% of training data for validation purposes.
Hardware Specification | Yes | Experiments are implemented with TensorFlow and run on a single NVIDIA K80 GPU.
Software Dependencies | No | The paper mentions TensorFlow but does not specify a version number for it or any other software dependency.
Experiment Setup | Yes | Training is performed with Adam (Kingma & Ba, 2014), learning rate of 10^-4, β1 = 0.9, β2 = 0.999 and ϵ = 10^-8 on batches of 256. Gradient clipping (Pascanu et al., 2013) with a threshold of 1 is applied to all trainable variables. Bias bp in Equation 4 is initialized to 1...
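
To make the quoted training configuration concrete, the following is a minimal sketch using the TensorFlow 2 Keras API (the paper used an earlier TensorFlow release; the per-variable clipping scheme and all function and variable names below are assumptions for illustration, not taken from the paper):

    import tensorflow as tf

    # Sketch of the quoted setup: Adam with lr = 10^-4, beta1 = 0.9,
    # beta2 = 0.999, epsilon = 10^-8, batches of 256, gradient clipping
    # with a threshold of 1 on all trainable variables. The exact clipping
    # variant (per-variable norm shown here) is an assumption.
    BATCH_SIZE = 256
    CLIP_THRESHOLD = 1.0

    optimizer = tf.keras.optimizers.Adam(
        learning_rate=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-8)

    def train_step(model, loss_fn, x, y):
        with tf.GradientTape() as tape:
            loss = loss_fn(y, model(x, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        # Clip each gradient to the stated threshold (assumed variant).
        grads = [tf.clip_by_norm(g, CLIP_THRESHOLD) for g in grads]
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        return loss

    # Note: the initialization of bias b_p from Equation 4 to 1, mentioned
    # in the quoted setup, is part of the model definition and not shown here.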