Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks
Authors: Víctor Campos, Brendan Jou, Xavier Giró-i-Nieto, Jordi Torres, Shih-Fu Chang
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed model on various tasks and show how it can reduce the number of required RNN updates while preserving, and sometimes even improving, the performance of the baseline RNN models. |
| Researcher Affiliation | Collaboration | Víctor Campos, Brendan Jou, Xavier Giró-i-Nieto, Jordi Torres, Shih-Fu Chang (Barcelona Supercomputing Center, Google Inc., Universitat Politècnica de Catalunya, Columbia University) {victor.campos, jordi.torres}@bsc.es, bjou@google.com, xavier.giro@upc.edu, shih.fu.chang@columbia.edu |
| Pseudocode | No | The paper contains mathematical equations and diagrams but no structured pseudocode or algorithm blocks (an illustrative sketch of the gating recurrence is given below the table). |
| Open Source Code | Yes | Source code is publicly available at https://imatge-upc.github.io/skiprnn-2017-telecombcn/. |
| Open Datasets | Yes | We revisit one of the original LSTM tasks (Hochreiter & Schmidhuber, 1997), where the network is given a sequence of (value, marker) tuples. The MNIST handwritten digits classification benchmark (LeCun et al., 1998)... Charades (Sigurdsson et al., 2016a) is a dataset containing 9,848 videos... UCF-101 (Soomro et al., 2012) is a dataset containing 13,320 trimmed videos... |
| Dataset Splits | Yes | We follow the standard data split and set aside 5,000 training samples for validation purposes. ... We set aside 15% of training data for validation purposes. |
| Hardware Specification | Yes | Experiments are implemented with TensorFlow and run on a single NVIDIA K80 GPU. |
| Software Dependencies | No | The paper mentions TensorFlow but does not specify a version number for it or any other software dependency. |
| Experiment Setup | Yes | Training is performed with Adam (Kingma & Ba, 2014), learning rate of 10⁻⁴, β₁ = 0.9, β₂ = 0.999 and ε = 10⁻⁸ on batches of 256. Gradient clipping (Pascanu et al., 2013) with a threshold of 1 is applied to all trainable variables. Bias b_p in Equation 4 is initialized to 1... (a configuration sketch is given below the table). |
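Since the Pseudocode row flags that the paper contains no algorithm block, a minimal sketch of our reading of the gating recurrence may help. This is a paraphrase, not the authors' code: `skip_rnn_step` and `update_prob_layer` are hypothetical names, the wrapped cell is assumed to have a single-tensor state (e.g. a GRU), and the straight-through gradient estimator needed to backpropagate through the rounding is omitted.

```python
import tensorflow as tf  # sketched against the TF 1.x API used by the authors

def skip_rnn_step(cell, update_prob_layer, x_t, s_prev, u_tilde_prev):
    """One Skip RNN step: the binary gate u_t decides whether to run the
    wrapped cell or copy the previous state forward unchanged.

    Assumes `cell` is an RNNCell whose state is a single tensor and
    `u_tilde_prev` has shape [batch, 1].
    """
    u_t = tf.round(u_tilde_prev)  # binarize; training would use a
                                  # straight-through estimator, omitted here
    _, s_candidate = cell(x_t, s_prev)
    s_t = u_t * s_candidate + (1.0 - u_t) * s_prev  # update or skip
    # Increment of the update probability; the layer's bias plays the role
    # of b_p, which the table says is initialized to 1.
    delta_u = update_prob_layer(s_t)
    u_tilde_t = u_t * delta_u + (1.0 - u_t) * (
        u_tilde_prev + tf.minimum(delta_u, 1.0 - u_tilde_prev))
    return s_t, u_tilde_t

# Sigmoid layer computing delta_u, bias initialized to 1 as quoted above.
update_prob_layer = tf.layers.Dense(
    1, activation=tf.sigmoid, bias_initializer=tf.ones_initializer())
```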
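The quoted Experiment Setup row maps directly onto the TF 1.x optimizer API. Below is a minimal sketch under that assumption; `build_train_op` is a hypothetical helper and the loss is a placeholder. The paper says clipping with a threshold of 1 is applied to all trainable variables; global-norm clipping is one common reading of Pascanu et al. (2013) and is assumed here.

```python
import tensorflow as tf  # TF 1.x API, matching the paper's era

# Hyperparameters quoted in the table above.
LEARNING_RATE = 1e-4
BATCH_SIZE = 256  # used when building the input pipeline (not shown)
CLIP_THRESHOLD = 1.0

def build_train_op(loss):
    """Adam with gradient clipping, as described in the Experiment Setup row."""
    optimizer = tf.train.AdamOptimizer(
        learning_rate=LEARNING_RATE, beta1=0.9, beta2=0.999, epsilon=1e-8)
    grads, variables = zip(*optimizer.compute_gradients(loss))
    # Clip gradients of all trainable variables with a threshold of 1
    # (Pascanu et al., 2013); global-norm clipping assumed.
    clipped, _ = tf.clip_by_global_norm(grads, CLIP_THRESHOLD)
    return optimizer.apply_gradients(zip(clipped, variables))
```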