The Statistical Recurrent Unit

Authors: Junier B. Oliva, Barnabás Póczos, Jeff Schneider

ICML 2017

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | We show the efficacy of SRUs as compared to LSTMs and GRUs in an unbiased manner by optimizing the respective architectures' hyperparameters for both synthetic and real-world tasks.
Researcher Affiliation | Academia | Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA. Correspondence to: Junier B. Oliva <joliva@cs.cmu.edu>.
Pseudocode | No | The paper provides update equations and a graphical representation (Figure 1) but does not include structured pseudocode or an algorithm block.
Open Source Code | Yes | See https://github.com/junieroliva/recurrent for code.
Open Datasets | Yes | Next we explore the ability of recurrent units to use long-term dependencies in one's data with a synthetic task using a real dataset. It has been observed that LSTMs perform poorly in classifying a long pixel-by-pixel sequence of MNIST digits (Le et al., 2015).
Dataset Splits | Yes | We generate a total of 176 points per sequence for 3200 training sequences, 400 validation sequences, and 400 testing sequences.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running experiments.
Software Dependencies | No | All experiments were performed in TensorFlow (Abadi et al., 2016) and used the standard implementations of GRUCell and BasicLSTMCell for GRUs and LSTMs respectively.
Experiment Setup | Yes | In all experiments we used SGD for optimization with gradient clipping (Pascanu et al., 2013) at a norm of 1 on all algorithms. Unless otherwise specified, 100 trials were performed to search over the following hyper-parameters on a validation set: (1) initial learning rate, the initial learning rate used for SGD, in the range [exp(-10), 1]; (2) lr decay, the multiplier applied to the learning rate every 1k iterations, in the range [0.8, 0.999]; (3) dropout keep rate, the percent of output units kept during dropout, in the range (0, 1]; (4) num units, the number of units in the recurrent unit, in {1, ..., 256}.
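
The Experiment Setup row above describes a 100-trial search over four hyper-parameters on a validation set. The sketch below shows one way such a search loop could look; the function names (sample_hyperparameters, random_search, train_and_validate) are hypothetical, and the log-uniform sampling of the learning rate and the exact lower bound on the dropout keep rate are assumptions, not details confirmed by the paper.

```python
import math
import random


def sample_hyperparameters(rng):
    """Draw one configuration from the ranges quoted in the Experiment Setup row."""
    return {
        # Initial SGD learning rate in [exp(-10), 1]; log-uniform sampling is assumed.
        "learning_rate": math.exp(rng.uniform(-10.0, 0.0)),
        # Multiplier applied to the learning rate every 1k iterations, in [0.8, 0.999].
        "lr_decay": rng.uniform(0.8, 0.999),
        # Fraction of output units kept during dropout, in (0, 1] (lower bound assumed).
        "dropout_keep": rng.uniform(1e-3, 1.0),
        # Number of recurrent units, in {1, ..., 256}.
        "num_units": rng.randint(1, 256),
    }


def random_search(train_and_validate, n_trials=100, seed=0):
    """Return the best configuration found over n_trials random draws.

    `train_and_validate` is a hypothetical callable that trains a model with the
    given configuration (e.g., SGD with gradient clipping at norm 1) and returns
    a validation score to maximize.
    """
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_hyperparameters(rng)
        score = train_and_validate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```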
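
As noted in the Pseudocode row, the paper states its model through update equations rather than an algorithm block. For illustration only, the following is a minimal NumPy sketch of a multi-scale moving-average recurrence in the spirit of the SRU; the decay rates, layer sizes, weight names, and the exact form of the update are assumptions and should not be read as the paper's precise equations.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sru_step(x_t, mu_prev, params, alphas):
    """One step of an SRU-style recurrence (illustrative sketch, not the paper's exact update).

    mu_prev holds the concatenated moving-average statistics from the previous
    step, one block per decay rate in `alphas`.
    """
    W_r, b_r, W_phi, W_x, b_phi, W_o, b_o = params
    # Summarize the previous multi-scale averages.
    r_t = relu(W_r @ mu_prev + b_r)
    # Combine that summary with the current input into new per-step statistics.
    phi_t = relu(W_phi @ r_t + W_x @ x_t + b_phi)
    # Update an exponential moving average of phi_t at each decay scale.
    mu_blocks = np.split(mu_prev, len(alphas))
    mu_t = np.concatenate(
        [a * mu + (1.0 - a) * phi_t for a, mu in zip(alphas, mu_blocks)]
    )
    # Emit an output computed from the updated averages.
    o_t = relu(W_o @ mu_t + b_o)
    return o_t, mu_t


# Toy usage with assumed dimensions: input 10, r 20, phi 30, output 40, 3 scales.
rng = np.random.default_rng(0)
alphas = [0.0, 0.5, 0.9]  # assumed decay rates
d_x, d_r, d_phi, d_o = 10, 20, 30, 40
d_mu = len(alphas) * d_phi
params = (
    rng.normal(size=(d_r, d_mu)), np.zeros(d_r),
    rng.normal(size=(d_phi, d_r)), rng.normal(size=(d_phi, d_x)), np.zeros(d_phi),
    rng.normal(size=(d_o, d_mu)), np.zeros(d_o),
)
mu = np.zeros(d_mu)
for x_t in rng.normal(size=(5, d_x)):  # a length-5 toy sequence
    o_t, mu = sru_step(x_t, mu, params, alphas)
```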