The Statistical Recurrent Unit
Authors: Junier B. Oliva, Barnabás Póczos, Jeff Schneider
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the efficacy of SRUs as compared to LSTMs and GRUs in an unbiased manner by optimizing the respective architectures' hyperparameters for both synthetic and real-world tasks. |
| Researcher Affiliation | Academia | Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA. Correspondence to: Junier B. Oliva <joliva@cs.cmu.edu>. |
| Pseudocode | No | The paper provides update equations and a graphical representation (Figure 1) but does not include structured pseudocode or an algorithm block (a hedged sketch of these updates appears after this table). |
| Open Source Code | Yes | See https://github.com/junieroliva/recurrent for code. |
| Open Datasets | Yes | Next we explore the ability of recurrent units to use long-term dependencies in one's data with a synthetic task using a real dataset. It has been observed that LSTMs perform poorly in classifying a long pixel-by-pixel sequence of MNIST digits (Le et al., 2015). |
| Dataset Splits | Yes | We generate a total of 176 points per sequence for 3200 training sequences, 400 validation sequences, and 400 testing sequences. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running experiments. |
| Software Dependencies | No | All experiments were performed in TensorFlow (Abadi et al., 2016) and used the standard implementations of GRUCell and BasicLSTMCell for GRUs and LSTMs respectively. |
| Experiment Setup | Yes | In all experiments we used SGD for optimization using gradient clipping (Pascanu et al., 2013) with a norm of 1 on all algorithms. Unless otherwise specified, 100 trials were performed to search over the following hyperparameters on a validation set: (1) initial learning rate, the initial learning rate used for SGD, in the range [exp(-10), 1]; (2) lr decay, the multiplier applied to the learning rate every 1k iterations, in the range [0.8, 0.999]; (3) dropout keep rate, the percent of output units kept during dropout, in the range (0, 1]; (4) num units, the number of units in the recurrent unit, in {1, ..., 256}. A sketch of such a search appears below. |
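
As the Pseudocode row notes, the paper specifies the SRU through update equations rather than an algorithm block. Below is a minimal NumPy sketch of the multi-scale moving-average recurrence as we read it from the paper: summary statistics feed back through a ReLU, and new statistics are kept as exponential moving averages at several decay scales. The weight names (`W_r`, `W_phi`, `W_x`, `W_o` and biases) and the particular set of scales are our own labels for illustration, not the authors' reference implementation (which lives in the linked repository).

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sru_step(x_t, mu_prev, params, alphas):
    """One step of the SRU recurrence (hedged sketch).

    mu_prev stacks one d-dimensional statistics vector per decay scale,
    so len(mu_prev) == len(alphas) * d. Weight names are illustrative.
    """
    W_r, b_r, W_phi, W_x, b_phi, W_o, b_o = params
    # Feedback: summary statistics from the previous step re-enter the unit.
    r_t = relu(W_r @ mu_prev + b_r)
    # Candidate statistics computed from the current input and the feedback.
    phi_t = relu(W_phi @ r_t + W_x @ x_t + b_phi)
    # Multi-scale exponential moving averages of the candidate statistics.
    mu_t = np.concatenate([
        a * mu_a + (1.0 - a) * phi_t
        for a, mu_a in zip(alphas, np.split(mu_prev, len(alphas)))
    ])
    # Output projection of the updated statistics.
    o_t = relu(W_o @ mu_t + b_o)
    return mu_t, o_t
```

Keeping averages at several decay rates (slow scales near 1 alongside fast scales near 0) is what lets a single unit summarize both recent and long-past inputs.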
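The Experiment Setup row describes a 100-trial search over four hyperparameter ranges, selected on a validation set. The following is a minimal sketch of one way to run that search; the log-uniform sampling of the learning rate and the `train_and_validate` callback are our assumptions, since the paper only states the search ranges and the trial count.

```python
import math
import random

def sample_config(rng):
    """Draw one hyperparameter setting from the ranges quoted above.
    Log-uniform sampling of the learning rate is an assumption; the
    paper only states the interval [exp(-10), 1]."""
    return {
        "init_lr": math.exp(rng.uniform(-10.0, 0.0)),  # [exp(-10), 1]
        "lr_decay": rng.uniform(0.8, 0.999),           # applied every 1k iters
        "dropout_keep": rng.uniform(1e-3, 1.0),        # keep rate in (0, 1]
        "num_units": rng.randint(1, 256),              # {1, ..., 256}, inclusive
    }

def random_search(train_and_validate, n_trials=100, seed=0):
    """Return the best configuration found over n_trials random draws.
    `train_and_validate` is a hypothetical callback that trains a model
    with SGD and gradient clipping (norm 1) under the given config and
    returns a validation score where higher is better."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = train_and_validate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

The same search would be run separately per architecture (SRU, GRU, LSTM), which is what the Research Type row means by optimizing the respective architectures' hyperparameters in an unbiased manner.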