Learning Long Term Dependencies via Fourier Recurrent Units
Authors: Jiong Zhang, Yibo Lin, Zhao Song, Inderjit Dhillon
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented the Fourier recurrent unit in TensorFlow (Abadi et al., 2016) and used the standard implementations of BasicRNNCell and BasicLSTMCell for RNN and LSTM, respectively. We also used the released source code of SRU (Oliva et al., 2017) with its default configuration of {α_i}_{i=1}^{5} = {0.0, 0.25, 0.5, 0.9, 0.99}, a g_t dimension of 60, and an h(t) dimension of 200. We release our code on GitHub. For fair comparison, we construct one layer of the above cells with 200 units in the experiments. Adam (Kingma & Ba, 2014) is adopted as the optimization engine. We explore learning rates in {0.001, 0.005, 0.01, 0.05, 0.1} and learning rate decay in {0.8, 0.85, 0.9, 0.95, 0.99}. The best results are reported after a grid search for the best hyperparameters. For simplicity, we use FRU_{k,d} to denote k sampled sparse frequencies and d dimensions for each frequency f_k in an FRU cell. (A hedged TensorFlow sketch of this single-layer, Adam-optimized setup appears after the table.) |
| Researcher Affiliation | Collaboration | Jiong Zhang (UT-Austin, zhangjiong724@utexas.edu); Yibo Lin (UT-Austin, yibolin@utexas.edu); Zhao Song (Harvard & UT-Austin, zhaos@seas.harvard.edu); Inderjit Dhillon (UT-Austin & Amazon, inderjit@cs.utexas.edu) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our code on GitHub. |
| Open Datasets | Yes | We tested FRU together with RNN, LSTM and SRU on both synthetic and real-world datasets such as pixel (permuted) MNIST and the IMDB movie rating dataset. |
| Dataset Splits | No | Among the sequences, 80% are used for training and 20% are used for testing. We further evaluate FRU and other models with IMDB movie review dataset (25K training and 25K testing sequences). This describes training and testing splits, but does not explicitly mention a validation split needed for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models or memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using Tensorflow and TFLearn but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We explore learning rates in {0.001, 0.005, 0.01, 0.05, 0.1} and learning rate decay in {0.8, 0.85, 0.9, 0.95, 0.99}. The best results are reported after a grid search for the best hyperparameters. Batch size is set to 256 and dropout (Srivastava et al., 2014) is not included in this experiment. All models use a single layer with 128 units, batch size of 32, dropout keep rate of 80%. The two quoted configurations come from different experiments in the paper. (A sketch of the quoted grid search appears after the table.) |
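
The single-layer, Adam-optimized baseline quoted under Research Type can be outlined in code. The following is a minimal sketch only, assuming the TensorFlow 1.x API that matches the paper's era; the placeholder shapes (`seq_len`, `input_dim`, `n_classes`) and the feedable learning-rate tensor are illustrative assumptions, not details taken from the paper, and the FRU and SRU cells themselves are not reconstructed here.

```python
# Minimal sketch of the single-layer recurrent baseline described above.
# Assumptions: TensorFlow 1.x API; illustrative shapes for a pixel-by-pixel
# sequence task (seq_len, input_dim, n_classes are NOT taken from the paper).
import tensorflow as tf

seq_len, input_dim, n_classes, n_units = 784, 1, 10, 200

def build_model():
    tf.reset_default_graph()
    x = tf.placeholder(tf.float32, [None, seq_len, input_dim])
    y = tf.placeholder(tf.int64, [None])
    lr = tf.placeholder(tf.float32, [])  # fed each step so the rate can decay over epochs
    cell = tf.nn.rnn_cell.BasicLSTMCell(n_units)  # swap in BasicRNNCell for the RNN baseline
    outputs, _ = tf.nn.dynamic_rnn(cell, x, dtype=tf.float32)
    logits = tf.layers.dense(outputs[:, -1, :], n_classes)  # classify from the last hidden state
    loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
    train_op = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)
    return x, y, lr, loss, train_op
```

BasicRNNCell and BasicLSTMCell are the stock TensorFlow cells the authors name; the FRU and SRU cells come from the authors' released code and are not sketched here.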
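The Experiment Setup row quotes the searched learning rates and decay factors but not the decay schedule. The sketch below enumerates that grid, assuming a per-epoch exponential decay; the schedule and `num_epochs` are assumptions, and the training and evaluation steps are left as comments rather than presented as the authors' loop.

```python
# Hedged sketch of the hyperparameter grid search quoted under Experiment Setup.
# Assumption: the decay factor multiplies the learning rate once per epoch; the
# paper's text reproduced here lists only the searched values, not the schedule.
import itertools

learning_rates = [0.001, 0.005, 0.01, 0.05, 0.1]
lr_decays = [0.8, 0.85, 0.9, 0.95, 0.99]
num_epochs = 20  # illustrative; not stated in the table

for base_lr, decay in itertools.product(learning_rates, lr_decays):
    current_lr = base_lr
    for epoch in range(num_epochs):
        # run one epoch of Adam updates, feeding current_lr to the `lr`
        # placeholder from the model sketch above
        current_lr *= decay  # assumed per-epoch exponential decay
    # evaluate on held-out data and keep the (base_lr, decay) pair with the
    # best result, since "the best results are reported after grid search"
```

Together with the model sketch above, this covers only the search procedure described in the table, not the FRU cell itself.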