Learning Long Term Dependencies via Fourier Recurrent Units

Authors: Jiong Zhang, Yibo Lin, Zhao Song, Inderjit Dhillon

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implemented the Fourier recurrent unit in TensorFlow (Abadi et al., 2016) and used the standard implementations of BasicRNNCell and BasicLSTMCell for RNN and LSTM, respectively. We also used the released source code of SRU (Oliva et al., 2017) with its default configuration of {α_i}_{i=1}^{5} = {0.0, 0.25, 0.5, 0.9, 0.99}, a g_t dimension of 60, and an h^(t) dimension of 200. We release our code on GitHub. For a fair comparison, we construct one layer of the above cells with 200 units in the experiments. Adam (Kingma & Ba, 2014) is adopted as the optimizer. We explore learning rates in {0.001, 0.005, 0.01, 0.05, 0.1} and learning rate decay in {0.8, 0.85, 0.9, 0.95, 0.99}. The best results are reported after a grid search over hyperparameters. For simplicity, we use FRU_{k,d} to denote k sampled sparse frequencies and d dimensions for each frequency f_k in an FRU cell. (A minimal sketch of this training setup is given after the table.)
Researcher Affiliation | Collaboration | Jiong Zhang (UT-Austin, zhangjiong724@utexas.edu); Yibo Lin (UT-Austin, yibolin@utexas.edu); Zhao Song (Harvard & UT-Austin, zhaos@seas.harvard.edu); Inderjit Dhillon (UT-Austin & Amazon, inderjit@cs.utexas.edu)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | We release our code on GitHub.
Open Datasets | Yes | We tested FRU together with RNN, LSTM, and SRU on both synthetic and real-world datasets such as pixel (permuted) MNIST and the IMDB movie rating dataset.
Dataset Splits | No | Among the sequences, 80% are used for training and 20% are used for testing. We further evaluate FRU and other models with the IMDB movie review dataset (25K training and 25K testing sequences). This describes training and testing splits but does not explicitly mention a validation split needed for hyperparameter tuning.
Hardware Specification | No | The paper does not provide specific hardware details (such as GPU/CPU models or memory) used for running its experiments.
Software Dependencies | No | The paper mentions using TensorFlow and TFLearn but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | We explore learning rates in {0.001, 0.005, 0.01, 0.05, 0.1} and learning rate decay in {0.8, 0.85, 0.9, 0.95, 0.99}. The best results are reported after a grid search over hyperparameters. Batch size is set to 256 and dropout (Srivastava et al., 2014) is not included in this experiment. All models use a single layer with 128 units, a batch size of 32, and a dropout keep rate of 80%. (A sketch of this grid search is also given after the table.)
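
The Research Type row describes the baseline setup: a single recurrent layer of 200 units built from TensorFlow's BasicRNNCell or BasicLSTMCell and trained with Adam under a decayed learning rate drawn from the search grids. Below is a minimal sketch of that setup, assuming the TensorFlow 1.x API implied by the paper; the input shape (pixel-by-pixel MNIST), the classification head, and the decay_steps value are illustrative assumptions, and the FRU cell itself is not reproduced here.

```python
# Minimal sketch (not the authors' released code) of the single-layer
# recurrent baseline described in the Research Type row, using the
# TensorFlow 1.x graph API.
import tensorflow as tf  # TensorFlow 1.x

NUM_UNITS = 200    # one layer with 200 units, as reported in the paper
NUM_CLASSES = 10   # assumption: 10-class pixel-MNIST classification

# Assumed input: each MNIST image fed pixel by pixel as a length-784 sequence.
inputs = tf.placeholder(tf.float32, [None, 784, 1])
labels = tf.placeholder(tf.int64, [None])

# Swap in tf.nn.rnn_cell.BasicRNNCell for the plain RNN baseline.
cell = tf.nn.rnn_cell.BasicLSTMCell(num_units=NUM_UNITS)
outputs, _ = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)

# Classify from the final time step (illustrative head, not from the paper).
logits = tf.layers.dense(outputs[:, -1, :], NUM_CLASSES)
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# Learning rate and decay factor come from the paper's search grids;
# decay_steps is an assumption.
global_step = tf.train.get_or_create_global_step()
learning_rate = tf.train.exponential_decay(
    0.001, global_step, decay_steps=1000, decay_rate=0.9)
train_op = tf.train.AdamOptimizer(learning_rate).minimize(
    loss, global_step=global_step)
```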
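The Experiment Setup row lists the learning-rate and decay grids that are searched, with the best configuration reported. The sketch below enumerates that grid; train_and_evaluate is a hypothetical stand-in for one full training run and is not part of the paper's released code.

```python
# Minimal sketch of the hyperparameter grid search described above.
from itertools import product

LEARNING_RATES = [0.001, 0.005, 0.01, 0.05, 0.1]
DECAY_RATES = [0.8, 0.85, 0.9, 0.95, 0.99]

def grid_search(train_and_evaluate):
    """Return the best (learning_rate, decay_rate) pair and its accuracy.

    `train_and_evaluate` is a hypothetical callable that trains one model
    with the given hyperparameters and returns its test accuracy.
    """
    best_config, best_accuracy = None, float("-inf")
    for lr, decay in product(LEARNING_RATES, DECAY_RATES):
        accuracy = train_and_evaluate(learning_rate=lr, decay_rate=decay)
        if accuracy > best_accuracy:
            best_config, best_accuracy = (lr, decay), accuracy
    return best_config, best_accuracy
```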