Training Recurrent Neural Networks via Forward Propagation Through Time

Authors: Anil Kag, Venkatesh Saligrama

ICML 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirically FPTT outperforms BPTT on a number of well-known benchmark tasks, thus enabling architectures like LSTMs to solve long range dependencies problems. ... We then conduct a number of experiments on benchmark datasets and show that our proposed method is particularly effective on tasks that exhibit long-range dependencies. |
| Researcher Affiliation | Academia | Anil Kag 1 Venkatesh Saligrama 1 1Department of Electrical and Computer Engineering, Boston University, USA. Correspondence to: Anil Kag <anilkag@bu.edu>. |
| Pseudocode | Yes | "Algorithm 1 Training RNN with Back Prop" and "Algorithm 2 Training RNN with FPTT." |
| Open Source Code | Yes | We have released our implementation at https://github.com/anilkagak2/FPTT |
| Open Datasets | Yes | The benchmark datasets used in this study are publicly available along with a train and test split. ... We perform experiments on three variants of the sequence-to-sequence benchmark Penn Tree Bank (PTB) dataset (McAuley & Leskovec, 2013). ... Pixel & Permute MNIST, CIFAR-10 are sequential variants of the popular image classification datasets: MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky & Hinton, 2009). |
| Dataset Splits | Yes | For hyper-parameter tuning, we set aside a validation set on tasks where a validation set is not available. ... The benchmark datasets used in this study are publicly available along with a train and test split. |
| Hardware Specification | Yes | We perform our experiments on single GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper states 'We implement FPTT in Pytorch using the pseudo code given by Algorithm 2,' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For fair comparison, following previous works (Zhang et al., 2018; Kusupati et al., 2018; Kag et al., 2020), we use LSTMs with 128 dimensional hidden state and Adam as the choice of optimizer with initial learning rate 1e-3 for both algorithms. ... For the Add Task... a train batch size of 128 is presented to the RNN to update its parameters and evaluated using an independently drawn test set. |