Training Recurrent Neural Networks via Forward Propagation Through Time

Authors: Anil Kag, Venkatesh Saligrama

ICML 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Empirically FPTT outperforms BPTT on a number of well-known benchmark tasks, thus enabling architectures like LSTMs to solve long range dependencies problems. ... We then conduct a number of experiments on benchmark datasets and show that our proposed method is particularly effective on tasks that exhibit long-range dependencies. |
| Researcher Affiliation | Academia | Anil Kag 1 Venkatesh Saligrama 1 1Department of Electrical and Computer Engineering, Boston University, USA. Correspondence to: Anil Kag <anilkag@bu.edu>. |
| Pseudocode | Yes | "Algorithm 1 Training RNN with Back Prop" and "Algorithm 2 Training RNN with FPTT." |
| Open Source Code | Yes | We have released our implementation at https://github.com/anilkagak2/FPTT |
| Open Datasets | Yes | The benchmark datasets used in this study are publicly available along with a train and test split. ... We perform experiments on three variants of the sequence-to-sequence benchmark Penn Tree Bank (PTB) dataset (McAuley & Leskovec, 2013). ... Pixel & Permute MNIST, CIFAR-10 are sequential variants of the popular image classification datasets: MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky & Hinton, 2009). |
| Dataset Splits | Yes | For hyper-parameter tuning, we set aside a validation set on tasks where a validation set is not available. ... The benchmark datasets used in this study are publicly available along with a train and test split. |
| Hardware Specification | Yes | We perform our experiments on single GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper states 'We implement FPTT in Pytorch using the pseudo code given by Algorithm 2,' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For fair comparison, following previous works (Zhang et al., 2018; Kusupati et al., 2018; Kag et al., 2020), we use LSTMs with 128 dimensional hidden state and Adam as the choice of optimizer with initial learning rate 1e-3 for both algorithms. ... For the Add Task... a train batch size of 128 is presented to the RNN to update its parameters and evaluated using an independently drawn test set. |