Training Recurrent Neural Networks via Forward Propagation Through Time
Authors: Anil Kag, Venkatesh Saligrama
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically FPTT outperforms BPTT on a number of well-known benchmark tasks, thus enabling architectures like LSTMs to solve long range dependencies problems. ... We then conduct a number of experiments on benchmark datasets and show that our proposed method is particularly effective on tasks that exhibit long-range dependencies. |
| Researcher Affiliation | Academia | Anil Kag 1 Venkatesh Saligrama 1 1Department of Electrical and Computer Engineering, Boston University, USA. Correspondence to: Anil Kag <anilkag@bu.edu>. |
| Pseudocode | Yes | "Algorithm 1 Training RNN with Back Prop" and "Algorithm 2 Training RNN with FPTT." (A hedged sketch of the FPTT training loop appears after this table.) |
| Open Source Code | Yes | We have released our implementation at https://github.com/anilkagak2/FPTT |
| Open Datasets | Yes | The benchmark datasets used in this study are publicly available along with a train and test split. ... We perform experiments on three variants of the sequence-to-sequence benchmark Penn Tree Bank (PTB) dataset (McAuley & Leskovec, 2013). ... Pixel & Permute MNIST, CIFAR-10 are sequential variants of the popular image classification datasets: MNIST (Lecun et al., 1998) and CIFAR-10 (Krizhevsky & Hinton, 2009). (A sketch of the pixel-sequence preprocessing appears after this table.) |
| Dataset Splits | Yes | For hyper-parameter tuning, we set aside a validation set on tasks where a validation set is not available. ... The benchmark datasets used in this study are publicly available along with a train and test split. |
| Hardware Specification | Yes | We perform our experiments on single GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper states 'We implement FPTT in Pytorch using the pseudo code given by Algorithm 2,' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For fair comparison, following previous works (Zhang et al., 2018; Kusupati et al., 2018; Kag et al., 2020), we use LSTMs with 128 dimensional hidden state and Adam as the choice of optimizer with initial learning rate 1e-3 for both algorithms. ... For the Add Task... a train batch size of 128 is presented to the RNN to update its parameters and evaluated using an independently drawn test set. (A sketch of this configuration appears after this table.) |
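
The paper's Algorithm 2 replaces the full BPTT unroll with per-time-step parameter updates against a regularized objective that keeps the weights close to a running average. The sketch below is a simplified, hedged reading of that idea in PyTorch: the names (`alpha`, `w_bar`), the single gradient step per time step, the per-step loss placement, and the averaging rule are assumptions on my part; Algorithm 2 in the paper contains additional details omitted here, and the released code at https://github.com/anilkagak2/FPTT is the authoritative reference.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: an LSTM cell read one input step at a time.
input_dim, hidden_dim, output_dim = 1, 128, 1
cell = nn.LSTMCell(input_dim, hidden_dim)
readout = nn.Linear(hidden_dim, output_dim)
params = list(cell.parameters()) + list(readout.parameters())
opt = torch.optim.Adam(params, lr=1e-3)
criterion = nn.MSELoss()

alpha = 0.1                                   # regularization strength (assumed value)
w_bar = [p.detach().clone() for p in params]  # running average of the weights

def fptt_pass(x_seq, y):
    """One FPTT pass over x_seq of shape (T, batch, input_dim); y: (batch, output_dim)."""
    batch = x_seq.size(1)
    h = torch.zeros(batch, hidden_dim)
    c = torch.zeros(batch, hidden_dim)
    for t in range(x_seq.size(0)):
        # Forward one step; detaching the state stops gradients from flowing back in time.
        h, c = cell(x_seq[t], (h.detach(), c.detach()))
        # Per-step loss (tasks with a single terminal label would apply it only at the last step).
        loss_t = criterion(readout(h), y)

        # Proximal term pulling the current weights toward the running average.
        reg = sum(((p - wb) ** 2).sum() for p, wb in zip(params, w_bar))
        opt.zero_grad()
        (loss_t + 0.5 * alpha * reg).backward()
        opt.step()

        # Update the running average (simplified; see Algorithm 2 for the exact rule).
        with torch.no_grad():
            for wb, p in zip(w_bar, params):
                wb.mul_(0.5).add_(0.5 * p)
    return loss_t.item()
```

The `detach()` call is what removes the need to store and backpropagate through the full unrolled computation graph; the proximal term and running average are what the paper argues stabilize such per-step updates.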
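
The Pixel MNIST and Permute MNIST benchmarks in the Open Datasets row feed each 28x28 image to the RNN one pixel at a time (sequence length 784), with Permute MNIST applying a fixed random permutation to the pixel order. A minimal sketch of that preprocessing using torchvision (an assumption; the authors' actual data pipeline may differ):

```python
import torch
from torchvision import datasets, transforms

# Fixed permutation shared by train and test (seeded for reproducibility).
perm = torch.randperm(784, generator=torch.Generator().manual_seed(0))

def to_pixel_sequence(img, permute=False):
    """Flatten a 1x28x28 image tensor into a (784, 1) pixel sequence."""
    seq = img.view(-1, 1)              # 784 time steps, 1 feature each
    return seq[perm] if permute else seq

train_set = datasets.MNIST(
    root="./data", train=True, download=True,
    transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Lambda(lambda img: to_pixel_sequence(img, permute=True)),
    ]),
)
loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
x, y = next(iter(loader))              # x: (128, 784, 1), y: (128,)
```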
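
The Experiment Setup row quotes a 128-dimensional LSTM trained with Adam at an initial learning rate of 1e-3 and a train batch size of 128 on the Add Task. The sketch below shows that configuration with the standard formulation of the adding problem; the task generator and the single training step shown here are illustrative assumptions, not the authors' exact code, and it uses an ordinary (BPTT-style) loss rather than the FPTT loop above.

```python
import torch
import torch.nn as nn

def add_task_batch(batch_size=128, seq_len=200):
    """Standard adding problem: predict the sum of the two values flagged by the marker channel."""
    values = torch.rand(batch_size, seq_len, 1)
    markers = torch.zeros(batch_size, seq_len, 1)
    for i in range(batch_size):
        idx = torch.randperm(seq_len)[:2]      # two marked positions per sequence
        markers[i, idx] = 1.0
    x = torch.cat([values, markers], dim=-1)   # (batch, T, 2)
    y = (values * markers).sum(dim=1)          # (batch, 1)
    return x, y

# Configuration quoted in the table: 128-d LSTM hidden state, Adam with lr 1e-3, batch size 128.
model = nn.LSTM(input_size=2, hidden_size=128, batch_first=True)
head = nn.Linear(128, 1)
opt = torch.optim.Adam(list(model.parameters()) + list(head.parameters()), lr=1e-3)

x, y = add_task_batch()
out, _ = model(x)                              # out: (batch, T, 128)
loss = nn.functional.mse_loss(head(out[:, -1]), y)
loss.backward()
opt.step()
```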