MomentumRNN: Integrating Momentum into Recurrent Neural Networks
Authors: Tan Nguyen, Richard Baraniuk, Andrea Bertozzi, Stanley Osher, Bao Wang
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the effectiveness of our momentum approach in designing RNNs in terms of convergence speed and accuracy. We compare the performance of the Momentum LSTM with the baseline LSTM [24] in the following tasks: 1) the object classification task on pixel-permuted MNIST [32], 2) the speech prediction task on the TIMIT dataset [1, 22, 62, 38, 23], 3) the celebrated copying and adding tasks [24, 1], and 4) the language modeling task on the Penn Tree Bank (PTB) dataset [39]. |
| Researcher Affiliation | Academia | Tan M. Nguyen, Department of ECE, Rice University, Houston, USA; Richard G. Baraniuk, Department of ECE, Rice University, Houston, USA; Andrea L. Bertozzi, Department of Mathematics, University of California, Los Angeles; Stanley J. Osher, Department of Mathematics, University of California, Los Angeles; Bao Wang, Department of Mathematics and Scientific Computing and Imaging (SCI) Institute, University of Utah, Salt Lake City, UT, USA |
| Pseudocode | No | The paper includes architectural illustrations and mathematical equations, but no structured pseudocode or algorithm blocks; a sketch of the recurrence reconstructed from those equations appears after the table. |
| Open Source Code | No | The paper mentions building its experiments on the baseline codebases provided by [5] and [54], but it does not release or link to an implementation of its own method. |
| Open Datasets | Yes | We compare the performance of the Momentum LSTM with the baseline LSTM [24] in the following tasks: 1) the object classification task on pixel-permuted MNIST [32], 2) the speech prediction task on the TIMIT dataset [1, 22, 62, 38, 23], 3) the celebrated copying and adding tasks [24, 1], and 4) the language modeling task on the Penn Tree Bank (PTB) dataset [39]. |
| Dataset Splits | Yes | We use the standard train/validation/test separation in [62, 34, 6], thereby having 3640 utterances for the training set with a validation set of size 192 and a test set of size 400. Results are reported on the test set using the model parameters that yield the best validation loss. |
| Hardware Specification | No | The paper does not state the hardware used for its experiments (e.g., GPU/CPU models, clock speeds, or memory amounts). |
| Software Dependencies | No | The paper mentions PyTorch [48] and refers to external codebases for the baselines, but it does not list library or solver names with version numbers needed to replicate the experiments. |
| Experiment Setup | Yes | We include details on the models, datasets, training procedure, and hyperparameters used in our experiments in Appendix A. |
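
The Pseudocode row notes that the method is specified only through figures and equations. As a point of reference, below is a minimal PyTorch sketch of the MomentumRNN recurrence reconstructed from the paper's update equations, $v_t = \mu v_{t-1} + s\,U x_t$ and $h_t = \sigma(W h_{t-1} + v_t + b)$. The class name, the default values of $\mu$ and $s$, and the unrolling loop are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class MomentumRNNCell(nn.Module):
    """Illustrative MomentumRNN cell (a sketch, not the authors' implementation).

    Recurrence from the paper:
        v_t = mu * v_{t-1} + s * (U x_t)
        h_t = sigma(W h_{t-1} + v_t + b)
    where mu is the momentum coefficient and s the step size.
    """

    def __init__(self, input_size, hidden_size, mu=0.6, s=0.6):
        super().__init__()
        self.U = nn.Linear(input_size, hidden_size, bias=False)  # input-to-hidden map U
        self.W = nn.Linear(hidden_size, hidden_size, bias=True)  # hidden-to-hidden map W (carries b)
        self.mu = mu  # momentum coefficient (hyperparameter)
        self.s = s    # step size (hyperparameter)

    def forward(self, x, state=None):
        if state is None:
            zeros = x.new_zeros(x.size(0), self.W.out_features)
            state = (zeros, zeros)
        h_prev, v_prev = state
        v = self.mu * v_prev + self.s * self.U(x)  # momentum accumulation of the input drive
        h = torch.tanh(self.W(h_prev) + v)         # standard recurrent nonlinearity on top
        return h, (h, v)

# Unrolling over a sequence of shape (batch, time, input_size):
cell = MomentumRNNCell(input_size=1, hidden_size=128)
x = torch.randn(32, 100, 1)
state = None
for t in range(x.size(1)):
    h, state = cell(x[:, t, :], state)
```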
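Similarly, the copying and adding tasks quoted in the Research Type and Open Datasets rows are synthetic benchmarks whose construction the table does not spell out. Below is a minimal sketch of the standard adding-problem generator in the sense of [24, 1]; the half-sequence marker placement and the default batch and sequence sizes are common conventions assumed here, not details confirmed by the paper.

```python
import torch

def adding_problem_batch(batch_size=128, seq_len=750):
    """Standard adding-problem data (illustrative, not the authors' generator).

    Each sequence has two channels: uniform random values in [0, 1] and a
    0/1 marker channel with exactly two 1s. The regression target is the
    sum of the two marked values.
    """
    values = torch.rand(batch_size, seq_len)
    markers = torch.zeros(batch_size, seq_len)
    # Place one marker in each half of the sequence (a common convention).
    first = torch.randint(0, seq_len // 2, (batch_size,))
    second = torch.randint(seq_len // 2, seq_len, (batch_size,))
    markers[torch.arange(batch_size), first] = 1.0
    markers[torch.arange(batch_size), second] = 1.0
    x = torch.stack([values, markers], dim=-1)       # inputs: (batch, time, 2)
    y = (values * markers).sum(dim=1, keepdim=True)  # targets: (batch, 1)
    return x, y
```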