SBO-RNN: Reformulating Recurrent Neural Networks via Stochastic Bilevel Optimization
Authors: Ziming Zhang, Yun Yue, Guojun Wu, Yanhua Li, Haichong Zhang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically we demonstrate our approach with superior performance on several benchmark datasets, with fewer parameters, less training data, and much faster convergence. Code is available at https://zhang-vislab.github.io. |
| Researcher Affiliation | Academia | Ziming Zhang, Yun Yue, Guojun Wu, Yanhua Li, Haichong Zhang Worcester Polytechnic Institute Worcester, MA 01609 {zzhang15, yyue, gwu, yli15, hzhang10}@wpi.edu |
| Pseudocode | Yes | Algorithm 1 Training algorithm for SBO-RNN |
| Open Source Code | No | The paper states "Code is available at https://zhang-vislab.github.io", but no dedicated repository containing the paper's implementation is directly linked. |
| Open Datasets | Yes | Datasets. We evaluate our method on long-sequence benchmarks with varying difficulties, and list the statistics of these datasets in Table 1. ... Pixel-MNIST refers to pixel-by-pixel sequences of images in MNIST [LeCun and Cortes, 2010] ... HAR-2 [Kusupati et al., 2018a] ... Penn Treebank (PTB) dataset [Melis et al., 2017] |
| Dataset Splits | Yes | We replicate the same benchmark training/testing split with 20% of training data for validation to tune hyperparameters. |
| Hardware Specification | Yes | All the experiments were run on an Nvidia GeForce RTX 2080 Ti GPU server. |
| Software Dependencies | No | The paper mentions the use of Adam [Kingma and Ba, 2014] as the optimizer, but it does not list the software libraries or version numbers needed to reproduce the experiments. |
| Experiment Setup | Yes | Hyperparameters. We use grid search to fine-tune the hyperparameters of each baseline as well as ours on the validation datasets whenever necessary. ... We further set η = 10^-3 in Eqs. 7, 8 and 9 in all the experiments, as we observe that this value works well and consistently leads the lower-level optimization to converge. A batch size of 64 is used across all the datasets for all the methods. Adam [Kingma and Ba, 2014] is used as the optimizer for all the methods. The learning rate for training SBO-RNN architectures is always initialized to 10^-3 with linear scheduling of weight decay. |
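
For concreteness, the reported settings (batch size 64, Adam optimizer, learning rate initialized to 10^-3) can be assembled into a short training-loop sketch. This is only an illustration of the stated hyperparameters: the framework (PyTorch), the `RNNClassifier` stand-in for the actual SBO-RNN cell, and the `LinearLR` reading of "linear scheduling" are assumptions, not details confirmed by the paper.

```python
# Sketch of the reported experiment setup: batch size 64, Adam with an initial
# learning rate of 1e-3. Assumptions not stated in the paper: PyTorch as the
# framework, a plain nn.RNN stand-in for SBO-RNN, and LinearLR as the schedule.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in data: 1000 sequences of length 28 with 28 features per step.
x = torch.randn(1000, 28, 28)
y = torch.randint(0, 10, (1000,))
loader = DataLoader(TensorDataset(x, y), batch_size=64, shuffle=True)  # batch size 64

class RNNClassifier(nn.Module):
    """Placeholder recurrent classifier; the paper's SBO-RNN cell would replace nn.RNN."""
    def __init__(self, in_dim=28, hidden=128, num_classes=10):
        super().__init__()
        self.rnn = nn.RNN(in_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.fc(out[:, -1])  # classify from the last hidden state

model = RNNClassifier()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # Adam, lr initialized to 1e-3
scheduler = torch.optim.lr_scheduler.LinearLR(              # assumed reading of the
    optimizer, start_factor=1.0, end_factor=0.1,            # paper's "linear scheduling"
    total_iters=5,
)

for epoch in range(5):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The η = 10^-3 value mentioned in the quote governs the lower-level (hidden-state) updates in the paper's bilevel formulation and is internal to the SBO-RNN cell, so it does not appear in this outer training-loop sketch.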