SBO-RNN: Reformulating Recurrent Neural Networks via Stochastic Bilevel Optimization

Authors: Ziming Zhang, Yun Yue, Guojun Wu, Yanhua Li, Haichong Zhang

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically we demonstrate our approach with superior performance on several benchmark datasets, with fewer parameters, less training data, and much faster convergence. Code is available at https://zhang-vislab.github.io."
Researcher Affiliation | Academia | "Ziming Zhang, Yun Yue, Guojun Wu, Yanhua Li, Haichong Zhang, Worcester Polytechnic Institute, Worcester, MA 01609, {zzhang15, yyue, gwu, yli15, hzhang10}@wpi.edu"
Pseudocode | Yes | "Algorithm 1 Training algorithm for SBO-RNN"
Open Source Code | No | The paper states
Open Datasets | Yes | "Datasets. We evaluate our method on long-sequence benchmarks with varying difficulties, and list the statistics of these datasets in Table 1. ... Pixel-MNIST refers to pixel-by-pixel sequences of images in MNIST [LeCun and Cortes, 2010] ... HAR-2 [Kusupati et al., 2018a] ... Penn Treebank (PTB) dataset [Melis et al., 2017]"
Dataset Splits | Yes | "We replicate the same benchmark training/testing split with 20% of training data for validation to tune hyperparameters."
Hardware Specification | Yes | "All the experiments were run on an Nvidia GeForce RTX 2080 Ti GPU server."
Software Dependencies | No | The paper mentions the use of
Experiment Setup | Yes | "Hyperparameters. We use grid search to fine-tune the hyperparameters of each baseline as well as ours on the validation datasets whenever necessary. ... We further set η = 10^-3 in Eqs. 7, 8 and 9 in all the experiments, as we observe that this value works well consistently and leads the lower-level optimization to converge. A batch size of 64 is used across all the datasets for all the methods. Adam [Kingma and Ba, 2014] is used as the optimizer for all the methods. The learning rate for training SBO-RNN architectures is always initialized to 10^-3 with linear scheduling of weight decay."
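
Taken together, the Dataset Splits and Experiment Setup rows pin down a concrete training configuration: a 20% validation split carved from the training set, batch size 64, Adam with an initial learning rate of 10^-3, and η = 10^-3 as the step size of the lower-level optimization (Eqs. 7-9). The PyTorch sketch below assembles those reported settings for a would-be reproduction. It is not the authors' released code: the `SBORNN` class is a placeholder for the paper's model, the toy data stands in for a benchmark such as pixel-MNIST, and the reading of "linear scheduling of weight decay" as a linearly decayed learning rate on top of Adam's weight-decay term is an assumption, since the paper does not spell the schedule out.

```python
# Sketch of the reported training setup (placeholder model, not the authors' code).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

class SBORNN(nn.Module):  # placeholder; the real SBO-RNN cell is defined in the paper
    def __init__(self, input_size=1, hidden_size=128, num_classes=10, eta=1e-3):
        super().__init__()
        self.eta = eta  # reported lower-level step size, eta = 1e-3 (Eqs. 7-9)
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        _, h = self.rnn(x)      # stand-in for the SBO lower-level recursion
        return self.head(h[-1])

# Toy stand-in for a long-sequence benchmark (pixel-MNIST: 784 steps, 1 feature).
x = torch.randn(1000, 784, 1)
y = torch.randint(0, 10, (1000,))
full = TensorDataset(x, y)

# 20% of the training data is held out for validation, as reported.
n_val = int(0.2 * len(full))
train_set, val_set = random_split(full, [len(full) - n_val, n_val])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)  # batch size 64
val_loader = DataLoader(val_set, batch_size=64)

model = SBORNN()
criterion = nn.CrossEntropyLoss()
# Adam with the reported initial learning rate of 1e-3; the weight-decay
# coefficient is not reported, so the value here is a placeholder.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
# One reading of the paper's "linear scheduling": decay the learning rate
# linearly over training. The exact schedule is not specified in the paper.
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.1, total_iters=50)

for epoch in range(50):  # epoch count is a placeholder, not reported
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()
```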