SBO-RNN: Reformulating Recurrent Neural Networks via Stochastic Bilevel Optimization

Authors: Ziming Zhang, Yun Yue, Guojun Wu, Yanhua Li, Haichong Zhang

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically we demonstrate our approach with superior performance on several benchmark datasets, with fewer parameters, less training data, and much faster convergence. Code is available at https://zhang-vislab.github.io."
Researcher Affiliation | Academia | "Ziming Zhang, Yun Yue, Guojun Wu, Yanhua Li, Haichong Zhang, Worcester Polytechnic Institute, Worcester, MA 01609, {zzhang15, yyue, gwu, yli15, hzhang10}@wpi.edu"
Pseudocode | Yes | "Algorithm 1 Training algorithm for SBO-RNN"
Open Source Code | No | The paper states
Open Datasets | Yes | "Datasets. We evaluate our method on long-sequence benchmarks with varying difficulties, and list the statistics of these datasets in Table 1. ... Pixel-MNIST refers to pixel-by-pixel sequences of images in MNIST [LeCun and Cortes, 2010] ... HAR-2 [Kusupati et al., 2018a] ... Penn Treebank (PTB) dataset [Melis et al., 2017]"
Dataset Splits | Yes | "We replicate the same benchmark training/testing split with 20% of training data for validation to tune hyperparameters."
Hardware Specification | Yes | "All the experiments were run on an Nvidia GeForce RTX 2080 Ti GPU server."
Software Dependencies | No | The paper mentions the use of
Experiment Setup | Yes | "Hyperparameters. We use grid search to fine-tune the hyperparameters of each baseline as well as ours on the validation datasets whenever necessary. ... We further set η = 10^-3 in Eqs. 7, 8 and 9 in all the experiments, as we observe that this value works well consistently and leads the lower-level optimization to converge. A batch size of 64 is used across all the datasets for all the methods. Adam [Kingma and Ba, 2014] is used as the optimizer for all the methods. The learning rate for training SBO-RNN architectures is always initialized to 10^-3 with linear scheduling of weight decay."
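
Taken together, the Dataset Splits and Experiment Setup rows pin down a concrete training configuration: a 20% validation split carved from the training set, batch size 64, Adam with an initial learning rate of 10^-3, and η = 10^-3 as the step size of the lower-level optimization (Eqs. 7-9). The PyTorch sketch below assembles those reported settings for a would-be reproduction. It is not the authors' released code: the `SBORNN` class is a placeholder for the paper's model, the toy data stands in for a benchmark such as pixel-MNIST, and the reading of "linear scheduling of weight decay" as a linearly decayed learning rate on top of Adam's weight-decay term is an assumption, since the paper does not spell the schedule out.

```python
# Sketch of the reported training setup (placeholder model, not the authors' code).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset, random_split

class SBORNN(nn.Module):  # placeholder; the real SBO-RNN cell is defined in the paper
    def __init__(self, input_size=1, hidden_size=128, num_classes=10, eta=1e-3):
        super().__init__()
        self.eta = eta  # reported lower-level step size, eta = 1e-3 (Eqs. 7-9)
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        _, h = self.rnn(x)      # stand-in for the SBO lower-level recursion
        return self.head(h[-1])

# Toy stand-in for a long-sequence benchmark (pixel-MNIST: 784 steps, 1 feature).
x = torch.randn(1000, 784, 1)
y = torch.randint(0, 10, (1000,))
full = TensorDataset(x, y)

# 20% of the training data is held out for validation, as reported.
n_val = int(0.2 * len(full))
train_set, val_set = random_split(full, [len(full) - n_val, n_val])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)  # batch size 64
val_loader = DataLoader(val_set, batch_size=64)

model = SBORNN()
criterion = nn.CrossEntropyLoss()
# Adam with the reported initial learning rate of 1e-3; the weight-decay
# coefficient is not reported, so the value here is a placeholder.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
# One reading of the paper's "linear scheduling": decay the learning rate
# linearly over training. The exact schedule is not specified in the paper.
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.1, total_iters=50)

for epoch in range(50):  # epoch count is a placeholder, not reported
    model.train()
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()
```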