Improved Penalty Method via Doubly Stochastic Gradients for Bilevel Hyperparameter Optimization
Authors: Wanli Shi, Bin Gu
AAAI 2021, pp. 9621–9629 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our method with three state-of-the-art gradient-based methods in three tasks, i.e., data denoising, few-shot learning, and training data poisoning, using several large-scale benchmark datasets. All the results demonstrate that our method outperforms or is comparable to the existing methods in terms of accuracy and efficiency. |
| Researcher Affiliation | Collaboration | Wanli Shi¹, Bin Gu¹,²,³* — ¹Nanjing University of Information Science & Technology, P.R. China; ²MBZUAI, United Arab Emirates; ³JD Finance America Corporation, Mountain View, CA, USA |
| Pseudocode | Yes | Algorithm 1 DSGPHO (a hedged sketch of this style of update appears after the table) |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | In this task, we conduct the experiments on the MNIST, SVHN, and CIFAR10 datasets. For each dataset, we split it into three subsets (training, test, and validation) and introduce 25% noise into the training set. ... In this task, we use the Mini-ImageNet (Vinyals et al. 2016) and Omniglot (Lake, Salakhutdinov, and Tenenbaum 2015) datasets. |
| Dataset Splits | Yes | For each dataset, we split it into three subsets (training, test, and validation) and introduce 25% noise into the training set. The validation sets contain 1000, 10000, and 1000 points, respectively. ... For Mini-ImageNet, we split the classes into 64 for meta-training, 16 for meta-validation, and 20 for meta-testing. ... We split 1000 instances into the training set, 1000 into the validation set, and 8000 into the test set. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | For each hyperparameter update step, we update the model parameters 50 times. For our method, we use 500 validation instances and 2048 constraints to calculate the doubly stochastic gradient. We fix the step size of the hyperparameters at 0.01 and tune the step size of the model parameters over {0.001, 0.0001, 0.00001} for all methods. ... We fix the step size of u at 0.1 and search the step size of v from {0.01, 0.001, 0.0001}. We randomly sample 2048 constraints in each iteration. ... We randomly sample 2048 constraints. We fix the learning rate of u at 0.1 and choose the learning rate of v from {0.01, 0.001, 0.0001}. We set the total number of updates of u to 5000. |
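The excerpts above fix the skeleton of Algorithm 1 (DSGPHO): each hyperparameter step follows 50 model-parameter steps, and every gradient is doubly stochastic, built from a sampled batch of validation instances (500) and a sampled batch of constraints (2048) drawn from the lower-level optimality conditions that the penalty term enforces. The sketch below illustrates that update pattern on a toy linear data-denoising problem; it is a reading aid, not the authors' implementation. The quadratic model, the penalty weight `gamma` and its fixed value, and all variable names are assumptions; only the batch sizes, step sizes, and the 50-step inner loop come from the quoted setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the data-denoising task: the lower level fits model
# parameters w on a u-weighted training loss; the upper level tunes the
# per-example weights u (the hyperparameters) on a clean validation set.
d, n_tr, n_val = 10, 5000, 1000
X_tr = rng.normal(size=(n_tr, d))
w_true = rng.normal(size=d)
y_tr = X_tr @ w_true + rng.normal(scale=0.1, size=n_tr)
y_tr[: n_tr // 4] += rng.normal(scale=3.0, size=n_tr // 4)  # 25% noisy labels
X_val = rng.normal(size=(n_val, d))
y_val = X_val @ w_true + rng.normal(scale=0.1, size=n_val)

def grad_val(w, idx):
    """Stochastic gradient of the upper-level (validation) loss."""
    r = X_val[idx] @ w - y_val[idx]
    return 2.0 * X_val[idx].T @ r / len(idx)

def grad_low(u, w, idx):
    """Stochastic gradient of the lower-level (weighted training) loss,
    i.e. a sampled block of the penalty constraints g(u, w) = 0."""
    r = u[idx] * (X_tr[idx] @ w - y_tr[idx])
    return 2.0 * X_tr[idx].T @ r / len(idx)

w, u = np.zeros(d), np.ones(n_tr)
gamma = 10.0                 # penalty weight (assumed; no schedule is quoted)
lr_u, lr_w = 0.1, 1e-3       # step sizes taken from the excerpts above
B_val, B_con = 500, 2048     # 500 validation instances, 2048 constraints

for outer in range(200):
    for inner in range(50):  # 50 model-parameter steps per u update
        iv = rng.choice(n_val, B_val, replace=False)
        ic = rng.choice(n_tr, B_con, replace=False)
        ih = rng.choice(n_tr, B_con, replace=False)  # independent batch
        g = grad_low(u, w, ic)
        # Hessian-vector product of the penalty (gamma/2) * ||g||^2,
        # i.e. H(u, w) @ g, estimated on the independent batch ih.
        Hg = 2.0 * X_tr[ih].T @ (u[ih] * (X_tr[ih] @ g)) / len(ih)
        w -= lr_w * (grad_val(w, iv) + gamma * Hg)
    ic = rng.choice(n_tr, B_con, replace=False)
    g = grad_low(u, w, ic)
    r = X_tr[ic] @ w - y_tr[ic]
    gu = np.zeros(n_tr)
    # d/du_i of (gamma/2) * ||g||^2 on the sampled constraints only
    # (batch reused for brevity; independent batches give an unbiased
    # doubly stochastic estimate).
    gu[ic] = gamma * 2.0 * r * (X_tr[ic] @ g) / len(ic)
    u = np.clip(u - lr_u * gu, 0.0, 1.0)  # keep sample weights in [0, 1]
```

The "doubly stochastic" label refers to the two sources of sampling in every step: the validation minibatch that estimates the upper-level gradient and the constraint minibatch that estimates the penalized lower-level stationarity term. In the sketch, the Hessian-vector product is computed on a batch independent of the one used for `g`, so its expectation matches the full-batch product.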