Improved Penalty Method via Doubly Stochastic Gradients for Bilevel Hyperparameter Optimization
Authors: Wanli Shi, Bin Gu
AAAI 2021, pp. 9621–9629 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our method with three state-of-the-art gradient-based methods in three tasks, i.e., data denoising, few-shot learning, and training data poisoning, using several large-scale benchmark datasets. All the results demonstrate that our method outperforms or is comparable to the existing methods in terms of accuracy and efficiency. |
| Researcher Affiliation | Collaboration | Wanli Shi¹, Bin Gu¹,²,³* — ¹Nanjing University of Information Science & Technology, P.R. China; ²MBZUAI, United Arab Emirates; ³JD Finance America Corporation, Mountain View, CA, USA |
| Pseudocode | Yes | Algorithm 1 DSGPHO (a hedged sketch of this style of update appears after the table) |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | In this task, we conduct the experiments on the MNIST, SVHN, and CIFAR10 datasets. For each dataset, we split it into three subsets (training, test, and validation) and introduce 25% noise into the training set. ... In this task, we use the Mini-ImageNet (Vinyals et al. 2016) and Omniglot (Lake, Salakhutdinov, and Tenenbaum 2015) datasets. |
| Dataset Splits | Yes | For each dataset, we split it into three subsets (training, test, and validation) and introduce 25% noise into the training set. The validation sets contain 1000, 10000, and 1000 points, respectively. ... For Mini-ImageNet, we split the classes into 64 for meta-training, 16 for meta-validation, and 20 for meta-testing. ... We split 1000 instances into the training set, 1000 into the validation set, and 8000 into the test set. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | For each hyperparameter update step, we update the model parameters 50 times. For our method, we use 500 validation instances and 2048 constraints to calculate the doubly stochastic gradient. We fix the step size of the hyperparameters at 0.01 and tune the step size of the model parameters over {0.001, 0.0001, 0.00001} for all methods. ... We fix the step size of u at 0.1 and search the step size of v from {0.01, 0.001, 0.0001}. We randomly sample 2048 constraints in each iteration. ... We randomly sample 2048 constraints. We fix the learning rate of u at 0.1 and choose the learning rate of v from {0.01, 0.001, 0.0001}. We set the total number of updates of u to 5000. |
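The excerpts above fix the skeleton of Algorithm 1 (DSGPHO): each hyperparameter step follows 50 model-parameter steps, and every gradient is doubly stochastic, built from a sampled batch of validation instances (500) and a sampled batch of constraints (2048) drawn from the lower-level optimality conditions that the penalty term enforces. The sketch below illustrates that update pattern on a toy linear data-denoising problem; it is a reading aid, not the authors' implementation. The quadratic model, the penalty weight `gamma` and its fixed value, and all variable names are assumptions; only the batch sizes, step sizes, and the 50-step inner loop come from the quoted setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the data-denoising task: the lower level fits model
# parameters w on a u-weighted training loss; the upper level tunes the
# per-example weights u (the hyperparameters) on a clean validation set.
d, n_tr, n_val = 10, 5000, 1000
X_tr = rng.normal(size=(n_tr, d))
w_true = rng.normal(size=d)
y_tr = X_tr @ w_true + rng.normal(scale=0.1, size=n_tr)
y_tr[: n_tr // 4] += rng.normal(scale=3.0, size=n_tr // 4)  # 25% noisy labels
X_val = rng.normal(size=(n_val, d))
y_val = X_val @ w_true + rng.normal(scale=0.1, size=n_val)

def grad_val(w, idx):
    """Stochastic gradient of the upper-level (validation) loss."""
    r = X_val[idx] @ w - y_val[idx]
    return 2.0 * X_val[idx].T @ r / len(idx)

def grad_low(u, w, idx):
    """Stochastic gradient of the lower-level (weighted training) loss,
    i.e. a sampled block of the penalty constraints g(u, w) = 0."""
    r = u[idx] * (X_tr[idx] @ w - y_tr[idx])
    return 2.0 * X_tr[idx].T @ r / len(idx)

w, u = np.zeros(d), np.ones(n_tr)
gamma = 10.0                 # penalty weight (assumed; no schedule is quoted)
lr_u, lr_w = 0.1, 1e-3       # step sizes taken from the excerpts above
B_val, B_con = 500, 2048     # 500 validation instances, 2048 constraints

for outer in range(200):
    for inner in range(50):  # 50 model-parameter steps per u update
        iv = rng.choice(n_val, B_val, replace=False)
        ic = rng.choice(n_tr, B_con, replace=False)
        ih = rng.choice(n_tr, B_con, replace=False)  # independent batch
        g = grad_low(u, w, ic)
        # Hessian-vector product of the penalty (gamma/2) * ||g||^2,
        # i.e. H(u, w) @ g, estimated on the independent batch ih.
        Hg = 2.0 * X_tr[ih].T @ (u[ih] * (X_tr[ih] @ g)) / len(ih)
        w -= lr_w * (grad_val(w, iv) + gamma * Hg)
    ic = rng.choice(n_tr, B_con, replace=False)
    g = grad_low(u, w, ic)
    r = X_tr[ic] @ w - y_tr[ic]
    gu = np.zeros(n_tr)
    # d/du_i of (gamma/2) * ||g||^2 on the sampled constraints only
    # (batch reused for brevity; independent batches give an unbiased
    # doubly stochastic estimate).
    gu[ic] = gamma * 2.0 * r * (X_tr[ic] @ g) / len(ic)
    u = np.clip(u - lr_u * gu, 0.0, 1.0)  # keep sample weights in [0, 1]
```

The "doubly stochastic" label refers to the two sources of sampling in every step: the validation minibatch that estimates the upper-level gradient and the constraint minibatch that estimates the penalized lower-level stationarity term. In the sketch, the Hessian-vector product is computed on a batch independent of the one used for `g`, so its expectation matches the full-batch product.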