Enhanced Bilevel Optimization via Bregman Distance
Authors: Feihu Huang, Junyi Li, Shangqian Gao, Heng Huang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a data hyper-cleaning task and a hyper-representation learning task to demonstrate that our new algorithms outperform related bilevel optimization approaches. |
| Researcher Affiliation | Academia | (1) Electrical & Computer Engineering, University of Pittsburgh, Pittsburgh, PA, United States; (2) College of Computer Science & Technology, Nanjing University of Aeronautics & Astronautics, Nanjing, China |
| Pseudocode | Yes | Algorithm 1: Deterministic BiO-BreD Algorithm; Algorithm 2: Stochastic BiO-BreD (SBiO-BreD) Algorithm; Algorithm 3: Accelerated Stochastic BiO-BreD (ASBiO-BreD) Algorithm |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code or links to a code repository. |
| Open Datasets | Yes | We conduct 1) a data hyper-cleaning task [39] over the MNIST dataset [25]; 2) a hyper-representation learning task [9] over the Omniglot dataset [24]. ... The dataset includes a training set and a validation set, each containing 5000 images. |
| Dataset Splits | Yes | The dataset includes a training set and a validation set, each containing 5000 images. |
| Hardware Specification | Yes | All experiments are averaged over 5 runs and we use a server with AMD EPYC 7763 64-Core CPU and 1 NVIDIA RTX A5000. |
| Software Dependencies | No | The paper does not specify any software names with version numbers. |
| Experiment Setup | Yes | In the experiment, we compare our algorithms (i.e., BiO-BreD, SBiO-BreD, and ASBiO-BreD) with the following bilevel optimization algorithms: reverse [9]/AID-BiO [11, 22], AID-CG [12], AID-FP [12], stocBiO [22], MRBO [21], VRBO [21], FSLA [28], SUSTAIN [23], and VR-saBiAdam [18]. All experiments are averaged over 5 runs and we use a server with an AMD EPYC 7763 64-Core CPU and 1 NVIDIA RTX A5000. ... The detailed experimental setup is described in Appendix A.1. For hyper-parameters, we perform a grid search for our algorithms and the other baselines to choose the best setting. ... We use the Bregman function $\psi_t(x) = \frac{1}{2} x^\top H_t x$ to generate the Bregman distance in our algorithms, where $H_t$ is the adaptive matrix used in [19], i.e., the exponential moving average of the square of the gradient, and we use coefficient 0.99 in all experiments. |
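
The setup row above pins down the Bregman geometry concretely: $\psi_t(x) = \frac{1}{2} x^\top H_t x$, with $H_t$ an adaptive matrix tracking an exponential moving average of the squared gradient using coefficient 0.99. The sketch below illustrates that construction under stated assumptions: it takes $H_t$ to be diagonal and built directly from the EMA of the element-wise squared gradient, as the quoted text says; the cited adaptive scheme [19] may additionally apply a square root or a damping term. The helper names (`update_adaptive_matrix`, `bregman_distance`) and the `eps` stabilizer are illustrative, not the authors' implementation.

```python
import numpy as np

BETA = 0.99  # EMA coefficient for the squared gradient, per the setup above


def update_adaptive_matrix(v, grad, beta=BETA):
    """Track the diagonal of H_t as an exponential moving average of the
    element-wise squared gradient (the adaptive matrix described above)."""
    return beta * v + (1.0 - beta) * grad ** 2


def bregman_distance(x, y, h_diag, eps=1e-8):
    """Bregman distance D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>.

    For the quadratic psi_t(x) = 0.5 * x^T H_t x this reduces to
    0.5 * (x - y)^T H_t (x - y); eps keeps the distance positive definite
    while some EMA entries are still zero (an assumption, not from the paper).
    """
    d = x - y
    return 0.5 * np.sum((h_diag + eps) * d * d)


# Illustrative usage with hypothetical shapes and values:
v = np.zeros(4)                          # EMA state for the squared gradient
g = np.array([0.1, -0.2, 0.3, 0.05])     # a gradient sample
v = update_adaptive_matrix(v, g)         # diagonal of H_t after one update
x, y = np.ones(4), np.zeros(4)
print(bregman_distance(x, y, v))         # D_psi(x, y) under the current H_t
```

For a quadratic $\psi_t$, the general Bregman distance collapses to a weighted squared Euclidean distance, which is why a diagonal EMA of squared gradients suffices to realize it; with $H_t = I$ it recovers the standard proximal term $\frac{1}{2}\|x - y\|^2$.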