Enhanced Bilevel Optimization via Bregman Distance
Authors: Feihu Huang, Junyi Li, Shangqian Gao, Heng Huang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a data hyper-cleaning task and a hyper-representation learning task to demonstrate that our new algorithms outperform related bilevel optimization approaches. |
| Researcher Affiliation | Academia | (1) Electrical & Computer Engineering, University of Pittsburgh, Pittsburgh, PA, United States; (2) College of Computer Science & Technology, Nanjing University of Aeronautics & Astronautics, Nanjing, China |
| Pseudocode | Yes | Algorithm 1: Deterministic BiO-BreD Algorithm; Algorithm 2: Stochastic BiO-BreD (SBiO-BreD) Algorithm; Algorithm 3: Accelerated Stochastic BiO-BreD (ASBiO-BreD) Algorithm |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing the code or links to a code repository. |
| Open Datasets | Yes | We conduct 1) a data hyper-cleaning task [39] over the MNIST dataset [25]; 2) a hyper-representation learning task [9] over the Omniglot dataset [24]. ... The dataset includes a training set and a validation set, each containing 5000 images. |
| Dataset Splits | Yes | The dataset includes a training set and a validation set, each containing 5000 images. |
| Hardware Specification | Yes | All experiments are averaged over 5 runs and we use a server with AMD EPYC 7763 64-Core CPU and 1 NVIDIA RTX A5000. |
| Software Dependencies | No | The paper does not specify any software names with version numbers. |
| Experiment Setup | Yes | In the experiment, we compare our algorithms (i.e., BiO-BreD, SBiO-BreD, and ASBiO-BreD) with the following bilevel optimization algorithms: reverse [9]/AID-BiO [11, 22], AID-CG [12], AID-FP [12], stocBiO [22], MRBO [21], VRBO [21], FSLA [28], SUSTAIN [23], and VR-saBiAdam [18]. All experiments are averaged over 5 runs and we use a server with an AMD EPYC 7763 64-Core CPU and 1 NVIDIA RTX A5000. ... The detailed experimental setup is described in Appendix A.1. For hyper-parameters, we perform a grid search for our algorithms and the other baselines to choose the best setting. ... We use the Bregman function $\psi_t(x) = \frac{1}{2} x^\top H_t x$ to generate the Bregman distance in our algorithms, where $H_t$ is the adaptive matrix used in [19], i.e., the exponential moving average of the square of the gradient, and we use coefficient 0.99 in all experiments. |
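
The setup row above pins down the Bregman geometry concretely: $\psi_t(x) = \frac{1}{2} x^\top H_t x$, with $H_t$ an adaptive matrix tracking an exponential moving average of the squared gradient using coefficient 0.99. The sketch below illustrates that construction under stated assumptions: it takes $H_t$ to be diagonal and built directly from the EMA of the element-wise squared gradient, as the quoted text says; the cited adaptive scheme [19] may additionally apply a square root or a damping term. The helper names (`update_adaptive_matrix`, `bregman_distance`) and the `eps` stabilizer are illustrative, not the authors' implementation.

```python
import numpy as np

BETA = 0.99  # EMA coefficient for the squared gradient, per the setup above


def update_adaptive_matrix(v, grad, beta=BETA):
    """Track the diagonal of H_t as an exponential moving average of the
    element-wise squared gradient (the adaptive matrix described above)."""
    return beta * v + (1.0 - beta) * grad ** 2


def bregman_distance(x, y, h_diag, eps=1e-8):
    """Bregman distance D_psi(x, y) = psi(x) - psi(y) - <grad psi(y), x - y>.

    For the quadratic psi_t(x) = 0.5 * x^T H_t x this reduces to
    0.5 * (x - y)^T H_t (x - y); eps keeps the distance positive definite
    while some EMA entries are still zero (an assumption, not from the paper).
    """
    d = x - y
    return 0.5 * np.sum((h_diag + eps) * d * d)


# Illustrative usage with hypothetical shapes and values:
v = np.zeros(4)                          # EMA state for the squared gradient
g = np.array([0.1, -0.2, 0.3, 0.05])     # a gradient sample
v = update_adaptive_matrix(v, g)         # diagonal of H_t after one update
x, y = np.ones(4), np.zeros(4)
print(bregman_distance(x, y, v))         # D_psi(x, y) under the current H_t
```

For a quadratic $\psi_t$, the general Bregman distance collapses to a weighted squared Euclidean distance, which is why a diagonal EMA of squared gradients suffices to realize it; with $H_t = I$ it recovers the standard proximal term $\frac{1}{2}\|x - y\|^2$.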