Achieving $\mathcal{O}(\epsilon^{-1.5})$ Complexity in Hessian/Jacobian-free Stochastic Bilevel Optimization
Authors: Yifan Yang, Peiyao Xiao, Kaiyi Ji
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Theoretically, we show that FdeHBO requires $\mathcal{O}(\epsilon^{-1.5})$ iterations (each using $\mathcal{O}(1)$ samples and only first-order gradient information) to find an $\epsilon$-accurate stationary point. As far as we know, this is the first Hessian/Jacobian-free method with an $\mathcal{O}(\epsilon^{-1.5})$ sample complexity for nonconvex-strongly-convex stochastic bilevel optimization. In this section, we test the performance of the proposed FdeHBO and FMBO on two applications: hyper-representation and data hyper-cleaning, respectively. As shown in Figure 1, our FdeHBO converges much faster and more stably than PZOBO-S, F2SA and F3SA, while achieving a higher training accuracy. |
| Researcher Affiliation | Academia | Yifan Yang, Peiyao Xiao and Kaiyi Ji; Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260; {yyang99, peiyaoxi, kaiyiji}@buffalo.edu |
| Pseudocode | Yes | Algorithm 1 Hessian/Jacobian-free Bilevel Optimizer via Projection-aided Finite-difference Estimation; Algorithm 2 Fully Single-loop Momentum-based Bilevel Optimizer (FMBO) (a generic finite-difference sketch is given after this table) |
| Open Source Code | No | The paper does not contain any explicit statement about making the source code available or provide a link to a code repository for the implemented methods. |
| Open Datasets | Yes | 4.1 Hyper-representation on MNIST Dataset; 4.2 Hyper-cleaning on MNIST Dataset |
| Dataset Splits | Yes | $\mathcal{S}_\nu$ and $\mathcal{S}_\tau$ denote the training data and validation data, whose sizes are set to 20000 and 5000, respectively (a split sketch is given after this table) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for experiments (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | Yes | We perform the hyper-representation with the 7-layer LeNet network [38]... whose sizes are set to 20000 and 5000, respectively, $\lambda = \{\lambda_i\}_{i \in \mathcal{S}_\tau}$ and $C$ are the regularization parameters... More details of the experimental setups are specified in Appendix A.1. |
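
For context on the pseudocode row above: in a bilevel problem $\min_x f(x, y^*(x))$ s.t. $y^*(x) = \arg\min_y g(x, y)$, the hypergradient involves Hessian- and Jacobian-vector products of the lower-level objective $g$, and the paper's Hessian/Jacobian-free approach replaces these with finite differences of gradients. Below is a minimal sketch of the generic finite-difference Hessian-vector product this family of methods builds on; it is not the paper's exact projection-aided estimator, and the names `fd_hvp`, `grad_fn`, and `delta` are illustrative.

```python
import torch

def fd_hvp(grad_fn, x, v, delta=1e-4):
    """Approximate the Hessian-vector product H(x) @ v with two gradient
    calls: (grad(x + delta * v) - grad(x)) / delta. Only first-order
    information is used, matching the Hessian/Jacobian-free setting."""
    return (grad_fn(x + delta * v) - grad_fn(x)) / delta

# Usage on a quadratic g(x) = 0.5 * x^T A x, where the exact HVP is A @ v.
A = torch.tensor([[3.0, 1.0], [1.0, 2.0]])
grad_g = lambda x: A @ x               # gradient of the quadratic: A x
x = torch.tensor([1.0, -1.0])
v = torch.tensor([0.5, 2.0])
print(fd_hvp(grad_g, x, v))            # ~ tensor([3.5000, 4.5000]) = A @ v
```

For a quadratic the estimate is exact for any step size; in general, `delta` trades off truncation error against numerical noise.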
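
For the dataset-splits row, only the sizes (20000 training, 5000 validation, drawn from MNIST) are reported; the exact sampling procedure is not stated. A minimal sketch, assuming a seeded `random_split` over the MNIST training set:

```python
import torch
from torchvision import datasets, transforms

# Reproduce the reported sizes only: 20000 train / 5000 validation examples.
# random_split with a fixed seed is an assumption, not the paper's procedure.
mnist = datasets.MNIST(root="./data", train=True, download=True,
                       transform=transforms.ToTensor())
train_set, val_set, _unused = torch.utils.data.random_split(
    mnist, [20000, 5000, len(mnist) - 25000],
    generator=torch.Generator().manual_seed(0))
print(len(train_set), len(val_set))  # 20000 5000
```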