Achieving $\mathcal{O}(\epsilon^{-1.5})$ Complexity in Hessian/Jacobian-free Stochastic Bilevel Optimization

Authors: Yifan Yang, Peiyao Xiao, Kaiyi Ji

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Theoretically, we show that FdeHBO requires O(ϵ^{-1.5}) iterations (each using O(1) samples and only first-order gradient information) to find an ϵ-accurate stationary point. As far as we know, this is the first Hessian/Jacobian-free method with an O(ϵ^{-1.5}) sample complexity for nonconvex-strongly-convex stochastic bilevel optimization. In this section, we test the performance of the proposed FdeHBO and FMBO on two applications: hyper-representation and data hyper-cleaning, respectively. As shown in Figure 1, our FdeHBO converges much faster and more stably than PZOBO-S, F2SA and F3SA, while achieving a higher training accuracy.
Researcher Affiliation | Academia | Yifan Yang, Peiyao Xiao and Kaiyi Ji, Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260, {yyang99, peiyaoxi, kaiyiji}@buffalo.edu
Pseudocode | Yes | Algorithm 1: Hessian/Jacobian-free Bilevel Optimizer via Projection-aided Finite-difference Estimation; Algorithm 2: Fully Single-loop Momentum-based Bilevel Optimizer (FMBO). (A hedged sketch of the finite-difference estimation idea follows the table.)
Open Source Code | No | The paper does not contain any explicit statement about making the source code available, nor a link to a code repository for the implemented methods.
Open Datasets | Yes | 4.1 Hyper-representation on MNIST Dataset; 4.2 Hyper-cleaning on MNIST Dataset
Dataset Splits | Yes | S_ν and S_τ denote the training data and validation data, whose sizes are set to 20000 and 5000, respectively. (A sketch of one way to draw such a split follows the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware used for experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks).
Experiment Setup | Yes | We perform the hyper-representation with the 7-layer LeNet network [38]... whose sizes are set to 20000 and 5000, respectively; λ = {λ_i}_{i∈S_τ} and C are the regularization parameters... More details of the experimental setups are specified in Appendix A.1. (The standard hyper-cleaning objective these parameters enter is sketched after the table.)
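
The Hessian/Jacobian-free property named in Algorithm 1 comes from replacing Hessian/Jacobian-vector products with finite differences of gradients. Below is a minimal numpy sketch of that one-sided estimator; the function name fd_hvp, the step size delta, and the toy quadratic are illustrative assumptions, not code from the paper.

```python
import numpy as np

def fd_hvp(grad_fn, x, v, delta=1e-4):
    """One-sided finite-difference estimate of the Hessian-vector product:
    H(x) v  ~=  (grad(x + delta * v) - grad(x)) / delta.
    Needs only two gradient evaluations and never forms the Hessian."""
    return (grad_fn(x + delta * v) - grad_fn(x)) / delta

# Toy check on f(x) = 0.5 * x^T A x, whose gradient is A x and Hessian is A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
x = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])
print(fd_hvp(lambda z: A @ z, x, v))  # ~ A @ v = [3.5, 4.5] (exact for a quadratic)
```

Since each estimate costs O(1) gradient calls, an iteration built on this primitive stays first-order, which is what allows the stated O(ϵ^{-1.5}) count to double as a sample complexity.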
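For the Dataset Splits row, the paper reports the 20000/5000 training/validation sizes but not how the subsets are drawn from MNIST's 60000 training images; the random draw, seed, and torchvision usage below are therefore assumptions, shown only to make the reported sizes concrete.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Hypothetical reproduction of the reported 20000/5000 split.
mnist = datasets.MNIST(root="./data", train=True, download=True,
                       transform=transforms.ToTensor())
train_set, val_set, _unused = random_split(
    mnist, [20000, 5000, len(mnist) - 25000],
    generator=torch.Generator().manual_seed(0))
print(len(train_set), len(val_set))  # 20000 5000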
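To make the Experiment Setup row concrete: in the data hyper-cleaning formulation standard in the bilevel literature, each λ_i reweights one training sample through a sigmoid σ, and C regularizes the lower-level model w. The display below is a hedged reconstruction under that convention, with generic D_tr/D_val index sets; the paper's own objective and its S_ν/S_τ indexing are specified in its Section 4.2 and Appendix A.1 and may differ in detail.

```latex
% Hedged reconstruction of the standard data hyper-cleaning bilevel problem:
% lambda_i reweights training sample i via a sigmoid; C regularizes w.
\min_{\lambda}\ \frac{1}{|\mathcal{D}_{\mathrm{val}}|}
  \sum_{j \in \mathcal{D}_{\mathrm{val}}} \ell\!\left(w^{*}(\lambda);\, x_j, y_j\right)
\quad \text{s.t.} \quad
w^{*}(\lambda) \in \operatorname*{arg\,min}_{w}\
  \sum_{i \in \mathcal{D}_{\mathrm{tr}}} \sigma(\lambda_i)\,
  \ell\!\left(w;\, x_i, y_i\right) + C\,\lVert w \rVert^{2}
```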