Achieving $\mathcal{O}(\epsilon^{-1.5})$ Complexity in Hessian/Jacobian-free Stochastic Bilevel Optimization

Authors: Yifan Yang, Peiyao Xiao, Kaiyi Ji

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Theoretically, we show that FdeHBO requires O(ϵ^{-1.5}) iterations (each using O(1) samples and only first-order gradient information) to find an ϵ-accurate stationary point. As far as we know, this is the first Hessian/Jacobian-free method with an O(ϵ^{-1.5}) sample complexity for nonconvex-strongly-convex stochastic bilevel optimization. In this section, we test the performance of the proposed FdeHBO and FMBO on two applications: hyper-representation and data hyper-cleaning, respectively. As shown in Figure 1, our FdeHBO converges much faster and more stably than PZOBO-S, F2SA and F3SA, while achieving a higher training accuracy.
Researcher Affiliation | Academia | Yifan Yang, Peiyao Xiao and Kaiyi Ji, Department of Computer Science and Engineering, University at Buffalo, Buffalo, NY 14260, {yyang99, peiyaoxi, kaiyiji}@buffalo.edu
Pseudocode | Yes | Algorithm 1: Hessian/Jacobian-free Bilevel Optimizer via Projection-aided Finite-difference Estimation; Algorithm 2: Fully Single-loop Momentum-based Bilevel Optimizer (FMBO). (A hedged sketch of the finite-difference estimation idea follows the table.)
Open Source Code | No | The paper does not contain any explicit statement about making the source code available, nor a link to a code repository for the implemented methods.
Open Datasets | Yes | 4.1 Hyper-representation on MNIST Dataset; 4.2 Hyper-cleaning on MNIST Dataset
Dataset Splits | Yes | S_ν and S_τ denote the training data and validation data, whose sizes are set to 20000 and 5000, respectively. (A sketch of one way to draw such a split follows the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware used for experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks).
Experiment Setup | Yes | We perform the hyper-representation with the 7-layer LeNet network [38]... whose sizes are set to 20000 and 5000, respectively; λ = {λ_i}_{i∈S_τ} and C are the regularization parameters... More details of the experimental setups are specified in Appendix A.1. (The standard hyper-cleaning objective these parameters enter is sketched after the table.)
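
The Hessian/Jacobian-free property named in Algorithm 1 comes from replacing Hessian/Jacobian-vector products with finite differences of gradients. Below is a minimal numpy sketch of that one-sided estimator; the function name fd_hvp, the step size delta, and the toy quadratic are illustrative assumptions, not code from the paper.

```python
import numpy as np

def fd_hvp(grad_fn, x, v, delta=1e-4):
    """One-sided finite-difference estimate of the Hessian-vector product:
    H(x) v  ~=  (grad(x + delta * v) - grad(x)) / delta.
    Needs only two gradient evaluations and never forms the Hessian."""
    return (grad_fn(x + delta * v) - grad_fn(x)) / delta

# Toy check on f(x) = 0.5 * x^T A x, whose gradient is A x and Hessian is A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
x = np.array([1.0, -1.0])
v = np.array([0.5, 2.0])
print(fd_hvp(lambda z: A @ z, x, v))  # ~ A @ v = [3.5, 4.5] (exact for a quadratic)
```

Since each estimate costs O(1) gradient calls, an iteration built on this primitive stays first-order, which is what allows the stated O(ϵ^{-1.5}) count to double as a sample complexity.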
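For the Dataset Splits row, the paper reports the 20000/5000 training/validation sizes but not how the subsets are drawn from MNIST's 60000 training images; the random draw, seed, and torchvision usage below are therefore assumptions, shown only to make the reported sizes concrete.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Hypothetical reproduction of the reported 20000/5000 split.
mnist = datasets.MNIST(root="./data", train=True, download=True,
                       transform=transforms.ToTensor())
train_set, val_set, _unused = random_split(
    mnist, [20000, 5000, len(mnist) - 25000],
    generator=torch.Generator().manual_seed(0))
print(len(train_set), len(val_set))  # 20000 5000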
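To make the Experiment Setup row concrete: in the data hyper-cleaning formulation standard in the bilevel literature, each λ_i reweights one training sample through a sigmoid σ, and C regularizes the lower-level model w. The display below is a hedged reconstruction under that convention, with generic D_tr/D_val index sets; the paper's own objective and its S_ν/S_τ indexing are specified in its Section 4.2 and Appendix A.1 and may differ in detail.

```latex
% Hedged reconstruction of the standard data hyper-cleaning bilevel problem:
% lambda_i reweights training sample i via a sigmoid; C regularizes w.
\min_{\lambda}\ \frac{1}{|\mathcal{D}_{\mathrm{val}}|}
  \sum_{j \in \mathcal{D}_{\mathrm{val}}} \ell\!\left(w^{*}(\lambda);\, x_j, y_j\right)
\quad \text{s.t.} \quad
w^{*}(\lambda) \in \operatorname*{arg\,min}_{w}\
  \sum_{i \in \mathcal{D}_{\mathrm{tr}}} \sigma(\lambda_i)\,
  \ell\!\left(w;\, x_i, y_i\right) + C\,\lVert w \rVert^{2}
```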