Decentralized Stochastic Bilevel Optimization with Improved per-Iteration Complexity

Authors: Xuxing Chen, Minhui Huang, Shiqian Ma, Krishna Balasubramanian

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct experiments on several machine learning problems. Our numerical results show the efficiency of our algorithm in both the synthetic and the real-world problems."
Researcher Affiliation | Academia | 1. Department of Mathematics, University of California, Davis, USA; 2. Department of Electrical and Computer Engineering, University of California, Davis, USA; 3. Department of Computational Applied Mathematics and Operations Research, Rice University, Houston, USA; 4. Department of Statistics, University of California, Davis, USA.
Pseudocode | Yes | The paper gives Algorithm 1 (Hessian-Inverse-Gradient Product oracle), Algorithm 2 (Hypergradient Estimation), and Algorithm 3 (MA-DSBO). An illustrative sketch of the oracle appears after this table.
Open Source Code | No | The paper provides neither a direct link to open-source code for the described methodology nor an explicit statement that such code is released in supplementary material or an external repository.
Open Datasets | Yes | "Now we consider hyperparameter optimization on MNIST dataset (LeCun et al., 1998)."
Dataset Splits | No | The paper mentions a "training and validation set" but does not specify the split sizes (percentages or sample counts) or how they were derived.
Hardware Specification | No | The paper states that "All the experiments are performed on a local device with 8 cores (n = 8)" but does not specify the CPU model, GPU model, or other detailed hardware specifications.
Software Dependencies | Yes | "All the experiments are performed on a local device with 8 cores (n = 8) using mpi4py (Dalcin & Fang, 2021) for parallel computing and PyTorch (Paszke et al., 2019) for computing stochastic oracles." A sketch of this communication pattern appears after this table.
Experiment Setup | Yes | "We include the numerical results of different stepsize choices in Figure 2." Also: "Note that in previous algorithms (Chen et al., 2022b; Yang et al., 2022) one Hessian matrix of the lower level function requires O(c^2 p^2) storage, while in our algorithm a Hessian-vector product only requires O(cp) storage, which improves both the space and the communication complexity." The storage gap is illustrated after this table.
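The Hessian-Inverse-Gradient Product oracle (Algorithm 1) is not reproduced verbatim in this summary. The sketch below shows the standard truncated-Neumann-series construction that such oracles are typically built on, assuming a strongly convex lower-level objective so the series converges for a small enough step size; the name hig_product and the parameters eta and K are illustrative choices, not the paper's notation.

    import torch

    def hig_product(loss_fn, y, b, eta=0.1, K=20):
        """Estimate H^{-1} b, where H is the Hessian of loss_fn at y, via the
        truncated Neumann series eta * sum_{k=0}^{K} (I - eta*H)^k b.
        Only Hessian-vector products are formed; H itself never is."""
        y = y.detach().clone().requires_grad_(True)
        # First-order gradient built with create_graph=True so it can be
        # differentiated again to obtain Hessian-vector products.
        (g,) = torch.autograd.grad(loss_fn(y), y, create_graph=True)
        v = b.clone()  # current term (I - eta*H)^k b
        s = b.clone()  # partial sum of the series (k = 0 term included)
        for _ in range(K):
            (Hv,) = torch.autograd.grad(g, y, grad_outputs=v, retain_graph=True)
            v = v - eta * Hv  # v <- (I - eta*H) v
            s = s + v
        return eta * s

On a quadratic test problem the output can be checked against the exact inverse: with loss_fn(y) = 0.5 * y @ A @ y and A = 2 * torch.eye(5), the call hig_product(loss_fn, torch.zeros(5), b, eta=0.4, K=50) returns approximately 0.5 * b = A^{-1} b.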
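The O(c^2 p^2)-versus-O(cp) storage claim quoted under Experiment Setup can be made concrete with a small PyTorch comparison; the dimension d below is a hypothetical stand-in for the product of the paper's c and p.

    import torch

    d = 500  # hypothetical dimension; at realistic scale a d x d Hessian is prohibitive

    def loss(y):
        return 0.5 * (y ** 2).sum() + torch.sin(y).sum()

    y = torch.randn(d)
    v = torch.randn(d)

    # Materializing the full Hessian stores d * d floats (the O(c^2 p^2) cost)
    # and is what a worker would otherwise have to hold and communicate.
    H = torch.autograd.functional.hessian(loss, y)  # shape (d, d)

    # A Hessian-vector product stores only d floats (the O(cp) cost).
    y = y.requires_grad_(True)
    (g,) = torch.autograd.grad(loss(y), y, create_graph=True)
    (Hv,) = torch.autograd.grad(g, y, grad_outputs=v)

    assert torch.allclose(H @ v, Hv, atol=1e-4)  # same product, far less memory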
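The Software Dependencies row notes that the experiments pair PyTorch with mpi4py on 8 cores. Below is a minimal sketch of that communication pattern, assuming a fully connected topology in which the gossip (mixing) step reduces to a global average; with a general mixing matrix W, each worker would instead combine only its neighbors' iterates. The filename demo.py is hypothetical.

    # Launch with: mpiexec -n 8 python demo.py
    from mpi4py import MPI
    import numpy as np
    import torch

    comm = MPI.COMM_WORLD
    n = comm.Get_size()

    # Each worker holds a local iterate (e.g., a stochastic gradient estimate).
    local = torch.randn(100)

    # Gossip step x_i <- sum_j w_ij x_j with uniform weights w_ij = 1/n,
    # implemented as a buffered in-network sum followed by rescaling.
    send = local.numpy()
    recv = np.empty_like(send)
    comm.Allreduce(send, recv, op=MPI.SUM)
    local = torch.from_numpy(recv / n)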