Decentralized Stochastic Bilevel Optimization with Improved per-Iteration Complexity
Authors: Xuxing Chen, Minhui Huang, Shiqian Ma, Krishna Balasubramanian
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on several machine learning problems. Our numerical results show the efficiency of our algorithm in both the synthetic and the real-world problems. |
| Researcher Affiliation | Academia | 1Department of Mathematics, University of California, Davis, USA 2Department of Electrical and Computer Engineering, University of California, Davis, USA 3Department of Computational Applied Mathematics and Operations Research, Rice University, Houston, USA 4Department of Statistics, University of California, Davis, USA. |
| Pseudocode | Yes | Algorithm 1 Hessian-Inverse-Gradient Product oracle... Algorithm 2 Hypergradient Estimation... Algorithm 3 MA-DSBO Algorithm |
| Open Source Code | No | The paper does not provide a direct link to open-source code for the methodology described, nor an explicit statement about its release in supplementary material or an external repository. |
| Open Datasets | Yes | Now we consider hyperparameter optimization on MNIST dataset (LeCun et al., 1998). |
| Dataset Splits | No | The paper mentions 'training and validation set' but does not specify the splits (percentages or counts) used for these sets, or how they were derived. |
| Hardware Specification | No | The paper mentions 'All the experiments are performed on a local device with 8 cores (n = 8)', but does not specify the CPU model, GPU model, or other detailed hardware specifications. |
| Software Dependencies | Yes | All the experiments are performed on a local device with 8 cores (n = 8) using mpi4py (Dalcin & Fang, 2021) for parallel computing and PyTorch (Paszke et al., 2019) for computing stochastic oracles. |
| Experiment Setup | Yes | We include the numerical results of different stepsize choices in Figure 2. Note that in previous algorithms (Chen et al., 2022b; Yang et al., 2022) one Hessian matrix of the lower level function requires O(c²p²) storage, while in our algorithm a Hessian-vector product only requires O(cp) storage, which improves both the space and the communication complexity. |
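The storage argument in the Experiment Setup row rests on a standard fact: a Hessian-vector product H·v can be computed from gradient evaluations alone, so memory scales with the dimension p rather than p². The sketch below illustrates this with a finite-difference approximation, H(y)·v ≈ (∇f(y + εv) − ∇f(y))/ε; it is a minimal illustration of the general technique, not the authors' code (the paper's experiments use PyTorch autograd for the same purpose), and the test function f is chosen here purely for checkability.

```python
import numpy as np

def grad(y):
    """Gradient of the toy objective f(y) = 0.25 * (y^T y)^2, i.e. (y^T y) * y."""
    return (y @ y) * y

def hvp(y, v, eps=1e-6):
    """Hessian-vector product via finite differences of the gradient:
    H(y) v ~= (grad(y + eps*v) - grad(y)) / eps.
    Only gradient calls and O(p) vectors are needed -- the O(p^2)
    Hessian matrix is never formed."""
    return (grad(y + eps * v) - grad(y)) / eps

rng = np.random.default_rng(0)
p = 5
y = rng.standard_normal(p)
v = rng.standard_normal(p)

approx = hvp(y, v)
# For this f the exact Hessian is H = 2*y*y^T + (y^T y)*I,
# so the exact product is H v = 2*y*(y @ v) + (y @ y)*v.
exact = 2 * y * (y @ v) + (y @ y) * v
print(np.max(np.abs(approx - exact)))  # small finite-difference error
```

In autodiff frameworks such as PyTorch the same O(p)-storage product is obtained exactly (e.g. via a gradient-of-gradient-dot-vector computation) rather than by finite differences; the memory advantage over materializing the Hessian is the same.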