SPARKLE: A Unified Single-Loop Primal-Dual Framework for Decentralized Bilevel Optimization

Authors: Shuchen Zhu, Boao Kong, Songtao Lu, Xinmeng Huang, Kun Yuan

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we present experiments to validate our theoretical findings. We first explore how update strategies and network structures influence the convergence of SPARKLE. Then we compare SPARKLE to existing decentralized SBO algorithms. Additional experiments on a decentralized SBO problem with synthetic data appear in Appendix D.1."
Researcher Affiliation | Collaboration | Shuchen Zhu (Peking University, shuchenzhu@stu.pku.edu.cn); Boao Kong (Peking University, kongboao@stu.pku.edu.cn); Songtao Lu (IBM Research, songtao@ibm.com); Xinmeng Huang (University of Pennsylvania, xinmengh@sas.upenn.edu); Kun Yuan (Peking University, kunyuan@pku.edu.cn)
Pseudocode | Yes | Algorithm 1, "SPARKLE: A unified framework for decentralized stochastic bilevel optimization" (a hedged sketch of one iteration follows the table below).
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described methodology.
Open Datasets | Yes | "The Fashion MNIST dataset consists of 60000 images for training and 10000 images for testing"; the 60000 training images are then randomly split into a 50000-image training set and a 10000-image validation set.
Dataset Splits | Yes | Same evidence as above: the 60000 training images are randomly split 50000/10000 into training and validation sets (a hedged split sketch follows the table).
Hardware Specification | Yes | "All experiments described in this section were run on an NVIDIA A100 server."
Software Dependencies | No | The paper describes model architectures (a two-layer MLP and a four-layer CNN; a hedged MLP sketch follows the table) but does not specify software packages with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x).
Experiment Setup | Yes | "The batch size is set to 50. The step-sizes for all the algorithms are set to α_k = β_k = γ_k = 0.03 and the term η in MDBO is set to 0.5. The moving-average term θ_k = 0.2." (A hedged moving-average sketch follows the table.)
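
For orientation, below is a minimal, hedged sketch of one iteration of a single-loop decentralized bilevel update in the spirit of Algorithm 1. The variable names, the gossip matrix W, and the exact update order are illustrative assumptions, not the paper's pseudocode; SPARKLE additionally allows mixing different heterogeneity-correction strategies (e.g., gradient tracking), which this sketch omits.

```python
# Hedged sketch of one round of a single-loop decentralized bilevel update
# in the spirit of Algorithm 1 (SPARKLE). Names and update order are
# illustrative assumptions, not the paper's exact pseudocode.
import numpy as np

def sparkle_style_step(W, x, y, z, oracles, alpha=0.03, beta=0.03, gamma=0.03):
    """One synchronous round over n nodes.

    W        : (n, n) doubly-stochastic gossip matrix.
    x, y, z  : (n, d) stacks of per-node upper, lower, and auxiliary variables.
    oracles  : per-node callables returning stochastic estimates of
               (grad_x f_i, grad_y f_i, grad_y g_i,
                hess_yy g_i @ z_i, hess_xy g_i @ z_i).
    """
    n = W.shape[0]
    x_new, y_new, z_new = W @ x, W @ y, W @ z   # gossip averaging with neighbors
    for i in range(n):
        gx_f, gy_f, gy_g, hyy_z, hxy_z = oracles[i](x[i], y[i], z[i])
        y_new[i] -= beta * gy_g                 # lower-level descent step
        z_new[i] -= gamma * (hyy_z - gy_f)      # auxiliary step approximating the
                                                # Hessian-inverse-vector product
        x_new[i] -= alpha * (gx_f - hxy_z)      # upper-level hypergradient step
    return x_new, y_new, z_new
```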
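The 50000/10000 train/validation split of the Fashion-MNIST training set can be reproduced along the following lines. This is a minimal sketch assuming PyTorch/torchvision, which the paper does not explicitly name; the random seed is likewise an assumption.

```python
# Minimal sketch of the described Fashion-MNIST split, assuming torchvision
# (the paper does not name its software stack).
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

full_train = datasets.FashionMNIST(
    root="./data", train=True, download=True,
    transform=transforms.ToTensor())             # 60000 training images
test_set = datasets.FashionMNIST(
    root="./data", train=False, download=True,
    transform=transforms.ToTensor())             # 10000 test images

# Randomly split the 60000 training images into 50000 train / 10000 validation.
generator = torch.Generator().manual_seed(0)     # seed is an assumption
train_set, val_set = random_split(full_train, [50000, 10000], generator=generator)
```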
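For concreteness, a two-layer MLP of the kind mentioned under Software Dependencies might look as follows in PyTorch; the hidden width and activation are assumptions, since the paper reports the architecture class but not these code-level details.

```python
# Hedged sketch of a two-layer MLP for 28x28 Fashion-MNIST inputs.
# The hidden width (200) and ReLU activation are assumptions.
import torch.nn as nn

class TwoLayerMLP(nn.Module):
    def __init__(self, in_dim=28 * 28, hidden=200, num_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                   # (B, 1, 28, 28) -> (B, 784)
            nn.Linear(in_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        return self.net(x)
```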
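The moving-average term θ_k = 0.2 in the experiment setup suggests an exponential moving average of a stochastic estimate (e.g., of the hypergradient). A one-line sketch under that assumption:

```python
# Hedged sketch of the moving-average update with theta = 0.2.
# Interpreting theta_k as an EMA weight on the fresh stochastic estimate
# is an assumption; the paper's exact recursion may differ.
def moving_average(v_prev, v_est, theta=0.2):
    """Return v_k = (1 - theta) * v_{k-1} + theta * (fresh estimate)."""
    return (1.0 - theta) * v_prev + theta * v_est
```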