Blockwise Stochastic Variance-Reduced Methods with Parallel Speedup for Multi-Block Bilevel Optimization

Authors: Quanqi Hu, Zi-Hao Qiu, Zhishuai Guo, Lijun Zhang, Tianbao Yang

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also conduct experiments to verify the effectiveness of the proposed algorithms compared with existing MBBO algorithms. We conduct experiments on both algorithms for low-dimensional and high-dimensional lower-level problems and demonstrate the effectiveness of our algorithms against existing MBBO algorithms."
Researcher Affiliation | Academia | (1) Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA; (2) National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China. Most of Z.-H. Qiu's work was done while visiting the OptMAI Lab at TAMU.
Pseudocode | Yes | Algorithm 1, "Blockwise Stochastic Variance-Reduced Bilevel Method (version 1): BSVRBv1", and Algorithm 2, "Block-wise Stochastic Variance-Reduced Bilevel Method (version 2): BSVRBv2". (An illustrative blockwise skeleton appears below the table.)
Open Source Code | Yes | "The code for reproducing the experimental results in this section is available at https://github.com/Optimization-AI/ICML2023_BSVRB."
Open Datasets | Yes | "We use two binary classification datasets, UCI Adult benchmark dataset a8a (Platt, 1999) and web page classification dataset w8a (Dua & Graff, 2017)." and "We use two large-scale movie recommendation datasets: MovieLens 20M (Harper & Konstan, 2015) and the Netflix Prize dataset (Bennett et al., 2007)."
Dataset Splits | Yes | "For both a8a and w8a, we follow an 80%/20% training/validation split." and "To create training/validation/test sets, we use the most recent rated item of each user for testing, the second most recent item for validation, and the remaining items for training, which is widely used in the literature (He et al., 2018; Wang et al., 2020)." (A sketch of this split appears below the table.)
Hardware Specification | Yes | "This experiment is performed on a computing node with Intel Xeon 8352Y (Ice Lake) processor and 64GB memory."
Software Dependencies | No | The paper points to software only indirectly through the code link (a Python deep-learning stack is implied), but it never names specific packages with version numbers, such as "PyTorch 1.9" or "CUDA 11.1".
Experiment Setup | Yes | "The regularization parameter λ is chosen from {0.00001, 0.0001, 0.001, 0.01}. For all methods, we tune the upper-level learning rate ηt from {0.001, 0.01, 0.1} and the lower-level learning rates τt, τ′t from {0.01, 0.1, 0.5, 1, 5, 10}. Parameters αt = α′t and γt = γ′t in the MSVR estimator are tuned from {0.5, 0.9, 1, 10, 100} and {0.001, 0.01, 0.1, 1, 10, 100}, respectively. In RSVRB, the STORM parameter β is chosen from {0.1, 0.5, 0.9, 1}." (The grid is written out below the table.)
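The Pseudocode row above names the paper's Algorithms 1 and 2 but does not reproduce them. Purely for orientation, the sketch below shows the generic shape of a blockwise variance-reduced bilevel step implied by the title: sample a few of the m blocks each iteration, refresh only their lower-level variables and gradient estimators, then take an upper-level step. Everything here (the toy quadratic oracles and the EMA-style estimator standing in for MSVR) is a hypothetical placeholder, not BSVRB itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def blockwise_step(x, y, v, tau=0.1, eta=0.01, batch=2, beta=0.9):
    """One illustrative iteration: sample a few of the m blocks, refresh only
    their lower-level variables y[i] and gradient estimators v[i], then take
    an upper-level step on x. Toy quadratic oracles; NOT the paper's updates."""
    m = len(y)
    for i in rng.choice(m, size=batch, replace=False):
        g_i = y[i] - x + rng.normal(scale=0.1, size=x.shape)  # toy stochastic lower-level gradient
        v[i] = beta * g_i + (1.0 - beta) * v[i]               # EMA stand-in for an MSVR/STORM estimator
        y[i] = y[i] - tau * v[i]                              # only sampled blocks are touched
    d = x - np.mean(y, axis=0) + rng.normal(scale=0.1, size=x.shape)  # toy upper-level gradient
    return x - eta * d

# Toy usage: m = 5 blocks, 3-dimensional variables.
x = np.zeros(3)
y = np.ones((5, 3))
v = np.zeros((5, 3))
for _ in range(100):
    x = blockwise_step(x, y, v)
```

Updating only a sampled subset of blocks per iteration is what makes the per-iteration cost independent of the number of blocks and enables the parallel speedup the title refers to.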
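The recommendation-dataset split quoted in the Dataset Splits row is the standard per-user leave-one-out protocol. A minimal sketch, assuming ratings arrive as (user, item, timestamp) tuples; the function name and data layout are ours, not from the released code:

```python
from collections import defaultdict

def leave_one_out_split(ratings):
    """Per-user temporal split: most recent rated item -> test,
    second most recent -> validation, all remaining items -> train.
    `ratings` is an iterable of (user, item, timestamp) tuples."""
    by_user = defaultdict(list)
    for user, item, ts in ratings:
        by_user[user].append((ts, item))

    train, valid, test = [], [], []
    for user, events in by_user.items():
        events.sort(key=lambda e: e[0])          # oldest first
        items = [item for _, item in events]
        if len(items) < 3:                       # too few interactions to split
            train += [(user, i) for i in items]
            continue
        test.append((user, items[-1]))           # most recent rated item
        valid.append((user, items[-2]))          # second most recent item
        train += [(user, i) for i in items[:-2]]
    return train, valid, test
```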
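Finally, the search space in the Experiment Setup row can be written out directly. The sketch below enumerates the quoted grid with itertools.product; the dictionary keys and the train_and_validate routine are descriptive stand-ins, not identifiers from the released code. For RSVRB, β is the momentum parameter of the STORM estimator, i.e., the recursion d_t = ∇F(x_t; ξ_t) + (1 − β)(d_{t−1} − ∇F(x_{t−1}; ξ_t)).

```python
from itertools import product

# Hyperparameter grid as quoted in the Experiment Setup row.
grid = {
    "lambda_reg": [1e-5, 1e-4, 1e-3, 1e-2],        # regularization λ
    "eta":        [1e-3, 1e-2, 1e-1],              # upper-level learning rate η_t
    "tau":        [0.01, 0.1, 0.5, 1, 5, 10],      # lower-level learning rates τ_t, τ′_t
    "alpha":      [0.5, 0.9, 1, 10, 100],          # MSVR parameter α_t (= α′_t)
    "gamma":      [1e-3, 1e-2, 1e-1, 1, 10, 100],  # MSVR parameter γ_t (= γ′_t)
    "beta":       [0.1, 0.5, 0.9, 1],              # STORM parameter (RSVRB only)
}

# Enumerate all configurations; train with each and keep the one with the
# best validation metric.
for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    # train_and_validate(config)  # hypothetical training routine
```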