Blockwise Stochastic Variance-Reduced Methods with Parallel Speedup for Multi-Block Bilevel Optimization
Authors: Quanqi Hu, Zi-Hao Qiu, Zhishuai Guo, Lijun Zhang, Tianbao Yang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also conduct experiments to verify the effectiveness of the proposed algorithms compared with existing MBBO algorithms. We conduct experiments on both algorithms for low-dimensional and high-dimensional lower-level problems and demonstrate the effectiveness of our algorithms against existing MBBO algorithms. |
| Researcher Affiliation | Academia | (1) Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA; (2) National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; (3) most of Z.-H. Qiu's work was done while visiting the OptMAI lab at TAMU. |
| Pseudocode | Yes | Algorithm 1, Blockwise Stochastic Variance-Reduced Bilevel Method (version 1): BSVRBv1; and Algorithm 2, Blockwise Stochastic Variance-Reduced Bilevel Method (version 2): BSVRBv2 |
| Open Source Code | Yes | The code for reproducing the experimental results in this section is available at https://github.com/Optimization-AI/ICML2023_BSVRB. |
| Open Datasets | Yes | We use two binary classification datasets: the UCI Adult benchmark dataset a8a (Platt, 1999) and the web page classification dataset w8a (Dua & Graff, 2017). We use two large-scale movie recommendation datasets: MovieLens 20M (Harper & Konstan, 2015) and the Netflix Prize dataset (Bennett et al., 2007). |
| Dataset Splits | Yes | For both a8a and w8a, we follow an 80%/20% training/validation split. To create training/validation/test sets for the recommendation data, we use the most recently rated item of each user for testing, the second most recent item for validation, and the remaining items for training, which is widely used in the literature (He et al., 2018; Wang et al., 2020). See the split sketch after this table. |
| Hardware Specification | Yes | This experiment is performed on a computing node with an Intel Xeon 8352Y (Ice Lake) processor and 64 GB of memory. |
| Software Dependencies | No | The paper points to software only indirectly through the code link (Python and a deep-learning framework are implied), but it does not name any software with version numbers, such as 'PyTorch 1.9' or 'CUDA 11.1'. |
| Experiment Setup | Yes | The regularization parameter λ is chosen from {0.00001, 0.0001, 0.001, 0.01}. For all methods, we tune the upper-level learning rate ηt from {0.001, 0.01, 0.1} and the lower-level learning rates τt, τ′t from {0.01, 0.1, 0.5, 1, 5, 10}. The parameters αt = α′t and γt = γ′t in the MSVR estimator are tuned from {0.5, 0.9, 1, 10, 100} and {0.001, 0.01, 0.1, 1, 10, 100}, respectively. In RSVRB, the STORM parameter β is chosen from {0.1, 0.5, 0.9, 1}. See the grid-search sketch after this table. |
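
The splitting protocols quoted in the Dataset Splits row are standard, so a short sketch may help in reproducing them. The sketch below assumes pandas and scikit-learn and a ratings frame with `user`, `item`, and `timestamp` columns; the function names (`split_a8a`, `leave_one_out`) are ours for illustration, not from the paper's code.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

def split_a8a(X, y, seed=0):
    """80%/20% training/validation split for the a8a / w8a features."""
    return train_test_split(X, y, test_size=0.2, random_state=seed)

def leave_one_out(ratings: pd.DataFrame):
    """Per-user leave-one-out split for the recommendation data: the
    most recently rated item of each user goes to test, the second
    most recent to validation, and the rest to training."""
    ratings = ratings.sort_values(["user", "timestamp"])
    # Rank items within each user from most recent (0) backwards.
    recency = ratings.groupby("user").cumcount(ascending=False)
    test = ratings[recency == 0]
    val = ratings[recency == 1]
    train = ratings[recency >= 2]
    return train, val, test
```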
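
The Experiment Setup row amounts to a grid search over the listed values. Below is a minimal sketch of that sweep; `train_and_eval` is a hypothetical stand-in for a single BSVRB training run (here it returns a dummy score), not the authors' code.

```python
from itertools import product

def train_and_eval(lam, eta, tau, alpha, gamma):
    """Hypothetical placeholder: run BSVRB once with this configuration
    and return a validation metric. Replace with a real training call."""
    return 0.0  # dummy score

# Tuning grids quoted in the Experiment Setup row above.
grid = {
    "lam":   [1e-5, 1e-4, 1e-3, 1e-2],        # regularization λ
    "eta":   [1e-3, 1e-2, 1e-1],              # upper-level step size η_t
    "tau":   [0.01, 0.1, 0.5, 1, 5, 10],      # lower-level step sizes τ_t, τ′_t
    "alpha": [0.5, 0.9, 1, 10, 100],          # MSVR parameter α_t = α′_t
    "gamma": [1e-3, 1e-2, 1e-1, 1, 10, 100],  # MSVR parameter γ_t = γ′_t
}

best_score, best_cfg = float("-inf"), None
for values in product(*grid.values()):
    cfg = dict(zip(grid.keys(), values))
    score = train_and_eval(**cfg)
    if score > best_score:
        best_score, best_cfg = score, cfg
print(best_score, best_cfg)
```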