Optimal Algorithms for Stochastic Multi-Level Compositional Optimization

Authors: Wei Jiang, Bokun Wang, Yibo Wang, Lijun Zhang, Tianbao Yang

ICML 2022

Reproducibility Variable: Research Type
Result: Experimental
LLM Response: In this section, we conduct numerical experiments to evaluate the performance of the proposed method on three different tasks. We compare our method with existing multi-level algorithms, including A-TSCGD (Yang et al., 2019), NLASG (Balasubramanian et al., 2021), Nested SPIDER (Zhang & Xiao, 2021) and SCSC (Chen et al., 2021).

Reproducibility Variable: Researcher Affiliation
Result: Academia
LLM Response: 1National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; 2Department of Computer Science, The University of Iowa, Iowa City, USA.

Reproducibility Variable: Pseudocode
Result: Yes
LLM Response: Algorithm 1 SMVR; Algorithm 2 Stage-wise SMVR

Reproducibility Variable: Open Source Code
Result: No
LLM Response: The paper does not provide any explicit statement or link to open-source code for the described methodology.

Reproducibility Variable: Open Datasets
Result: Yes
LLM Response: In the experiments, we test different methods on the real-world datasets Industry-10, Industry-12, Industry-17 and Industry-30 from the Kenneth R. French Data Library2. These datasets contain the payoffs of 10, 12, 17 and 30 industrial assets over 25105 consecutive periods, respectively. ... We use the "HIV-1"3, "Australian"4, "Breast-cancer"4 and "svmguide1"4 datasets ... Following Finn et al. (2017), we conduct experiments on the 5-way 1-shot and 5-shot tasks on the Omniglot dataset (Lake et al., 2011).

Reproducibility Variable: Dataset Splits
Result: No
LLM Response: The paper mentions training samples for individual tasks (e.g., "1 or 5 training samples for each class" in the Omniglot experiment) but does not specify explicit train/validation/test splits for the datasets used.

Reproducibility Variable: Hardware Specification
Result: No
LLM Response: The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run the experiments.

Reproducibility Variable: Software Dependencies
Result: No
LLM Response: The paper does not specify software versions or dependencies (e.g., Python, PyTorch, or TensorFlow versions).

Reproducibility Variable: Experiment Setup
Result: Yes
LLM Response: For our method, the parameter β is searched from the set {0.1, 0.5, 0.9}. For the other methods, we choose the hyper-parameters suggested in the original papers or use grid search to select the best hyper-parameters. The learning rate is tuned over the range {0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1}. For the projection operation ΠLf, we simply set Lf to a large value, and we provide a sensitivity analysis with respect to tuning Lf in the first experiment. ... Following Zhang & Xiao (2021), we set the parameter λ = 0.2. ... We set τ = 2, t = 10 according to the original paper and repeat each experiment 20 times. ... We conduct 5-step MAML and repeat each experiment 3 times.
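The grid search described in the Experiment Setup row can be sketched as follows. This is a minimal illustration, not the authors' code: `run_trial` is a hypothetical placeholder standing in for training a method with a given (β, learning rate) pair and returning a validation loss; only the two grids are taken from the paper.

```python
from itertools import product

# Hyper-parameter grids quoted from the paper's experiment setup.
betas = [0.1, 0.5, 0.9]
learning_rates = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1]

def run_trial(beta, lr):
    """Hypothetical stand-in for training and evaluating one
    configuration; a real run would return a validation loss."""
    return (beta - 0.5) ** 2 + (lr - 0.1) ** 2  # dummy surrogate objective

# Exhaustive grid search: evaluate every (beta, lr) pair, keep the best.
best_cfg, best_loss = None, float("inf")
for beta, lr in product(betas, learning_rates):
    loss = run_trial(beta, lr)
    if loss < best_loss:
        best_cfg, best_loss = (beta, lr), loss

print(best_cfg)  # with the dummy objective above: (0.5, 0.1)
```

With the 3 × 7 grids above, the search evaluates 21 configurations per method; in the paper, each selected configuration is then rerun multiple times (20 or 3, depending on the task) to average results.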