Fast Training Method for Stochastic Compositional Optimization Problems

Authors: Hongchang Gao, Heng Huang

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "At last, we apply our decentralized training methods to the model-agnostic meta-learning problem, and the experimental results confirm the superior performance of our methods." (Section 5: Experiment) |
| Researcher Affiliation | Academia | 1 Department of Computer and Information Sciences, Temple University, PA, USA; 2 Department of Electrical and Computer Engineering, University of Pittsburgh, PA, USA |
| Pseudocode | Yes | Algorithm 1: Gossip-based Decentralized Stochastic Compositional Gradient Descent (GP-DSCGD). (A generic illustrative sketch of a gossip-based SCGD round is given below the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor a link to a code repository. |
| Open Datasets | Yes | "Model-Agnostic Meta-Learning (MAML) [6] is to learn a meta-initialization model parameter for a set of new tasks. Specifically, it is to optimize the following model: ... we use the Omniglot image dataset [12]." |
| Dataset Splits | No | The paper specifies training and testing set sizes (e.g., "the number of tasks in each meta-batch is set to 200" for training and "the number of tasks is set to 500" for testing in regression; "1200 tasks as the training set and the rest tasks as the testing set" for Omniglot), but it does not mention or quantify a separate validation split for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper mentions using "four GPUs" and "eight GPUs" for experiments, but it does not specify the GPU models (e.g., NVIDIA A100, Tesla V100) or other hardware details such as CPU, RAM, or cluster configuration. |
| Software Dependencies | No | The paper mentions using "Adam" for the baseline method and "ReLU" as an activation function, but it does not specify software dependencies such as programming languages, frameworks (e.g., PyTorch, TensorFlow), or library versions (e.g., Python 3.8, CUDA 11.1). |
| Experiment Setup | Yes | "In our experiment, we use four GPUs where each GPU is viewed as a device. On each device, the number of tasks in each meta-batch is set to 200. The number of samples in each task for training is set to 10. As for the testing set, the number of tasks is set to 500 and the number of samples in each task is also set to 10. In addition, the number of iterations for adaptation in the training phase is set to 1 while it is set to 10 in the testing phase. In our experiment, α is set to 0.01. For adaptation, the learning rate η of DSGD is set to η = 0.001. ... we also use the adaptive learning rate for our two methods based on the stochastic compositional gradients z_t^(k) and s_t^(k). In addition, we set γ = 3.0, η = 0.03, β = 0.33." (The reported values are collected into a configuration sketch below.) |
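For convenience, the hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration object. The sketch below only restates the reported values; the key names (e.g., `meta_batch_tasks_train`, `eta_dsgd`) are hypothetical and not taken from the paper.

```python
# Reported MAML experiment settings, collected into one config dict.
# Key names are illustrative; only the values come from the paper's
# quoted experiment description.
experiment_config = {
    "num_devices": 4,               # one GPU per device
    "meta_batch_tasks_train": 200,  # tasks per meta-batch on each device
    "samples_per_task_train": 10,
    "num_tasks_test": 500,
    "samples_per_task_test": 10,
    "adaptation_steps_train": 1,
    "adaptation_steps_test": 10,
    "alpha": 0.01,                  # adaptation step size
    "eta_dsgd": 0.001,              # learning rate for the DSGD baseline
    # Settings reported for the proposed methods, which use an adaptive
    # learning rate based on z_t^(k) and s_t^(k):
    "gamma": 3.0,
    "eta": 0.03,
    "beta": 0.33,
}
```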
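The Pseudocode row refers to Algorithm 1, GP-DSCGD, which combines gossip averaging across devices with a stochastic compositional gradient update. The paper's exact update rules are not reproduced here; the following is a minimal generic sketch of one gossip-based SCGD round for min_x E[f(E[g(x)])], assuming a doubly stochastic mixing matrix W and a moving-average estimate of the inner function. The variable names and update details are assumptions, not the authors' Algorithm 1.

```python
# Hypothetical sketch of one round of gossip-based decentralized stochastic
# compositional gradient descent. This is a generic illustration and NOT the
# paper's exact Algorithm 1 (GP-DSCGD).
import numpy as np

def gossip_scgd_step(x, y, W, grad_g, g_sample, grad_f, eta=0.03, beta=0.33):
    """One synchronized round across K devices.

    x: (K, d)  local model parameters, one row per device
    y: (K, p)  local running estimates of the inner function g(x)
    W: (K, K)  doubly stochastic gossip (mixing) matrix
    grad_g(x_k) -> (p, d) stochastic Jacobian of g at x_k
    g_sample(x_k) -> (p,) stochastic evaluation of g at x_k
    grad_f(y_k) -> (p,) stochastic gradient of f at y_k
    """
    K = x.shape[0]
    # Track the inner function value with a moving average (SCGD-style).
    y_new = np.stack([(1.0 - beta) * y[k] + beta * g_sample(x[k]) for k in range(K)])
    # Local stochastic compositional gradient on each device (chain rule).
    grads = np.stack([grad_g(x[k]).T @ grad_f(y_new[k]) for k in range(K)])
    # Gossip averaging with neighbors, followed by a local gradient step.
    x_new = W @ x - eta * grads
    return x_new, y_new
```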