Fast Training Method for Stochastic Compositional Optimization Problems

Authors: Hongchang Gao, Heng Huang

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "At last, we apply our decentralized training methods to the model-agnostic meta-learning problem, and the experimental results confirm the superior performance of our methods." (Section 5: Experiment) |
| Researcher Affiliation | Academia | 1 Department of Computer and Information Sciences, Temple University, PA, USA; 2 Department of Electrical and Computer Engineering, University of Pittsburgh, PA, USA |
| Pseudocode | Yes | Algorithm 1: Gossip-based Decentralized Stochastic Compositional Gradient Descent (GP-DSCGD). (A generic illustrative sketch of a gossip-based SCGD round is given below the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor a link to a code repository. |
| Open Datasets | Yes | "Model-Agnostic Meta-Learning (MAML) [6] is to learn a meta-initialization model parameter for a set of new tasks. Specifically, it is to optimize the following model: ... we use the Omniglot image dataset [12]." |
| Dataset Splits | No | The paper specifies training and testing set sizes (e.g., "the number of tasks in each meta-batch is set to 200" for training and "the number of tasks is set to 500" for testing in regression; "1200 tasks as the training set and the rest tasks as the testing set" for Omniglot), but it does not mention or quantify a separate validation split for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper mentions using "four GPUs" and "eight GPUs" for experiments, but it does not specify the GPU models (e.g., NVIDIA A100, Tesla V100) or other hardware details such as CPU, RAM, or cluster configuration. |
| Software Dependencies | No | The paper mentions using "Adam" for the baseline method and "ReLU" as an activation function, but it does not specify software dependencies such as programming languages, frameworks (e.g., PyTorch, TensorFlow), or library versions (e.g., Python 3.8, CUDA 11.1). |
| Experiment Setup | Yes | "In our experiment, we use four GPUs where each GPU is viewed as a device. On each device, the number of tasks in each meta-batch is set to 200. The number of samples in each task for training is set to 10. As for the testing set, the number of tasks is set to 500 and the number of samples in each task is also set to 10. In addition, the number of iterations for adaptation in the training phase is set to 1 while it is set to 10 in the testing phase. In our experiment, α is set to 0.01. For adaptation, the learning rate η of DSGD is set to η = 0.001. ... we also use the adaptive learning rate for our two methods based on the stochastic compositional gradients z_t^(k) and s_t^(k). In addition, we set γ = 3.0, η = 0.03, β = 0.33." (The reported values are collected into a configuration sketch below.) |
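For convenience, the hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration object. The sketch below only restates the reported values; the key names (e.g., `meta_batch_tasks_train`, `eta_dsgd`) are hypothetical and not taken from the paper.

```python
# Reported MAML experiment settings, collected into one config dict.
# Key names are illustrative; only the values come from the paper's
# quoted experiment description.
experiment_config = {
    "num_devices": 4,               # one GPU per device
    "meta_batch_tasks_train": 200,  # tasks per meta-batch on each device
    "samples_per_task_train": 10,
    "num_tasks_test": 500,
    "samples_per_task_test": 10,
    "adaptation_steps_train": 1,
    "adaptation_steps_test": 10,
    "alpha": 0.01,                  # adaptation step size
    "eta_dsgd": 0.001,              # learning rate for the DSGD baseline
    # Settings reported for the proposed methods, which use an adaptive
    # learning rate based on z_t^(k) and s_t^(k):
    "gamma": 3.0,
    "eta": 0.03,
    "beta": 0.33,
}
```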
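The Pseudocode row refers to Algorithm 1, GP-DSCGD, which combines gossip averaging across devices with a stochastic compositional gradient update. The paper's exact update rules are not reproduced here; the following is a minimal generic sketch of one gossip-based SCGD round for min_x E[f(E[g(x)])], assuming a doubly stochastic mixing matrix W and a moving-average estimate of the inner function. The variable names and update details are assumptions, not the authors' Algorithm 1.

```python
# Hypothetical sketch of one round of gossip-based decentralized stochastic
# compositional gradient descent. This is a generic illustration and NOT the
# paper's exact Algorithm 1 (GP-DSCGD).
import numpy as np

def gossip_scgd_step(x, y, W, grad_g, g_sample, grad_f, eta=0.03, beta=0.33):
    """One synchronized round across K devices.

    x: (K, d)  local model parameters, one row per device
    y: (K, p)  local running estimates of the inner function g(x)
    W: (K, K)  doubly stochastic gossip (mixing) matrix
    grad_g(x_k) -> (p, d) stochastic Jacobian of g at x_k
    g_sample(x_k) -> (p,) stochastic evaluation of g at x_k
    grad_f(y_k) -> (p,) stochastic gradient of f at y_k
    """
    K = x.shape[0]
    # Track the inner function value with a moving average (SCGD-style).
    y_new = np.stack([(1.0 - beta) * y[k] + beta * g_sample(x[k]) for k in range(K)])
    # Local stochastic compositional gradient on each device (chain rule).
    grads = np.stack([grad_g(x[k]).T @ grad_f(y_new[k]) for k in range(K)])
    # Gossip averaging with neighbors, followed by a local gradient step.
    x_new = W @ x - eta * grads
    return x_new, y_new
```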