Fast Training Method for Stochastic Compositional Optimization Problems
Authors: Hongchang Gao, Heng Huang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | At last, we apply our decentralized training methods to the model-agnostic meta-learning problem, and the experimental results confirm the superior performance of our methods. (Section 5: Experiment) |
| Researcher Affiliation | Academia | 1 Department of Computer and Information Sciences, Temple University, PA, USA 2 Department of Electrical and Computer Engineering, University of Pittsburgh, PA, USA |
| Pseudocode | Yes | Algorithm 1 Gossip-based Decentralized Stochastic Compositional Gradient Descent (GP-DSCGD) (a hedged sketch of this style of update appears after the table) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | Model-Agnostic Meta-Learning (MAML) [6] is to learn a meta-initialization model parameter for a set of new tasks. Specifically, it is to optimize the following model: ... we use the Omniglot image dataset [12]. |
| Dataset Splits | No | The paper specifies training and testing set sizes (e.g., 'the number of tasks in each meta-batch is set to 200' for training, and 'the number of tasks is set to 500' for testing in regression; '1200 tasks as the training set and the rest tasks as the testing set' for Omniglot), but it does not explicitly mention or quantify a separate 'validation' dataset split for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper mentions using 'four GPUs' and 'eight GPUs' for experiments, but it does not specify the exact GPU models (e.g., NVIDIA A100, Tesla V100) or other hardware details like CPU, RAM, or specific cluster configurations. |
| Software Dependencies | No | The paper mentions using 'Adam' for the baseline method and 'ReLU' as an activation function, but it does not specify software dependencies like programming languages, frameworks (e.g., PyTorch, TensorFlow), or library versions (e.g., Python 3.8, CUDA 11.1). |
| Experiment Setup | Yes | In our experiment, we use four GPUs where each GPU is viewed as a device. On each device, the number of tasks in each meta-batch is set to 200. The number of samples in each task for training is set to 10. As for the testing set, the number of tasks is set to 500 and the number of samples in each task is also set to 10. In addition, the number of iterations for adaptation in the training phase is set to 1 while it is set to 10 in the testing phase. In our experiment, α is set to 0.01. For adaptation, the learning rate η of DSGD is set to η = 0.001. ... we also use the adaptive learning rate for our two methods based on the stochastic compositional gradients $z_t^{(k)}$ and $s_t^{(k)}$. In addition, we set γ = 3.0, η = 0.03, β = 0.33. (These reported hyperparameters are collected in the configuration sketch below the table.) |
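
Since the report only names Algorithm 1 (GP-DSCGD) without reproducing it, the following is a minimal NumPy sketch of the general pattern behind a gossip-based decentralized stochastic compositional gradient step: a moving-average estimate of the inner function, a chain-rule compositional gradient, gossip averaging over a mixing matrix, and a damped local update. The default step sizes γ = 3.0, η = 0.03, β = 0.33 are taken from the Experiment Setup row; everything else (the function names `sample_inner`, `grad_inner`, `grad_outer`, the exact order of operations) is an assumption and not the authors' implementation.

```python
import numpy as np

def gp_dscgd_step(x, u, W, sample_inner, grad_inner, grad_outer,
                  gamma=3.0, eta=0.03, beta=0.33):
    """One gossip-based decentralized stochastic compositional gradient step.

    Sketch only; the exact structure of Algorithm 1 may differ.
    x : (K, d) local model copies on K devices
    u : (K, p) running estimates of the inner function g on each device
    W : (K, K) doubly stochastic gossip (mixing) matrix
    sample_inner(k, x_k) -> stochastic evaluation of g on device k, shape (p,)
    grad_inner(k, x_k)   -> stochastic Jacobian of g on device k, shape (p, d)
    grad_outer(k, u_k)   -> stochastic gradient of f on device k, shape (p,)
    """
    x_half = np.empty_like(x)
    u_new = np.empty_like(u)
    for k in range(x.shape[0]):
        # Moving-average estimate of the inner function value (SCGD-style).
        u_new[k] = (1.0 - beta) * u[k] + beta * sample_inner(k, x[k])
        # Stochastic compositional gradient on device k via the chain rule.
        z_k = grad_inner(k, x[k]).T @ grad_outer(k, u_new[k])
        # Gossip-average the neighbors' iterates, then take a local step.
        x_half[k] = W[k] @ x - gamma * z_k
    # Damped update mixing the old iterate with the gossiped half-step.
    x_new = (1.0 - eta) * x + eta * x_half
    return x_new, u_new
```

With K = 4 devices this mirrors the four-GPU, one-device-per-GPU setup reported in the Experiment Setup row.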
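
The dictionary below simply collects the hyperparameters quoted in the Experiment Setup row in one place; the field names are illustrative and do not come from the authors' code.

```python
# Hedged reconstruction of the reported MAML experiment settings.
maml_experiment = {
    "num_devices": 4,                     # four GPUs, one device per GPU
    "meta_batch_tasks_per_device": 200,
    "train_samples_per_task": 10,
    "test_tasks": 500,
    "test_samples_per_task": 10,
    "adaptation_steps_train": 1,
    "adaptation_steps_test": 10,
    "alpha": 0.01,                        # adaptation (inner-loop) step size
    "eta_dsgd": 0.001,                    # learning rate for the DSGD baseline
    "gamma": 3.0,                         # step sizes reported for the proposed methods
    "eta": 0.03,
    "beta": 0.33,
}
```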