On the Convergence of Local Stochastic Compositional Gradient Descent with Momentum
Authors: Hongchang Gao, Junyi Li, Heng Huang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Additionally, extensive experimental results demonstrate the superior empirical performance over existing methods, confirming the efficacy of our method. (Section 5, Numerical Experiments) |
| Researcher Affiliation | Academia | 1Department of Computer and Information Sciences, Temple University, PA, USA. 2Department of Electrical and Computer Engineering, University of Pittsburgh, PA, USA. |
| Pseudocode | Yes | Algorithm 1 Local-SCGDM |
| Open Source Code | No | The paper does not provide any statement about making its source code publicly available or a link to a code repository. |
| Open Datasets | Yes | We evaluate our proposed algorithm Local SCGDM over a 1-D sinusoid regression problem... and the Few-Shot Classification task over the Omniglot dataset. |
| Dataset Splits | Yes | We construct 25 different training tasks by choosing A = {1, 2, 3, 4, 5} and b = {1, 2, 3, 4, 5} and randomly and evenly distribute them over 5 clients. Then during training, we randomly sample 3 tasks for every client per meta-iteration. For each task we choose K = 10 samples of x ∈ [−5, 5] randomly. We follow the experimental protocols of Vinyals et al. (2016) to divide the alphabets into train/validation/test with 33/5/12, respectively. For each task, we sample K samples for training and 15 samples for validation. |
| Hardware Specification | Yes | All experiments are run over a machine with Intel Xeon Gold 6248 CPU and 4 Nvidia Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions "Pytorch" and "Pytorch.distributed package" but does not specify version numbers for these software components. |
| Experiment Setup | Yes | The inner learning rate is 0.001 for all methods. For other hyper-parameters, we perform grid search for all methods and choose the setting with the best results. More precisely, for Local-BSGD (Local-MAML), we choose meta learning rate 0.01; for Local-SCGD, we choose meta learning rate 0.01 and the inner state momentum coefficient 0.9 (this algorithm diverges with smaller values); for Local-MOML, we choose meta learning rate 0.01, inner state momentum coefficient 0.7; for our Local-SCGDM, we choose η as 1, meta learning rate coefficient β as 0.01, meta momentum coefficient α as 0.8 and inner state momentum coefficient γ as 0.7. We set the number of local epochs as 5 in all comparison experiments. |
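
The paper's Algorithm 1 (Local-SCGDM) is only named in the Pseudocode row above and is not reproduced here. For orientation, the sketch below shows a generic single-machine stochastic compositional gradient descent step with momentum: a moving-average estimate of the inner function value (coefficient γ) and momentum on the gradient estimate (coefficient α). The toy quadratic inner/outer functions and all variable names are assumptions for illustration; this is not the authors' federated Algorithm 1, which additionally performs local client updates with periodic averaging.

```python
import numpy as np

# Minimal single-machine sketch of stochastic compositional gradient descent
# with momentum for an objective f(g(x)). The inner state u tracks g(x) via a
# moving average (coefficient gamma); the update direction v uses momentum
# (coefficient alpha). Toy quadratic f and g are assumed for illustration.

rng = np.random.default_rng(0)
d = 5
x = np.zeros(d)
u = np.zeros(d)          # running estimate of the inner function value g(x)
v = np.zeros(d)          # running estimate of the compositional gradient
gamma, alpha, eta = 0.7, 0.8, 0.01

def g(x, noise):         # stochastic inner function (assumed toy example)
    return x + noise

def grad_g(x, noise):    # Jacobian of g w.r.t. x (identity for this toy g)
    return np.eye(len(x))

def grad_f(u):           # gradient of the outer function f(u) = 0.5 * ||u||^2
    return u

for t in range(1000):
    noise = 0.1 * rng.standard_normal(d)
    # inner-state momentum: moving-average estimate of g(x)
    u = (1 - gamma) * u + gamma * g(x, noise)
    # momentum on the compositional gradient estimate
    v = (1 - alpha) * v + alpha * grad_g(x, noise).T @ grad_f(u)
    # parameter update
    x = x - eta * v
```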
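As a concrete reading of the Dataset Splits row for the 1-D sinusoid regression problem, the sketch below constructs the 25 tasks from A, b ∈ {1, ..., 5}, distributes them evenly over 5 clients, and draws K = 10 inputs x ∈ [−5, 5] per task, with 3 tasks sampled per client per meta-iteration. The target form A·sin(x + b) and the helper names are assumptions not stated in the quoted text.

```python
import itertools
import numpy as np

# Sketch of the sinusoid task construction described in the Dataset Splits row:
# 25 tasks from A, b in {1,...,5}, split evenly over 5 clients, K = 10 inputs
# drawn uniformly from [-5, 5] per task. The target y = A*sin(x + b) is an
# assumption about the regression target.

rng = np.random.default_rng(0)
tasks = list(itertools.product(range(1, 6), range(1, 6)))      # 25 (A, b) pairs
rng.shuffle(tasks)

num_clients, K = 5, 10
clients = [tasks[i::num_clients] for i in range(num_clients)]  # 5 tasks each

def sample_task(A, b, k=K):
    """Draw k (x, y) pairs for one task; y = A*sin(x + b) is assumed."""
    x = rng.uniform(-5.0, 5.0, size=k)
    return x, A * np.sin(x + b)

# per meta-iteration, each client samples 3 of its local tasks (as reported)
for client_tasks in clients:
    chosen = rng.choice(len(client_tasks), size=3, replace=False)
    batches = [sample_task(*client_tasks[i]) for i in chosen]
```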
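To make the Experiment Setup row easier to scan, the snippet below collects the reported hyper-parameters per method in one place. Values are taken from the quoted text; the key names are illustrative and not the authors' code identifiers.

```python
# Reported hyper-parameters from the Experiment Setup row, gathered for
# reference. Key names are illustrative; values are as quoted in the paper.
INNER_LR = 0.001          # inner learning rate, shared by all methods
LOCAL_EPOCHS = 5          # number of local epochs in all comparison experiments

HYPERPARAMS = {
    "Local-BSGD (Local-MAML)": {"meta_lr": 0.01},
    "Local-SCGD":              {"meta_lr": 0.01, "inner_state_momentum": 0.9},
    "Local-MOML":              {"meta_lr": 0.01, "inner_state_momentum": 0.7},
    "Local-SCGDM (ours)":      {"eta": 1.0,
                                "meta_lr_beta": 0.01,
                                "meta_momentum_alpha": 0.8,
                                "inner_state_momentum_gamma": 0.7},
}
```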