Demystifying Why Local Aggregation Helps: Convergence Analysis of Hierarchical SGD
Authors: Jiayi Wang, Shiqiang Wang, Rong-Rong Chen, Mingyue Ji
AAAI 2022, pp. 8548-8556
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also conduct experiments on CIFAR-10, FEMNIST, and CelebA datasets. The results of experiments validate our theoretical results. |
| Researcher Affiliation | Collaboration | 1 Department of Electrical & Computer Engineering, University of Utah, Salt Lake City, UT, USA 2 IBM T. J. Watson Research Center, Yorktown Heights, NY, USA |
| Pseudocode | Yes | Algorithm 1: Hierarchical SGD (H-SGD) |
| Open Source Code | Yes | Our code is available at https://github.com/C3atUofU/Hierarchical-SGD.git. |
| Open Datasets | Yes | We validate our theoretical results with experiments on training the VGG-11 model over CIFAR-10 (Krizhevsky, Hinton et al. 2009) and CelebA (Liu et al. 2015), and training a convolutional neural network (CNN) over FEMNIST (Cohen et al. 2017), all with non-IID data partitioning across workers. |
| Dataset Splits | No | The paper does not explicitly state the training, validation, and test splits (e.g., percentages or counts). It mentions 'non-IID data partitioning across workers' but not the specific splits used for evaluation. |
| Hardware Specification | No | The paper mentions running experiments 'on a single GPU' and using 'Amazon EC2 instances' but does not specify the exact models or configurations of these hardware components (e.g., specific GPU models like NVIDIA V100 or EC2 instance types like p3.8xlarge). |
| Software Dependencies | No | The paper does not explicitly list any specific software dependencies with version numbers (e.g., Python version, PyTorch version, or other libraries). |
| Experiment Setup | Yes | The paper describes the key parameters of the H-SGD setup (learning rate γ, global period G, local period I, number of groups N, total iterations T) and reports specific values used in experiments (e.g., 'G = 50, I = 5', 'N = 2'). It also details how communication and computation times were measured, covering some aspects of the experimental setup; a minimal sketch of the H-SGD loop using these parameters follows the table. |
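
To make Algorithm 1 and the reported parameters concrete, the following minimal Python sketch shows one common form of the H-SGD update pattern: each worker takes local SGD steps, workers within a group average their models every I iterations, and a global average across all N groups happens every G iterations. The function name `hierarchical_sgd`, the toy objective, and all variable names are illustrative assumptions for this page, not the authors' released implementation.

```python
# Minimal sketch of the H-SGD update pattern described above
# (learning rate gamma, local period I, global period G, N groups, T iterations).
import numpy as np

def hierarchical_sgd(grad_fns, groups, w0, gamma=0.1, I=5, G=50, T=200):
    """Run T iterations of hierarchical SGD (illustrative sketch).

    grad_fns : list of per-worker stochastic gradient functions g(w) -> ndarray
    groups   : list of lists of worker indices (N = len(groups))
    w0       : initial model parameters (ndarray)
    """
    # Every worker starts from the same global model.
    w = [w0.copy() for _ in grad_fns]
    for t in range(1, T + 1):
        # Local SGD step on every worker.
        for k in range(len(grad_fns)):
            w[k] -= gamma * grad_fns[k](w[k])
        if t % G == 0:
            # Global aggregation: average the models of all workers.
            w_avg = np.mean(w, axis=0)
            w = [w_avg.copy() for _ in grad_fns]
        elif t % I == 0:
            # Group aggregation: average only within each group.
            for members in groups:
                g_avg = np.mean([w[k] for k in members], axis=0)
                for k in members:
                    w[k] = g_avg.copy()
    return np.mean(w, axis=0)

# Toy usage: a least-squares objective split over 4 workers in N = 2 groups,
# mirroring the G = 50, I = 5, N = 2 configuration quoted in the table.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = [rng.normal(size=3) for _ in range(4)]
    grad_fns = [lambda w, b=b: w - b + 0.01 * rng.normal(size=3) for b in targets]
    w_final = hierarchical_sgd(grad_fns, groups=[[0, 1], [2, 3]],
                               w0=np.zeros(3), gamma=0.1, I=5, G=50, T=200)
    print(w_final)  # approaches the mean of the four worker targets
```

In this sketch the global period G is a multiple of the local period I, so a global average simply replaces the group average at those iterations; the paper's actual data partitioning, model, and aggregation schedule should be taken from the linked repository.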