Demystifying Why Local Aggregation Helps: Convergence Analysis of Hierarchical SGD

Authors: Jiayi Wang, Shiqiang Wang, Rong-Rong Chen, Mingyue Ji

AAAI 2022, pp. 8548-8556

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also conduct experiments on CIFAR-10, FEMNIST, and CelebA datasets. The results of experiments validate our theoretical results.
Researcher Affiliation | Collaboration | 1 Department of Electrical & Computer Engineering, University of Utah, Salt Lake City, UT, USA; 2 IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Pseudocode | Yes | Algorithm 1: Hierarchical SGD (H-SGD); see the sketch after this table.
Open Source Code | Yes | Our code is available at https://github.com/C3atUofU/Hierarchical-SGD.git.
Open Datasets | Yes | We validate our theoretical results with experiments on training the VGG-11 model over CIFAR-10 (Krizhevsky, Hinton et al. 2009) and CelebA (Liu et al. 2015), and training a convolutional neural network (CNN) over FEMNIST (Cohen et al. 2017), all with non-IID data partitioning across workers.
Dataset Splits | No | The paper does not explicitly state the training, validation, and test splits (e.g., percentages or counts). It mentions 'non-IID data partitioning across workers' but not the specific splits used for evaluation.
Hardware Specification | No | The paper mentions running experiments 'on a single GPU' and using 'Amazon EC2 instances' but does not specify exact models or configurations (e.g., a specific GPU model such as NVIDIA V100 or an EC2 instance type such as p3.8xlarge).
Software Dependencies | No | The paper does not explicitly list any software dependencies with version numbers (e.g., Python version, PyTorch version, or other libraries).
Experiment Setup | Yes | The paper describes the key H-SGD parameters (learning rate γ, global period G, local period I, number of groups N, total iterations T) and gives the values used in experiments (e.g., 'G = 50, I = 5', 'N = 2'). It also details how communication and computation times were measured, providing some aspects of the experimental setup.
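To make the H-SGD structure referenced in the Pseudocode and Experiment Setup rows concrete, below is a minimal, self-contained sketch of the training loop. It is not the authors' implementation (their code is at the GitHub link above); it follows the paper's notation for γ, I, G, N, T and uses the reported values G = 50, I = 5, N = 2, while the per-worker quadratic objective, worker count, model dimension, learning rate, and noise level are purely illustrative assumptions.

```python
# Minimal simulation of hierarchical SGD (H-SGD) on a toy quadratic
# objective. Hyperparameter names (gamma, I, G, N, T) follow the paper's
# notation; the objective, worker counts, and data are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

d = 10                 # model dimension (illustrative)
workers_per_group = 4  # workers per group (illustrative)
N = 2                  # number of groups (value reported in the paper)
I = 5                  # local period: group aggregation every I iterations
G = 50                 # global period: global aggregation every G iterations
T = 200                # total iterations (illustrative)
gamma = 0.05           # learning rate gamma (illustrative)

# Each worker w in group g holds its own quadratic f(x) = 0.5*||x - c||^2,
# so grad f(x) = x - c. Group-dependent centers mimic non-IID partitioning.
centers = [rng.normal(loc=2.0 * g, scale=1.0, size=(workers_per_group, d))
           for g in range(N)]

# One model copy per worker, all initialized identically.
x = [[np.zeros(d) for _ in range(workers_per_group)] for _ in range(N)]

for t in range(1, T + 1):
    # Local SGD step on every worker (stochastic gradient = gradient + noise).
    for g in range(N):
        for w in range(workers_per_group):
            grad = x[g][w] - centers[g][w] + 0.1 * rng.normal(size=d)
            x[g][w] = x[g][w] - gamma * grad

    if t % G == 0:
        # Global aggregation: average all workers across all groups.
        avg = sum(x[g][w] for g in range(N) for w in range(workers_per_group))
        avg /= N * workers_per_group
        for g in range(N):
            for w in range(workers_per_group):
                x[g][w] = avg.copy()
    elif t % I == 0:
        # Group (local) aggregation: average within each group only.
        for g in range(N):
            group_avg = sum(x[g]) / workers_per_group
            for w in range(workers_per_group):
                x[g][w] = group_avg.copy()

final = sum(x[g][w] for g in range(N) for w in range(workers_per_group))
final /= N * workers_per_group
print("final global average model (first 3 coords):", final[:3])
```

Note the aggregation ordering in the sketch: since G is a multiple of I, the global average at t = 50, 100, ... supersedes the group average that would otherwise fire at those iterations, so each group aggregates internally G/I - 1 = 9 times between consecutive global rounds.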