Demystifying Why Local Aggregation Helps: Convergence Analysis of Hierarchical SGD

Authors: Jiayi Wang, Shiqiang Wang, Rong-Rong Chen, Mingyue Ji

AAAI 2022, pp. 8548-8556

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also conduct experiments on CIFAR-10, FEMNIST, and CelebA datasets. The results of experiments validate our theoretical results.
Researcher Affiliation | Collaboration | 1 Department of Electrical & Computer Engineering, University of Utah, Salt Lake City, UT, USA; 2 IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Pseudocode | Yes | Algorithm 1: Hierarchical SGD (H-SGD); see the sketch after this table.
Open Source Code | Yes | Our code is available at https://github.com/C3atUofU/Hierarchical-SGD.git.
Open Datasets | Yes | We validate our theoretical results with experiments on training the VGG-11 model over CIFAR-10 (Krizhevsky, Hinton et al. 2009) and CelebA (Liu et al. 2015), and training a convolutional neural network (CNN) over FEMNIST (Cohen et al. 2017), all with non-IID data partitioning across workers.
Dataset Splits | No | The paper does not explicitly state the training, validation, and test splits (e.g., percentages or counts). It mentions 'non-IID data partitioning across workers' but not the specific splits used for evaluation.
Hardware Specification | No | The paper mentions running experiments 'on a single GPU' and using 'Amazon EC2 instances' but does not specify exact models or configurations (e.g., a specific GPU model such as NVIDIA V100 or an EC2 instance type such as p3.8xlarge).
Software Dependencies | No | The paper does not explicitly list any software dependencies with version numbers (e.g., Python version, PyTorch version, or other libraries).
Experiment Setup | Yes | The paper describes the key H-SGD parameters (learning rate γ, global period G, local period I, number of groups N, total iterations T) and gives the values used in experiments (e.g., 'G = 50, I = 5', 'N = 2'). It also details how communication and computation times were measured, providing some aspects of the experimental setup.
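To make the H-SGD structure referenced in the Pseudocode and Experiment Setup rows concrete, below is a minimal, self-contained sketch of the training loop. It is not the authors' implementation (their code is at the GitHub link above); it follows the paper's notation for γ, I, G, N, T and uses the reported values G = 50, I = 5, N = 2, while the per-worker quadratic objective, worker count, model dimension, learning rate, and noise level are purely illustrative assumptions.

```python
# Minimal simulation of hierarchical SGD (H-SGD) on a toy quadratic
# objective. Hyperparameter names (gamma, I, G, N, T) follow the paper's
# notation; the objective, worker counts, and data are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

d = 10                 # model dimension (illustrative)
workers_per_group = 4  # workers per group (illustrative)
N = 2                  # number of groups (value reported in the paper)
I = 5                  # local period: group aggregation every I iterations
G = 50                 # global period: global aggregation every G iterations
T = 200                # total iterations (illustrative)
gamma = 0.05           # learning rate gamma (illustrative)

# Each worker w in group g holds its own quadratic f(x) = 0.5*||x - c||^2,
# so grad f(x) = x - c. Group-dependent centers mimic non-IID partitioning.
centers = [rng.normal(loc=2.0 * g, scale=1.0, size=(workers_per_group, d))
           for g in range(N)]

# One model copy per worker, all initialized identically.
x = [[np.zeros(d) for _ in range(workers_per_group)] for _ in range(N)]

for t in range(1, T + 1):
    # Local SGD step on every worker (stochastic gradient = gradient + noise).
    for g in range(N):
        for w in range(workers_per_group):
            grad = x[g][w] - centers[g][w] + 0.1 * rng.normal(size=d)
            x[g][w] = x[g][w] - gamma * grad

    if t % G == 0:
        # Global aggregation: average all workers across all groups.
        avg = sum(x[g][w] for g in range(N) for w in range(workers_per_group))
        avg /= N * workers_per_group
        for g in range(N):
            for w in range(workers_per_group):
                x[g][w] = avg.copy()
    elif t % I == 0:
        # Group (local) aggregation: average within each group only.
        for g in range(N):
            group_avg = sum(x[g]) / workers_per_group
            for w in range(workers_per_group):
                x[g][w] = group_avg.copy()

final = sum(x[g][w] for g in range(N) for w in range(workers_per_group))
final /= N * workers_per_group
print("final global average model (first 3 coords):", final[:3])
```

Note the aggregation ordering in the sketch: since G is a multiple of I, the global average at t = 50, 100, ... supersedes the group average that would otherwise fire at those iterations, so each group aggregates internally G/I - 1 = 9 times between consecutive global rounds.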