The Heavy-Tail Phenomenon in SGD

Authors: Mert Gürbüzbalaban, Umut Şimşekli, Lingjiong Zhu

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we present our experimental results on both synthetic and real data, in order to illustrate that our theory also holds in finite-sum problems (besides the streaming setting). Our main goal will be to illustrate the tail behavior of SGD by varying the algorithm parameters: depending on the choice of the stepsize η and the batch-size b, the distribution of the iterates does converge to a heavy-tailed distribution (Theorem 2) and the behavior of the tail-index obeys Theorem 4. Our implementations can be found in github.com/umutsimsekli/sgd_ht."
Researcher Affiliation | Academia | "1 Department of Management Science and Information Systems, Rutgers Business School, Piscataway, USA; 2 INRIA, Département d'Informatique de l'École Normale Supérieure, PSL Research University, Paris, France; 3 Department of Mathematics, Florida State University, Tallahassee, USA."
Pseudocode | No | The paper describes algorithms such as the SGD recursion (1.3) but does not present them in a structured pseudocode block or a clearly labeled "Algorithm" section.
Open Source Code | Yes | "Our implementations can be found in github.com/umutsimsekli/sgd_ht."
Open Datasets | Yes | "We train the models by using SGD ... on the MNIST and CIFAR10 datasets."
Dataset Splits | No | The paper uses the well-known MNIST and CIFAR10 datasets but does not give specific percentages or counts for training, validation, or test splits. The values "K = 1000 and K0 = 500" concern how iterates are collected and averaged in the synthetic experiments, not dataset splitting.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper points to implementations on GitHub, implying software usage, but it does not name any software with version numbers (e.g., "Python 3.8", "PyTorch 1.9", "CUDA 11.1").
Experiment Setup | Yes | "We set d = 100, first fix the variances σ = 1, σ_x = σ_y = 3, and generate {(a_i, y_i)}_{i=1}^n by simulating the statistical model. Then, by fixing this dataset, we run the SGD recursion (3.5) for a large number of iterations and vary η from 0.02 to 0.2 and b from 1 to 20. We also set K = 1000 and K0 = 500." ... "We train the models by using SGD for 10K iterations and we range η from 10^-4 to 10^-1 and b from 1 to 10." ... "where we vary η from 10^-4 to 1.7 × 10^-3 and b from 1 to 10." (A hedged code sketch of the synthetic setup follows the table.)
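
The synthetic setup quoted above is compact enough to restate as code. The sketch below is a hypothetical reconstruction, not the authors' implementation: it simulates a Gaussian linear-regression model with the stated variances (σ = 1, σ_x = σ_y = 3, d = 100) and runs mini-batch SGD on the resulting finite-sum least-squares objective. The sample size n, the ground-truth parameter x_star, and the exact form of the paper's recursion (3.5) are assumptions not fixed by the excerpt.

```python
# Hypothetical reconstruction of the synthetic experiment; variable names
# (d, sigma, sigma_x, sigma_y, eta, b) mirror the paper's symbols, while
# n and x_star are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

d, n = 100, 1000                         # d is from the paper; n is assumed
sigma, sigma_x, sigma_y = 1.0, 3.0, 3.0  # variances quoted in the setup

# Simulate the statistical model: a_i ~ N(0, sigma_x^2 I_d),
# y_i = a_i^T x_* + Gaussian noise.
x_star = rng.normal(0.0, sigma, size=d)
A = rng.normal(0.0, sigma_x, size=(n, d))
y = A @ x_star + rng.normal(0.0, sigma_y, size=n)

def run_sgd(eta, b, iters=10_000):
    """Mini-batch SGD on f(x) = (1/2n) * ||A x - y||^2 over the fixed dataset."""
    x = np.zeros(d)
    for _ in range(iters):
        idx = rng.choice(n, size=b, replace=False)   # draw a mini-batch
        grad = A[idx].T @ (A[idx] @ x - y[idx]) / b  # stochastic gradient
        x = x - eta * grad                           # SGD step
    return x

x_final = run_sgd(eta=0.1, b=5)
```

Sweeping η over [0.02, 0.2] and b over {1, ..., 20}, as in the quoted setup, amounts to calling run_sgd on the grid of (η, b) pairs.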
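Reproducing the tail-index results additionally requires an estimator of the index α from the iterates; this is where K = 1000 and K0 = 500 enter in the paper. The snippet below substitutes a plain Hill estimator applied to the norms of independently restarted runs: a simple stand-in for illustration, not the multivariate α-stable estimator the authors use. It reuses run_sgd from the sketch above.

```python
# Hill-type tail-index estimation: a stand-in probe, not the paper's estimator.
def hill_estimator(samples, k):
    """Estimate the tail index from the k largest order statistics of |samples|."""
    x = np.sort(np.abs(samples))[::-1]         # descending order statistics
    log_excess = np.log(x[:k]) - np.log(x[k])  # log-spacings above the threshold
    return 1.0 / np.mean(log_excess)           # alpha_hat = 1 / mean log-excess

# Norm of the final iterate across independent restarts of the same recursion.
norms = np.array([np.linalg.norm(run_sgd(eta=0.15, b=1, iters=2_000))
                  for _ in range(200)])
alpha_hat = hill_estimator(norms, k=20)
print(f"estimated tail index: {alpha_hat:.2f}")  # smaller alpha => heavier tails
```

Qualitatively, the theory quoted in the Research Type row predicts heavier tails (smaller estimated α) as the stepsize η grows or the batch size b shrinks, which a sweep of this kind can visualize.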