Escaping Saddle Points with Compressed SGD

Authors: Dmitrii Avdiukhin, Grigory Yaroslavtsev

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we show that noisy Compressed SGD achieves convergence comparable with full SGD and successfully escapes saddle points. We perform our first set of experiments on a ResNet34 model trained on the CIFAR-10 dataset with step size 0.1. We analyze convergence of compressed SGD with the RANDOMK compressor when 100%, 10%, 1%, and 0.1% of random gradient coordinates are communicated. Figure 1 shows that SGD with RANDOMK keeping 10% or 1% of coordinates converges as fast as full SGD while requiring substantially less communication. (A sketch of the RANDOMK and TOPK compressors appears after the table.)
Researcher Affiliation | Academia | Dmitrii Avdiukhin, Department of Computer Science, Indiana University, Bloomington, IN 47405, davdyukh@iu.edu; Grigory Yaroslavtsev, Department of Computer Science, George Mason University, Fairfax, VA 22030, grigory@grigory.us
Pseudocode | Yes | Algorithm 1: Compressed SGD. (A sketch of one compressed-SGD step with error feedback appears after the table.)
Open Source Code | No | The paper does not provide an unambiguous statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We perform our first set of experiments on a ResNet34 model trained on the CIFAR-10 dataset with step size 0.1. We compare uncompressed SGD, SGD with the TOPK compressor (0.1% of coordinates), and SGD with the RANDOMK compressor (0.1% of coordinates) on a deep MNIST autoencoder.
Dataset Splits | No | The paper uses the CIFAR-10 and MNIST datasets but does not explicitly provide specific training, validation, or test dataset split percentages or methodologies beyond implicitly using a test set.
Hardware Specification | No | The paper discusses distributed settings and mentions 'multiple machines' but does not specify any particular hardware details such as GPU models, CPU types, or cloud instance specifications used for the experiments.
Software Dependencies | No | The paper does not provide specific software names with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA) that would be needed to replicate the experiments.
Experiment Setup | Yes | We perform our first set of experiments on a ResNet34 model trained on the CIFAR-10 dataset with step size 0.1. We distribute the data across 10 machines, such that each machine contains data from a single class. Figure 1: Convergence of distributed SGD (η = 0.1, batch size 8 per machine) with the RANDOMK compressor... Figure 2: Convergence of SGD (η = 0.1, batch size 64)... and with Gaussian noise (green, σ = 0.01 for each coordinate).
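
The RANDOMK and TOPK compressors referenced above are standard sparsification operators: RANDOMK keeps a uniformly random fraction of the gradient coordinates, while TOPK keeps those with the largest magnitude. Below is a minimal PyTorch-style sketch of both; the function names, signatures, and the k_frac parameter are ours for illustration (the report above only specifies the fraction of coordinates kept, e.g. 10%, 1%, or 0.1%).

```python
import torch

def random_k(grad: torch.Tensor, k_frac: float = 0.01) -> torch.Tensor:
    """RANDOMK sketch: keep a uniformly random fraction k_frac of coordinates, zero the rest."""
    flat = grad.flatten()
    k = max(1, int(k_frac * flat.numel()))
    idx = torch.randperm(flat.numel(), device=flat.device)[:k]
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(grad)

def top_k(grad: torch.Tensor, k_frac: float = 0.001) -> torch.Tensor:
    """TOPK sketch: keep the fraction k_frac of coordinates with largest magnitude, zero the rest."""
    flat = grad.flatten()
    k = max(1, int(k_frac * flat.numel()))
    idx = torch.topk(flat.abs(), k).indices
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(grad)
```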
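The report only names Algorithm 1 (Compressed SGD). The following sketch shows one plausible local step with error feedback and optional Gaussian perturbation, assuming the common error-feedback formulation of compressed SGD and reusing the compressors from the sketch above; it is an illustration under those assumptions, not the paper's exact Algorithm 1.

```python
def compressed_sgd_step(param, grad, memory, lr=0.1, compressor=random_k, noise_std=0.0):
    """One local step of compressed SGD with error feedback (hypothetical sketch).

    `memory` accumulates the coordinates dropped by the compressor so that their
    mass is re-injected in later rounds; optional per-coordinate Gaussian noise
    mirrors the perturbation used to escape saddle points.
    """
    if noise_std > 0:
        grad = grad + noise_std * torch.randn_like(grad)
    corrected = grad + memory          # add back previously dropped coordinates
    message = compressor(corrected)    # sparse update that would actually be communicated
    new_memory = corrected - message   # compression error kept locally for later rounds
    new_param = param - lr * message   # SGD update using the compressed gradient
    return new_param, new_memory
```

Under this sketch, the Figure 2 setting quoted above would roughly correspond to compressor=lambda g: top_k(g, 0.001), lr=0.1, and noise_std=0.01, but the exact correspondence to the paper's procedure is an assumption.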