Escaping Saddle Points with Compressed SGD
Authors: Dmitrii Avdiukhin, Grigory Yaroslavtsev
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we show that noisy Compressed SGD achieves convergence comparable with full SGD and successfully escapes saddle points. We perform our first set of experiments on a ResNet34 model trained on the CIFAR-10 dataset with step size 0.1. We analyze the convergence of compressed SGD with the RANDOMK compressor when 100%, 10%, 1%, and 0.1% of random gradient coordinates are communicated. Figure 1 shows that SGD with RANDOMK at 10% or 1% compression converges as fast as full SGD while requiring substantially less communication. (A minimal sketch of the RANDOMK and TOPK compressors appears after this table.) |
| Researcher Affiliation | Academia | Dmitrii Avdiukhin Department of Computer Science Indiana University Bloomington, IN 47405 davdyukh@iu.edu Grigory Yaroslavtsev Department of Computer Science George Mason University Fairfax, VA 22030 grigory@grigory.us |
| Pseudocode | Yes | Algorithm 1: Compressed SGD (a hedged sketch of one compressed-SGD step with error feedback appears after this table) |
| Open Source Code | No | The paper does not provide an unambiguous statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We perform our first set of experiments on a ResNet34 model trained on the CIFAR-10 dataset with step size 0.1. We compare uncompressed SGD, SGD with the TOPK compressor (0.1% of coordinates), and SGD with the RANDOMK compressor (0.1% of coordinates) on a deep MNIST autoencoder. |
| Dataset Splits | No | The paper uses CIFAR-10 and MNIST datasets but does not explicitly provide specific training, validation, or test dataset split percentages or methodologies beyond implicitly using a test set. |
| Hardware Specification | No | The paper discusses distributed settings and mentions 'multiple machines' but does not specify any particular hardware details such as GPU models, CPU types, or cloud instance specifications used for the experiments. |
| Software Dependencies | No | The paper does not provide specific software names with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA) that would be needed to replicate the experiment. |
| Experiment Setup | Yes | We perform our first set of experiments on a ResNet34 model trained on the CIFAR-10 dataset with step size 0.1. We distribute the data across 10 machines, such that each machine contains data from a single class. Figure 1: Convergence of distributed SGD (η = 0.1, batch size 8 per machine) with the RANDOMK compressor... Figure 2: Convergence of SGD (η = 0.1, batch size 64)... and with Gaussian noise (green, σ = 0.01 for each coordinate). (A hypothetical sketch of the per-class data split appears after this table.) |
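
The RANDOMK and TOPK compressors referenced in the table keep only a fraction of the gradient coordinates. Below is a minimal PyTorch-style sketch of both, assuming a dense-tensor representation; in a real distributed run only the selected indices and values would be communicated, and the function names here are illustrative rather than taken from the paper's code.

```python
import torch

def random_k(grad: torch.Tensor, fraction: float) -> torch.Tensor:
    """RANDOMK sketch: keep a uniformly random fraction of coordinates, zero the rest."""
    flat = grad.flatten()
    k = max(1, int(fraction * flat.numel()))
    idx = torch.randperm(flat.numel(), device=flat.device)[:k]
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(grad)

def top_k(grad: torch.Tensor, fraction: float) -> torch.Tensor:
    """TOPK sketch: keep the fraction of coordinates with the largest magnitude."""
    flat = grad.flatten()
    k = max(1, int(fraction * flat.numel()))
    idx = flat.abs().topk(k).indices
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(grad)
```

With `fraction=0.01`, roughly 1% of the gradient coordinates survive, matching the 1% setting compared against full SGD in Figure 1.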
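
The Pseudocode row points to Algorithm 1 (Compressed SGD). The sketch below, which reuses the `random_k` helper above, shows one plausible worker-side step combining error feedback with optional Gaussian noise for escaping saddle points; the exact ordering of the compression, residual update, and noise injection is an assumption and should be checked against Algorithm 1 in the paper.

```python
import torch

def compressed_sgd_step(param: torch.Tensor,
                        grad: torch.Tensor,
                        error: torch.Tensor,
                        lr: float = 0.1,
                        fraction: float = 0.01,
                        noise_std: float = 0.0) -> None:
    """Illustrative compressed-SGD step with error feedback (not the paper's exact Algorithm 1)."""
    # Fold the residual of previously un-transmitted coordinates into the gradient.
    corrected = grad + error
    # Communicate only a fraction of the coordinates (RANDOMK sketch from above).
    compressed = random_k(corrected, fraction)
    # Keep what was not transmitted as the residual for the next step.
    error.copy_(corrected - compressed)
    # Optional isotropic Gaussian noise (sigma = 0.01 per coordinate in Figure 2),
    # used to help the iterates escape saddle points.
    if noise_std > 0:
        compressed = compressed + noise_std * torch.randn_like(compressed)
    # Plain SGD update with the (noisy) compressed gradient.
    param.add_(compressed, alpha=-lr)
```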
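
The Experiment Setup row states that the CIFAR-10 data is distributed across 10 machines, each holding a single class. A hypothetical torchvision-based reconstruction of that split is shown below (batch size 8 per machine, as in Figure 1); the paper does not specify the tooling used.

```python
import torch
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

# Load CIFAR-10 and group the training examples by label.
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())
targets = torch.tensor(train_set.targets)

# One loader per "machine": worker i sees only class i.
worker_loaders = []
for worker_id in range(10):
    idx = (targets == worker_id).nonzero(as_tuple=True)[0].tolist()
    worker_loaders.append(DataLoader(Subset(train_set, idx),
                                     batch_size=8, shuffle=True))
```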