Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
Authors: Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments indicate that our algorithm can compress gradients by up to two orders of magnitude without performance degradation. |
| Researcher Affiliation | Industry | 1Tencent AI Lab, Shenzhen, China. Correspondence to: Jiaxiang Wu <jonathanwu@tencent.com>. |
| Pseudocode | Yes | We summarize the overall workflow in Algorithm 1. (See the illustrative sketch below the table.) |
| Open Source Code | No | The paper does not provide access to source code for the described methodology, nor does it state that the code has been released. |
| Open Datasets | Yes | Here we start with three synthetic datasets: Syn-256, Syn-512, and Syn-1024. Each dataset consists of 10k training samples... Furthermore, we extend the evaluation to two publicly available datasets, Year Prediction MSD for regression and gisette for classification (Chang & Lin, 2011). |
| Dataset Splits | Yes | The experiments are carried out on the CIFAR-10 dataset (Krizhevsky, 2009)... We follow the common protocol, using 50k images for training and the remaining 10k images for evaluation. |
| Hardware Specification | Yes | Major hardware specifications are as follows: Intel Xeon E5-2680 CPU, Nvidia Tesla P40 GPU (8 units per node), and Mellanox Connect X-3 Pro network card (40Gb/s connectivity). |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | For all methods, the batch size is set to 128, and the learning rate starts from 0.1, divided by 10 at 40k and 60k iterations. The training process is terminated at the end of the 200th epoch (~78k iterations). (See the schedule sketch below the table.) |
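
The Pseudocode row above points to the paper's Algorithm 1 (ECQ-SGD). The snippet below is a minimal sketch of the general error-compensation idea behind quantized SGD, not a transcription of Algorithm 1: the uniform stochastic quantizer, the function names `quantize` and `ecq_sgd_step`, and the hyperparameter values are illustrative assumptions (the paper's scheme additionally weights the accumulated error).

```python
import numpy as np

def quantize(v, num_levels=16):
    """Unbiased uniform stochastic quantizer (illustrative; the paper's quantizer may differ)."""
    scale = np.max(np.abs(v)) + 1e-12
    normalized = np.abs(v) / scale * (num_levels - 1)
    lower = np.floor(normalized)
    # Round up with probability equal to the fractional part, so the quantizer is unbiased.
    levels = lower + (np.random.rand(*v.shape) < (normalized - lower))
    return np.sign(v) * levels / (num_levels - 1) * scale

def ecq_sgd_step(w, grad, error, lr=0.1, num_levels=16):
    """One error-compensated quantized SGD step (sketch, per-worker view).

    The quantization residual is kept locally and added back to the next
    gradient, so the compression error does not accumulate over iterations.
    """
    compensated = grad + error             # feed back previously accumulated error
    q = quantize(compensated, num_levels)  # compressed gradient sent over the network
    new_error = compensated - q            # residual stored for the next step
    new_w = w - lr * q                     # update parameters with the quantized gradient
    return new_w, new_error
```

In a multi-worker run, only `q` would be communicated; the error buffer stays on the worker, which is what keeps the per-step compression loss from compounding.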
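The Experiment Setup row quotes a CIFAR-10 schedule (batch size 128, learning rate 0.1 divided by 10 at 40k and 60k iterations, ~78k iterations over 200 epochs). Below is a minimal sketch of that step-decay schedule, assuming a plain milestone decay; the function name and the checked iteration values are illustrative, not from the paper.

```python
def learning_rate(iteration, base_lr=0.1, milestones=(40_000, 60_000), gamma=0.1):
    """Step-decay schedule matching the quoted setup: 0.1, then /10 at 40k and 60k iterations."""
    lr = base_lr
    for milestone in milestones:
        if iteration >= milestone:
            lr *= gamma
    return lr

# 200 epochs over the 50k CIFAR-10 training images with batch size 128
iterations_per_epoch = 50_000 // 128           # 390 iterations per epoch
total_iterations = 200 * iterations_per_epoch  # 78,000 iterations, i.e. the quoted ~78k

assert abs(learning_rate(10_000) - 0.1) < 1e-12    # before the first milestone
assert abs(learning_rate(50_000) - 0.01) < 1e-12   # after the 40k milestone
assert abs(learning_rate(70_000) - 0.001) < 1e-12  # after the 60k milestone
```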