DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-pass Error-Compensated Compression

Authors: Hanlin Tang, Chen Yu, Xiangru Lian, Tong Zhang, Ji Liu

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An empirical study is also conducted to validate our theoretical results. ... 5. Experiments: We validate our theory with experiments that compared DOUBLESQUEEZE with other compression implementations.
Researcher Affiliation | Collaboration | 1) University of Rochester; 2) Hong Kong University of Science and Technology; 3) Seattle AI Lab, FeDA Lab, Kwai Inc.
Pseudocode | Yes | Algorithm 1 DOUBLESQUEEZE (a hedged sketch of the update appears after this table)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it include a specific repository link or an explicit code-release statement.
Open Datasets | Yes | Datasets and models: We evaluate DOUBLESQUEEZE by training ResNet-18 (He et al., 2016) on CIFAR-10.
Dataset Splits | No | The paper mentions training on CIFAR-10 and evaluating test accuracy, but does not provide specific train/validation/test splits, sample counts, or cross-validation details needed for reproduction.
Hardware Specification | Yes | Each worker computes gradients on a Nvidia 1080Ti.
Software Dependencies | No | The paper mentions "TensorFlow" in its references, but does not specify the version of TensorFlow or of any other key software component needed to replicate the experiment.
Experiment Setup | Yes | The learning rate starts with 0.1 and is reduced by a factor of 10 every 160 epochs. The batch size is set to 256 on each worker. (See the learning-rate sketch after this table.)
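The Pseudocode row above points to Algorithm 1 (DOUBLESQUEEZE), but no reference implementation is released. The sketch below is a minimal, single-process NumPy simulation of the double-pass error-compensated update described in the paper: each worker compresses its error-compensated gradient, and the server compensates and compresses again before broadcasting. The top-k compressor, the function names, and the in-memory "communication" are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def top_k_compress(v, k):
    # Keep the k largest-magnitude entries and zero the rest.
    # (One possible compressor; the method only assumes the
    # compression error is bounded.)
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def doublesqueeze_step(x, grads, worker_err, server_err, lr, k):
    # One simulated iteration with n workers.
    #   x          : current model parameters (vector)
    #   grads      : list of per-worker stochastic gradients at x
    #   worker_err : list of per-worker compression-error buffers
    #   server_err : compression-error buffer kept on the server
    n = len(grads)

    # First pass: every worker compresses its error-compensated gradient.
    sent = []
    for i in range(n):
        v = grads[i] + worker_err[i]       # fold in leftover error
        c = top_k_compress(v, k)           # what gets sent to the server
        worker_err[i] = v - c              # remember what was dropped
        sent.append(c)

    # Second pass: the server averages, compensates, and compresses again.
    avg = sum(sent) / n + server_err
    c_avg = top_k_compress(avg, k)         # what gets broadcast back
    server_err = avg - c_avg

    # Every worker applies the same compressed update.
    x = x - lr * c_avg
    return x, worker_err, server_err
```

To use the sketch, initialize worker_err as one zero vector per worker and server_err as a zero vector, then call doublesqueeze_step once per iteration with fresh stochastic gradients.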
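The Experiment Setup row gives the only concrete training hyperparameters. Below is a tiny helper that reproduces the reported step schedule, assuming a plain step decay; the paper does not name a framework or scheduler API, so the function is purely illustrative.

```python
def step_decay_lr(epoch, base_lr=0.1, drop=0.1, every=160):
    # Learning rate from the Experiment Setup row: starts at 0.1
    # and is divided by 10 every 160 epochs.
    return base_lr * (drop ** (epoch // every))

# Epochs 0, 160, and 320 give roughly 0.1, 0.01, and 0.001
# (up to floating-point rounding). The reported batch size is 256 per worker.
```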