Gradient Sparsification for Communication-Efficient Distributed Optimization
Authors: Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we conduct experiments to validate the effectiveness and efficiency of the proposed sparsification technique. |
| Researcher Affiliation | Collaboration | Jianqiao Wangni (University of Pennsylvania; Tencent AI Lab) wnjq@seas.upenn.edu; Jialei Wang (Two Sigma Investments) jialei.wang@twosigma.com; Ji Liu (University of Rochester; Tencent AI Lab) ji.liu.uwisc@gmail.com; Tong Zhang (Tencent AI Lab) tongzhang@tongzhang-ml.org |
| Pseudocode | Yes | Algorithm 1: A synchronous distributed optimization algorithm; Algorithm 2: Closed-form solution; Algorithm 3: Greedy algorithm (an illustrative sketch of the sparsification step appears after this table) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. The provided link (https://arxiv.org/abs/1710.09854) is to a full version of the paper, not a code repository. |
| Open Datasets | Yes | We consider the convolutional neural networks (CNN) on the CIFAR-10 dataset with different settings. |
| Dataset Splits | No | The paper mentions using CIFAR-10, which has standard splits, and synthetic data, but does not explicitly provide specific percentages, sample counts, or citations to predefined splits for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments, only mentioning a 'shared memory multi-thread' setup. |
| Software Dependencies | No | The paper mentions specific optimization algorithms like ADAM and SGD, but does not provide specific software dependencies with version numbers (e.g., PyTorch 1.9, TensorFlow 2.x, Python 3.x). |
| Experiment Setup | Yes | "The mini-batch size is set to be 8 by default unless otherwise specified."; "The step sizes are fine-tuned on each case..."; "The initial step size is set to 0.02."; "The number of workers is set to 16 or 32, the regularization parameter is set to {0.5, 0.1, 0.05}, and the learning rate is chosen from {0.5, 0.25, 0.05, 0.25}." |
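
For context on the pseudocode noted above: the paper's core idea is unbiased gradient sparsification, where coordinate g_i of a stochastic gradient is kept with probability p_i and, if kept, rescaled to g_i / p_i, so the sparsified vector equals the original gradient in expectation. The sketch below is a minimal illustration of that sparsification step only, not the paper's implementation; the function name `sparsify_gradient`, the magnitude-proportional choice of keep probabilities, and the toy vector are assumptions for illustration. The paper's Algorithms 2 and 3 instead choose the probabilities by solving a constrained optimization problem (closed-form or greedy), which is not reproduced here.

```python
import numpy as np

def sparsify_gradient(grad, probs, rng=None):
    """Illustrative unbiased sparsification sketch (not the paper's code).

    Coordinate i is kept with probability probs[i] and rescaled to
    grad[i] / probs[i]; otherwise it is zeroed.  In expectation the
    returned vector equals `grad`, so downstream SGD stays unbiased.
    """
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(grad.shape) < probs           # Bernoulli keep mask
    sparse = np.zeros_like(grad)
    sparse[keep] = grad[keep] / probs[keep]         # rescale survivors
    return sparse

# Hypothetical example: keep probabilities proportional to magnitude,
# capped at 1 (a simple heuristic, targeting roughly 3 nonzero
# coordinates in expectation before capping).
g = np.array([0.8, -0.05, 0.3, 0.0, -0.6])
p = np.minimum(1.0, np.abs(g) / np.abs(g).sum() * 3)
print(sparsify_gradient(g, p))
```

Under this scheme, small-magnitude coordinates are dropped most of the time, which is what reduces communication, while the 1/p_i rescaling preserves unbiasedness at the cost of added variance.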