Double Quantization for Communication-Efficient Distributed Optimization
Authors: Yue Yu, Jiaxiang Wu, Longbo Huang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to validate the efficiency of our algorithms. We start with the logistic regression problem and then evaluate the performance of our algorithms on neural network models. We further study the relationship between the hyperparameter µ and the number of transmitted bits. |
| Researcher Affiliation | Collaboration | Yue Yu (IIIS, Tsinghua University, yu-y14@mails.tsinghua.edu.cn); Jiaxiang Wu (Tencent AI Lab, jonathanwu@tencent.com); Longbo Huang (IIIS, Tsinghua University, longbohuang@tsinghua.edu.cn) |
| Pseudocode | Yes | Algorithm 1: AsyLPG; Algorithm 2: Sparse-AsyLPG (procedures for worker); Algorithm 3: Acc-AsyLPG |
| Open Source Code | No | The paper does not provide any explicit statements about the availability of source code for the methodology described, nor does it include links to a code repository. |
| Open Datasets | Yes | We begin with logistic regression on the real-sim dataset [7]. We also experimented with logistic regression on the rcv1 dataset [7]. We conduct evaluations on the MNIST dataset [18] using a 3-layer fully connected neural network. We further set up experiments in PyTorch with ResNet18 [11] on the CIFAR10 dataset [16]. |
| Dataset Splits | No | The paper mentions '10k training and 2k test samples' for MNIST and '50k training samples and 10k evaluation samples' for CIFAR10, but it does not explicitly describe a separate validation split. |
| Hardware Specification | Yes | The evaluations are set up on a 6-server distributed test-bed. Each server has 16 cores and 16 GB of memory. |
| Software Dependencies | No | The paper mentions 'Open MPI' and 'PyTorch' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We use L1 and L2 regularization with weights 10^-5 and 10^-4, respectively. The mini-batch size is B = 200 and the epoch length is m = n/B. Six algorithms are compared, using a constant learning rate (denoted lr) tuned to achieve the best result from {1e-1, 1e-2, 5e-2, 1e-3, 5e-3, ..., 1e-5, 5e-5}. We set b_x = 8 and b = 8 in these three algorithms. The sparsity budget in Sparse-AsyLPG is selected as ϕ_t = ‖α_t‖_1 / ‖α_t‖. Parameters in Acc-AsyLPG are set to θ_s = 2/(s + 2) and η_s = lr/θ_s (see the sketch after this table). |
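The b_x = 8 and b = 8 settings above are the bit widths of the paper's low-precision quantizers, applied to the transmitted model variables and gradients, respectively. The following is a minimal sketch of an unbiased b-bit stochastic uniform quantizer and of the reported Acc-AsyLPG step-size schedule; the infinity-norm scaling, the function names, and the zero-based epoch index are assumptions made for illustration, not the paper's exact operators.

```python
import torch

def stochastic_quantize(v: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Unbiased stochastic uniform quantization of a vector onto 2**bits levels.

    Sketch only: magnitudes are scaled by the infinity norm (an assumption) and
    rounded up or down at random so the quantized vector is unbiased in expectation.
    """
    s = 2 ** bits - 1                      # number of quantization intervals
    norm = v.abs().max().clamp_min(1e-12)  # scale; assumption: infinity norm
    scaled = v.abs() / norm * s            # map magnitudes into [0, s]
    lower = scaled.floor()
    prob_up = scaled - lower               # fractional part = probability of rounding up
    levels = lower + torch.bernoulli(prob_up)
    return torch.sign(v) * levels / s * norm

def acc_asylpg_stepsizes(num_epochs: int, lr: float):
    """Per-epoch parameters reported in the setup: theta_s = 2/(s+2), eta_s = lr/theta_s.

    Assumption: the epoch index s starts at 0.
    """
    for s in range(num_epochs):
        theta = 2.0 / (s + 2)
        yield theta, lr / theta
```

In a double-quantization scheme such as AsyLPG, an operator of this kind would be applied both to the model variables broadcast by the server (with b_x bits) and to the gradients uploaded by the workers (with b bits); the sketch above is only one standard way to realize an unbiased quantizer with that bit budget.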