Communication-Efficient Distributed Optimization with Quantized Preconditioners
Authors: Foivos Alimisis, Peter Davies, Dan Alistarh
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also validate our findings experimentally, showing fast convergence and reduced communication. |
| Researcher Affiliation | Collaboration | 1Department of Mathematics, University of Geneva, Switzerland (work done while at IST Austria) 2IST Austria 3Neural Magic, US |
| Pseudocode | Yes | The algorithm is presented in a numbered list of steps under section 3.1 'The Algorithm' and 4.1 'Algorithm Description', formatted as structured steps for a method. |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | Dataset We use the dataset `cpusmall_scale` from LIBSVM (Chang & Lin, 2011). ... We demonstrate the methods on the `phishing` and `german.numer` datasets from the LIBSVM collection (Chang & Lin, 2011) |
| Dataset Splits | No | The paper does not provide specific details about training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using QSGD (Alistarh et al., 2016) and the Hadamard-rotation based method (Suresh et al., 2017) for gradient quantization but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | The learning rate (lr in the figure titles) is set close to the maximum for which gradient descent will converge... The number of bits per coordinate used to quantize gradients (qb) and preconditioners (pb) are also shown; the latter is an average since the quantization method uses a variable number of bits. ... we test each with learning rates in {2^0, 2^-1, 2^-2, ...}, and plot the highest rate for which the method stably converges. |
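The learning-rate selection protocol quoted in the Experiment Setup row can be sketched as a small grid search. This is a hedged illustration, not the authors' code: `run_experiment` is a hypothetical stand-in (here, gradient descent on a toy quadratic) for the paper's actual training runs, and the convergence threshold is an assumption.

```python
# Sketch of the sweep described in the paper: try rates 2^0, 2^-1, 2^-2, ...
# and keep the largest one for which the run stably converges.

def converged(losses):
    """Toy convergence check (assumed): final loss is finite and near zero."""
    last = losses[-1]
    return last == last and last < 1e-6  # `last == last` filters out NaN

def run_experiment(lr, steps=50):
    """Hypothetical stand-in for training: minimize f(x) = x^2 by GD."""
    x, losses = 1.0, []
    for _ in range(steps):
        x -= lr * 2 * x          # gradient of x^2 is 2x
        losses.append(x * x)
    return losses

def highest_stable_rate(num_rates=8):
    """Return the first (i.e., largest) rate in 2^0, 2^-1, ... that converges."""
    for k in range(num_rates):
        lr = 2.0 ** (-k)
        if converged(run_experiment(lr)):
            return lr
    return None
```

On the toy quadratic, `lr = 2^0 = 1` makes the iterate oscillate without progress, so the sweep settles on `2^-1`; in the paper the same protocol is applied per method to its real training loss.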