The Convergence of Sparsified Gradient Methods
Authors: Dan Alistarh, Torsten Hoefler, Mikael Johansson, Nikola Konstantinov, Sarit Khirirat, Cédric Renggli
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate Assumption 1 experimentally on a number of different learning tasks in Section 6 (see also Figure 1). |
| Researcher Affiliation | Academia | Dan Alistarh, IST Austria (dan.alistarh@ist.ac.at); Torsten Hoefler, ETH Zurich (htor@inf.ethz.ch); Mikael Johansson, KTH (mikaelj@kth.se); Sarit Khirirat, KTH (sarit@kth.se); Nikola Konstantinov, IST Austria (nikola.konstantinov@ist.ac.at); Cédric Renggli, ETH Zurich (cedric.renggli@inf.ethz.ch) |
| Pseudocode | Yes | Algorithm 1 (Parallel TopK SGD at a node p). Input: stochastic gradient oracle G^p(·) at node p; value K; learning rate η. Initialize v_0 = ε^p_0 = 0. For each step t ≥ 1: acc^p_t ← ε^p_{t−1} + η G^p(v_{t−1}) {accumulate error into a locally generated gradient}; ε^p_t ← acc^p_t − TopK(acc^p_t) {update the error}; Broadcast(TopK(acc^p_t), SUM) {broadcast to all nodes and receive from all nodes}; g̃_t ← (1/P) Σ_{q=1}^{P} TopK(acc^q_t) {average the received (sparse) gradients}; v_t ← v_{t−1} − g̃_t {apply the update}. End for. (A runnable sketch of this loop follows the table.) |
| Open Source Code | No | The paper provides a link to an arXiv preprint of the full version, but no explicit statement or link to source code for the described methodology. |
| Open Datasets | Yes | We validate Assumption 1 experimentally on a number of different learning tasks in Section 6 (see also Figure 1). Specifically, we sample gradients at different epochs during the training process, and bound the constant by comparing the left- and right-hand sides of Equation (8). The assumption appears to hold with relatively low, stable values of the constant. We note that RCV1 is relatively sparse (average density ≈ 10%), while gradients in the other two settings are fully dense. Figure 1 panels: (a) empirical logistic/RCV1; (b) empirical synthetic; (c) empirical ResNet110. |
| Dataset Splits | No | The paper mentions using RCV1, synthetic data, and ResNet110 on CIFAR-10, but does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) within the provided text. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like TensorFlow and MXNet in the related work, but does not specify any ancillary software dependencies with version numbers for its own experiments. |
| Experiment Setup | No | The paper states, "Exact descriptions of the experimental setup are given in the full version of the paper [5]", indicating these details are not in the provided text. It does not provide concrete hyperparameter values or detailed training configurations. |
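For concreteness, the pseudocode row above can be expressed as a short single-process simulation. The following is a minimal NumPy sketch of Algorithm 1 under stated assumptions, not the authors' implementation: the names `top_k` and `parallel_topk_sgd` are ours, the learning rate is folded into the accumulator as in the paper's update, and the all-to-all Broadcast(·, SUM) collective is replaced by an in-process sum over simulated nodes.

```python
import numpy as np

def top_k(vec, k):
    """Keep the k largest-magnitude entries of vec; zero out the rest."""
    out = np.zeros_like(vec)
    k = min(k, vec.size)
    if k > 0:
        idx = np.argpartition(np.abs(vec), -k)[-k:]
        out[idx] = vec[idx]
    return out

def parallel_topk_sgd(grad_oracles, dim, k, lr, steps):
    """Single-process simulation of Algorithm 1 (Parallel TopK SGD).

    grad_oracles: one stochastic-gradient function per node; each takes
    the current iterate v and returns a gradient estimate G^p(v).
    The Broadcast(TopK(acc), SUM) collective is simulated by summing
    the sparsified contributions in-process.
    """
    num_nodes = len(grad_oracles)
    v = np.zeros(dim)                                # shared iterate v_t
    err = [np.zeros(dim) for _ in range(num_nodes)]  # per-node error eps^p_t
    for _ in range(steps):
        sparse = []
        for p, oracle in enumerate(grad_oracles):
            acc = err[p] + lr * oracle(v)  # fold local error into a fresh gradient
            s = top_k(acc, k)              # sparsify: transmit only top-K entries
            err[p] = acc - s               # keep the dropped mass as error
            sparse.append(s)
        g = sum(sparse) / num_nodes        # average the received sparse gradients
        v = v - g                          # apply the update
    return v

# Toy usage (hypothetical): 4 nodes minimizing f(v) = 0.5 * ||v - target||^2
# from noisy gradients; error feedback lets the sparse updates converge.
rng = np.random.default_rng(0)
target = rng.normal(size=20)
oracles = [(lambda v: (v - target) + 0.01 * rng.normal(size=20)) for _ in range(4)]
v = parallel_topk_sgd(oracles, dim=20, k=5, lr=0.1, steps=500)
print(np.linalg.norm(v - target))  # small residual expected
```

The key design point the sketch illustrates is the error-feedback mechanism: coordinates dropped by TopK are not discarded but retained in a local accumulator and re-injected at the next step, which is what the paper's convergence analysis relies on.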