The Convergence of Sparsified Gradient Methods

Authors: Dan Alistarh, Torsten Hoefler, Mikael Johansson, Nikola Konstantinov, Sarit Khirirat, Cédric Renggli

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate Assumption 1 experimentally on a number of different learning tasks in Section 6 (see also Figure 1).
Researcher Affiliation | Academia | Dan Alistarh (IST Austria, dan.alistarh@ist.ac.at); Torsten Hoefler (ETH Zurich, htor@inf.ethz.ch); Mikael Johansson (KTH, mikaelj@kth.se); Sarit Khirirat (KTH, sarit@kth.se); Nikola Konstantinov (IST Austria, nikola.konstantinov@ist.ac.at); Cédric Renggli (ETH Zurich, cedric.renggli@inf.ethz.ch)
Pseudocode | Yes | Algorithm 1 Parallel TopK SGD at a node p.
Input: stochastic gradient oracle G^p(·) at node p; value K; learning rate α.
Initialize v_0 = ε^p_0 = 0.
for each step t ≥ 1 do
  acc^p_t ← ε^p_{t-1} + α G^p(v_{t-1})  {accumulate error into a locally generated gradient}
  ε^p_t ← acc^p_t - TopK(acc^p_t)  {update the error}
  Broadcast(TopK(acc^p_t), SUM)  {broadcast to all nodes and receive from all nodes}
  g_t ← (1/P) Σ_{q=1}^{P} TopK(acc^q_t)  {average the received (sparse) gradients}
  v_t ← v_{t-1} - g_t  {apply the update}
end for
(A runnable sketch of this update appears after the table.)
Open Source Code | No | The paper provides a link to an arXiv preprint of the full version, but no explicit statement or link to source code for the described methodology.
Open Datasets | Yes | We validate Assumption 1 experimentally on a number of different learning tasks in Section 6 (see also Figure 1). Specifically, we sample gradients at different epochs during the training process, and bound the constant by comparing the left- and right-hand sides of Equation (8). The assumption appears to hold with relatively low, stable values of the constant. We note that RCV1 is relatively sparse (average density ≈ 10%), while gradients in the other two settings are fully dense. (Figure 1 panels: (a) empirical logistic/RCV1; (b) empirical synthetic; (c) empirical ResNet110.)
Dataset Splits | No | The paper mentions using RCV1, synthetic data, and ResNet110 on CIFAR-10, but does not provide specific details on training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) within the provided text.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper mentions software such as TensorFlow and MXNet in the related work, but does not specify any ancillary software dependencies with version numbers for its own experiments.
Experiment Setup | No | The paper states, "Exact descriptions of the experimental setup are given in the full version of the paper [5]", indicating these details are not in the provided text. It does not provide concrete hyperparameter values or detailed training configurations.
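
To make the pseudocode row above concrete, the following is a minimal single-process sketch of the TopK error-feedback update from Algorithm 1. It is a NumPy illustration under stated assumptions, not the authors' implementation: the least-squares objective, the node count P, the batch size, and the function names (top_k, stochastic_gradient, topk_sgd) are hypothetical choices, and the Broadcast(..., SUM) step is simulated by summing the sparsified accumulators in memory.

# Minimal sketch of Algorithm 1 (parallel TopK SGD with error feedback).
# Assumptions: a toy least-squares objective, P simulated nodes, and an
# in-memory sum standing in for Broadcast(..., SUM). Not the authors' code.
import numpy as np


def top_k(x, k):
    """Keep the k largest-magnitude entries of x, zero out the rest."""
    if k >= x.size:
        return x.copy()
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out


def stochastic_gradient(v, A, b, batch=8, rng=None):
    """Mini-batch gradient of the least-squares loss 0.5 * ||A v - b||^2 / n."""
    if rng is None:
        rng = np.random.default_rng()
    rows = rng.choice(A.shape[0], size=batch, replace=False)
    Ab, bb = A[rows], b[rows]
    return Ab.T @ (Ab @ v - bb) / batch


def topk_sgd(A, b, P=4, K=10, lr=0.05, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    d = A.shape[1]
    v = np.zeros(d)                          # shared model v_t
    eps = [np.zeros(d) for _ in range(P)]    # per-node error memory eps^p_t
    for _ in range(steps):
        sparsified = []
        for p in range(P):
            # accumulate error into a locally generated gradient
            acc = eps[p] + lr * stochastic_gradient(v, A, b, rng=rng)
            s = top_k(acc, K)
            eps[p] = acc - s                 # update the error
            sparsified.append(s)             # "broadcast" TopK(acc^p_t)
        g = sum(sparsified) / P              # average the received sparse gradients
        v = v - g                            # apply the update
    return v


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((256, 50))
    x_true = rng.standard_normal(50)
    b = A @ x_true + 0.01 * rng.standard_normal(256)
    v = topk_sgd(A, b)
    print("relative error:", np.linalg.norm(v - x_true) / np.linalg.norm(x_true))

The detail worth noticing is the per-node error memory eps[p]: coordinates that TopK drops are not discarded but carried into the next step's accumulator, which is the mechanism the paper's convergence analysis builds on.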