Quantized Decentralized Stochastic Learning over Directed Graphs

Authors: Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerical evaluations corroborate our main theoretical results and illustrate significant speed-up compared to the exact-communication methods." From the Numerical Experiments section: "In this section, we compare the proposed methods for communication-efficient message passing over directed graphs, with the push-sum protocol using exact communication (e.g., as formulated in (Kempe et al., 2003; Tsianos et al., 2012) for gossip or optimization problems)."
Researcher Affiliation | Academia | (1) Department of Electrical and Computer Engineering, University of California, Santa Barbara, Santa Barbara, CA, USA. (2) Department of Electrical and Computer Engineering, University of Texas at Austin, Austin, TX, USA. (3) Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA, USA.
Pseudocode | Yes | Algorithm 1: Quantized Push-sum for Consensus over Directed Graphs; Algorithm 2: Quantized Decentralized SGD over Directed Graphs (an illustrative sketch of a quantized push-sum round follows this table).
Open Source Code | No | The paper does not provide any specific links or explicit statements about the release of its source code.
Open Datasets | Yes | "In order to illustrate this, we train a neural-network with 10 hidden units with sigmoid activation function to classify the MNIST dataset into 10 classes."
Dataset Splits | No | The paper mentions using training and test sets (implicitly, for MNIST) but does not provide specific details about a validation split.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., GPU/CPU models, memory).
Software Dependencies | No | The paper does not specify version numbers for any software dependencies (e.g., programming languages, libraries, frameworks).
Experiment Setup | Yes | "The step size for each setting is fine-tuned up to iteration 50 among 20 values in the interval [0.01, 3]." "For each setting, the step size is fine-tuned up to iteration 200 and over 15 values in the interval [0.1, 3]." "We use the graph G1 with 10 nodes, where each node has access to 1000 samples of the dataset and uses a randomly selected mini-batch of size 10 for computing the local stochastic gradient." (A hypothetical configuration sketch follows this table.)
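
For readers unfamiliar with the push-sum primitive named in the pseudocode row, the following is a minimal, illustrative sketch of a generic quantized push-sum consensus round over a directed graph. It is not the paper's Algorithm 1: the uniform quantizer, the mixing matrix `A`, and the omission of any quantization-error compensation are assumptions made here purely for illustration.

```python
import numpy as np

def uniform_quantize(v, levels=16, lo=-1.0, hi=1.0):
    # Illustrative low-precision quantizer: snap each entry to one of
    # `levels` evenly spaced points in [lo, hi]. The paper's quantization
    # scheme is not reproduced here.
    v = np.clip(v, lo, hi)
    step = (hi - lo) / (levels - 1)
    return lo + np.round((v - lo) / step) * step

def quantized_push_sum_round(x, w, A, quantize=uniform_quantize):
    # One synchronous round of a generic quantized push-sum update.
    #   x : (n, d) array of node values
    #   w : (n,)   array of push-sum weights (initialized to 1)
    #   A : (n, n) column-stochastic mixing matrix of the directed graph,
    #       with A[i, j] > 0 iff there is an edge j -> i.
    # Nodes push quantized values; node i's consensus estimate is x[i] / w[i].
    # Error-compensation terms used in the paper's algorithm are omitted.
    qx = quantize(x)
    x_next = A @ qx
    w_next = A @ w
    return x_next, w_next, x_next / w_next[:, None]

# Tiny usage example on a hypothetical 3-node directed graph.
A = np.array([[0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])   # column-stochastic: each column sums to 1
x = np.array([[0.9], [-0.3], [0.2]])
w = np.ones(3)
for _ in range(30):
    x, w, z = quantized_push_sum_round(x, w, A)
print(z.ravel())  # roughly the initial average; naive quantization leaves a residual error
```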
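
The experiment-setup row quotes concrete hyperparameters (10-node graph G1, 1000 samples per node, mini-batch size 10, a 10-hidden-unit sigmoid network for MNIST, and two step-size grids). The snippet below merely collects those quoted numbers into a hypothetical configuration and a simple grid-search helper; the variable names, the linear spacing of the grids, and the `run_experiment` callable are assumptions, not the authors' code.

```python
import numpy as np

# Hypothetical configuration assembled from the quoted setup; names are illustrative.
NUM_NODES = 10           # graph G1 with 10 nodes
SAMPLES_PER_NODE = 1000  # each node holds 1000 training samples
BATCH_SIZE = 10          # mini-batch used for the local stochastic gradient
HIDDEN_UNITS = 10        # one hidden layer with sigmoid activation
NUM_CLASSES = 10         # MNIST digit classes

# Step-size grids described in the paper: 20 values in [0.01, 3] (tuned up to
# iteration 50) and 15 values in [0.1, 3] (tuned up to iteration 200). The paper
# does not state how the values are spaced; linear spacing is assumed here.
GRID_SHORT = np.linspace(0.01, 3.0, 20)
GRID_LONG = np.linspace(0.1, 3.0, 15)

def tune_step_size(run_experiment, grid, budget_iters):
    # `run_experiment(step_size, iters)` is a hypothetical callable that trains
    # for `iters` iterations and returns the final training loss.
    losses = [run_experiment(lr, budget_iters) for lr in grid]
    return float(grid[int(np.argmin(losses))])

# Example: best_lr = tune_step_size(run_experiment, GRID_SHORT, budget_iters=50)
```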