Delay-Tolerant Algorithms for Asynchronous Distributed Online Learning

Authors: Brendan McMahan, Matthew Streeter

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We study the performance of both hypothetical algorithms and AdaptiveRevision on two real-world medium-sized datasets. We simulate the update delays using an update queue, which allows us to implement the hypothetical algorithms and also lets us precisely control both the exact delays as well as the delay pattern. We compare to the dual-averaging AsyncAdaGrad algorithm of Duchi et al. [11] (AsyncAda-DA in the figures), as well as asynchronous AdaGrad gradient descent (AsyncAda-GD)... We evaluate on two datasets... For each of these datasets we trained a logistic regression model, and evaluated using the logistic loss (LogLoss). (A sketch of this delay-simulation setup appears below the table.)
Researcher Affiliation | Industry | H. Brendan McMahan, Google, Inc., Seattle, WA, mcmahan@google.com; Matthew Streeter, Duolingo, Inc., Pittsburgh, PA, matt@duolingo.com
Pseudocode | Yes | Pseudo-code for the algorithm as implemented for the experiments is given in Algorithm 1.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | The second is a shuffled version of the malicious URL dataset as described by Ma et al. [19] (2.4 × 10^6 examples, 3.2 × 10^6 features). We also ran experiments on the rcv1.binary training dataset (0.6 × 10^6 examples, 0.05 × 10^6 features) from Chang and Lin [20]; results were qualitatively very similar to those for the URL dataset.
Dataset Splits | Yes | We evaluate the models online, making a single pass over the data and computing accuracy metrics on the predictions made by the model immediately before it trained on each example (i.e., progressive validation). To avoid possible transient behavior, we only report metrics for the predictions on the second half of each dataset. (See the progressive-validation sketch below the table.)
Hardware Specification | No | The paper mentions running experiments on "distributed systems" with "many machines" but does not specify any particular hardware components like CPU or GPU models, or cloud computing instance types.
Software Dependencies | No | The paper mentions training a "logistic regression model" and using certain algorithms, but does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, specific libraries).
Experiment Setup | Yes | The exact parametrization of the learning rate schedule is particularly important with delayed updates. We follow the common practice of taking learning rates of the form η_t = α/√(S_t + 1)... When we optimize α, we choose the best setting from a grid {α₀(1.25)^i | i ∈ ℕ}, where α₀ is an initial guess for each dataset. (See the learning-rate sketch below the table.)
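
The Research Type row describes simulating update delays with an update queue, so gradients are computed on the current model but applied only later. The Python sketch below illustrates that idea under a fixed delay; it is not the paper's AdaptiveRevision (Algorithm 1), and the `loss_grad` helper, the fixed-delay pattern, and the constant step size `alpha` are assumptions made for illustration.

```python
import collections

import numpy as np


def simulate_delayed_sgd(examples, loss_grad, dim, delay=10, alpha=0.1):
    """Online training where each gradient is applied only after a fixed
    delay, mimicking asynchronous updates via a FIFO update queue."""
    w = np.zeros(dim)
    pending = collections.deque()  # queue of (gradient, creation-step) pairs
    for t, (x, y) in enumerate(examples):
        # Apply any queued updates whose delay has elapsed.
        while pending and t - pending[0][1] >= delay:
            g, _ = pending.popleft()
            w -= alpha * g
        # The gradient is computed on the current (stale-by-delay) model.
        pending.append((loss_grad(w, x, y), t))
    # Flush the remaining queued updates after the single pass.
    while pending:
        g, _ = pending.popleft()
        w -= alpha * g
    return w
```

Controlling the delay through an explicit queue, as in this sketch, is what lets the experiments fix both the exact delays and the delay pattern rather than relying on whatever delays a real cluster happens to produce.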
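The Dataset Splits row quotes the paper's use of progressive validation: each example is scored before the model trains on it, and metrics are reported only on the second half of the stream. A minimal sketch, assuming a hypothetical model object exposing `predict_proba` and `partial_fit`:

```python
import numpy as np


def progressive_logloss(examples, model, eps=1e-15):
    """Score each example before training on it (progressive validation)
    and report LogLoss only on the second half of the single pass."""
    losses = []
    for x, y in examples:  # y is assumed to be 0 or 1
        p = float(np.clip(model.predict_proba(x), eps, 1.0 - eps))
        losses.append(-(y * np.log(p) + (1 - y) * np.log(1 - p)))
        model.partial_fit(x, y)  # train only after predicting
    second_half = losses[len(losses) // 2:]
    return float(np.mean(second_half))
```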
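The Experiment Setup row gives the learning-rate form η_t = α/√(S_t + 1) and the tuning grid {α₀(1.25)^i}. A small sketch follows, assuming S_t is an AdaGrad-style per-coordinate sum of squared gradients (not stated in the quote) and truncating the grid to a finite number of settings for illustration:

```python
import numpy as np


def learning_rates(grad_sq_sum, alpha):
    """Per-coordinate rates eta_t = alpha / sqrt(S_t + 1), where grad_sq_sum
    holds the running sum of squared gradients for each coordinate."""
    return alpha / np.sqrt(grad_sq_sum + 1.0)


def alpha_grid(alpha0, num_settings=20):
    """Candidate values {alpha0 * 1.25**i | i in N}, truncated to a finite
    number of settings; the best-performing alpha is then kept."""
    return [alpha0 * 1.25 ** i for i in range(num_settings)]
```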