ANYTIME MINIBATCH: EXPLOITING STRAGGLERS IN ONLINE DISTRIBUTED OPTIMIZATION

Authors: Nuwan Ferdinand, Haider Al-Lawati, Stark Draper, Matthew Nokleby

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a convergence analysis and analyze the wall time performance. Our numerical results show that our approach is up to 1.5 times faster in Amazon EC2 and up to five times faster when there is greater variability in compute node performance. To evaluate the performance of AMB and compare it with that of FMB, we ran several experiments on Amazon EC2 for both schemes to solve two different classes of machine learning tasks: linear regression and logistic regression, using both synthetic and real datasets.
Researcher Affiliation | Academia | Nuwan Ferdinand, Haider Al-Lawati, & Stark Draper, Department of Electrical and Computer Engineering, University of Toronto ({nuwan.ferdinand@,haider.al.lawati@mail.,stark.draper@}utoronto.ca); Matthew Nokleby, Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI (matthew.nokleby@wayne.edu)
Pseudocode | Yes | The pseudocode of the algorithm is provided in App. A (Algorithm 1: AMB Algorithm).
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, such as a specific repository link, an explicit code release statement, or mention of code in supplementary materials.
Open Datasets | Yes | For the logistic regression problem, we used the MNIST images of numbers from 0 to 9. We used the MNIST training dataset, which consists of 60,000 data points.
Dataset Splits | No | The paper mentions using training datasets (e.g., the MNIST training dataset) but does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing.
Hardware Specification | Yes | In all our experiments, we used t2.micro instances and ami-6b211202, a publicly available Amazon Machine Image (AMI), to launch the instances.
Software Dependencies | No | Communication between nodes was handled through the Message Passing Interface (MPI). No specific version number is provided for MPI or any other software dependency. (A hedged MPI-based sketch of the consensus averaging step appears after this table.)
Experiment Setup | Yes | In FMB, each worker computed b = 6000 gradients. The average compute time during the steady-state phase was found to be 14.5 sec. Therefore, in the AMB case, the compute time for each worker was set to T = 14.5 sec. and we set Tc = 4.5 sec. Workers are allowed r = 5 average rounds of consensus to average their calculated gradients. The per-node fixed minibatch in FMB is b/n = 800 while the fixed compute time in AMB is T = 12 sec. and the communication time Tc = 3 sec. As in the linear regression experiment above, the workers on average go through r = 5 rounds of consensus. (A hedged sketch of the fixed-time compute loop appears after this table.)
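
The AMB loop referenced in the Pseudocode row (Algorithm 1) fixes compute time rather than minibatch size: each worker accumulates as many gradients as it can within T seconds, and the workers then average their variably sized contributions, which is what lets fast nodes exploit stragglers' slack. Below is a minimal single-process Python sketch of that fixed-time loop. The least-squares loss, the worker/shard layout, the learning rate, and the plain averaged-gradient update (in place of the paper's dual-averaging update) are illustrative assumptions, not the authors' implementation.

```python
import time
import numpy as np

# Minimal sketch of the Anytime Minibatch (AMB) idea: fix compute TIME
# instead of minibatch SIZE, so faster workers contribute more gradients.
# T follows the setup row above; the loss and update rule are assumptions.

def worker_compute(x, data, targets, T):
    """Accumulate gradients of 0.5*(a.x - b)^2 for T seconds."""
    grad_sum, count = np.zeros_like(x), 0
    deadline = time.time() + T
    while time.time() < deadline:
        i = np.random.randint(len(data))
        a, b = data[i], targets[i]
        grad_sum += (a @ x - b) * a   # per-sample least-squares gradient
        count += 1
    return grad_sum, count

def amb_step(x, shards, T, lr=0.01):
    """One AMB round: fixed-time compute on every shard, then average."""
    sums, counts = zip(*(worker_compute(x, d, t, T) for d, t in shards))
    # The paper reaches this average with r rounds of consensus gossip;
    # we average exactly here for simplicity (see the MPI sketch below).
    minibatch_grad = sum(sums) / max(sum(counts), 1)
    return x - lr * minibatch_grad
```

For instance, with shards = [(A1, y1), (A2, y2)] holding per-worker data, repeated calls x = amb_step(x, shards, T=0.1) simulate AMB rounds with a scaled-down compute time.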
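
Since the rows above note that communication was handled through MPI and that workers perform r = 5 rounds of consensus averaging, here is a hedged mpi4py sketch of that averaging step. The ring topology, the uniform 1/3 mixing weights, and the consensus_average name are assumptions for illustration; the paper's actual communication graph and mixing matrix are not given in the quoted text.

```python
from mpi4py import MPI
import numpy as np

# Hedged sketch of "r = 5 rounds of consensus" gradient averaging over MPI.
# Ring topology and uniform mixing weights are illustrative assumptions.

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

def consensus_average(vec, r=5):
    """Run r gossip rounds; each node mixes with its two ring neighbors."""
    v = vec.astype(np.float64)        # work in float64 for mixing
    for _ in range(r):
        recv_l, recv_r = np.empty_like(v), np.empty_like(v)
        comm.Sendrecv(v, dest=left, recvbuf=recv_r, source=right)
        comm.Sendrecv(v, dest=right, recvbuf=recv_l, source=left)
        v = (v + recv_l + recv_r) / 3.0
    return v

if __name__ == "__main__":
    local = np.full(3, float(rank))   # each rank starts from its own value
    print(f"rank {rank}: {consensus_average(local)}")  # approaches the mean
```

Launched with, e.g., mpiexec -n 8 python consensus_sketch.py (a hypothetical file name), each rank's vector moves toward the global average, and more rounds r give a closer approximation.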