Straggler Mitigation in Distributed Optimization Through Data Encoding

Authors: Can Karakus, Yifan Sun, Suhas Diggavi, Wotao Yin

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide experimental results demonstrating the advantage of the approach over uncoded and data replication strategies."
Researcher Affiliation | Collaboration | Can Karakus (UCLA, Los Angeles, CA; karakus@ucla.edu); Yifan Sun (Technicolor Research, Los Altos, CA; Yifan.Sun@technicolor.com); Suhas Diggavi (UCLA; suhasdiggavi@ucla.edu); Wotao Yin (UCLA; wotaoyin@math.ucla.edu)
Pseudocode | No | The paper describes its algorithms textually but does not include structured pseudocode or an algorithm block.
Open Source Code | No | The paper refers to an arXiv preprint but provides no statement about, or link to, source code for the described methodology.
Open Datasets | Yes | "Matrix factorization on Movielens 1-M dataset [18] for the movie recommendation task."
Dataset Splits | Yes | "We withhold randomly 20% of these ratings," forming an 80/20 train/test split.
Hardware Specification | Yes | "We implement distributed L-BFGS as described in Section 3 on an Amazon EC2 cluster using the mpi4py Python package, over m = 32 m1.small worker node instances, and a single c3.8xlarge central server instance." The Movielens experiment is run on a single 32-core machine with 256 GB RAM.
Software Dependencies | No | The paper mentions the "mpi4py Python package" and "using the built-in function numpy.linalg.solve" but gives no version numbers for its software dependencies.
Experiment Setup | Yes | "for regularization parameter λ = 0.05. We evaluate column-subsampled Hadamard matrix with redundancy β = 2 (encoded using FWHT for fast encoding)... which are aggregated over 20 trials. We choose µ = 3, p = 15, and λ = 10..."
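The 80/20 split reported in the Dataset Splits row could be reproduced along these lines. This is a minimal sketch under assumptions of our own: the paper does not specify the sampling procedure, and the function name, seed, and shuffle-based selection below are illustrative, not the authors' code.

```python
import random

def train_test_split(ratings, test_frac=0.2, seed=42):
    # Randomly withhold test_frac of the ratings as a held-out test set
    # (illustrative assumption: uniform sampling via a seeded shuffle).
    ratings = list(ratings)
    rng = random.Random(seed)
    rng.shuffle(ratings)
    n_test = int(len(ratings) * test_frac)
    # First n_test shuffled entries become the test set; the rest train.
    return ratings[n_test:], ratings[:n_test]
```

With `test_frac=0.2`, 100 ratings yield 80 training and 20 test entries, matching the 80/20 split the report describes.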
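The Experiment Setup row mentions encoding with a column-subsampled Hadamard matrix (redundancy β = 2) using the fast Walsh-Hadamard transform (FWHT). A hedged sketch of that idea: embed the data vector into the randomly chosen column positions of a βn × βn Sylvester Hadamard matrix and apply the FWHT, so S x is computed in O(βn log βn) time without forming S. The function names, seeding, and the power-of-two size assumption below are ours, not the paper's.

```python
import random

def fwht(x):
    # Iterative fast Walsh-Hadamard transform (Sylvester ordering, unnormalized).
    # len(x) must be a power of 2.
    x = list(x)
    n, h = len(x), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def encode(x, beta=2, seed=0):
    # Encode a length-n vector with a column-subsampled Hadamard matrix S:
    # keep n random columns of the (beta*n)-dim Hadamard matrix H, so that
    # S @ x == H @ z, where z embeds x at the kept column indices.
    # Illustrative assumption: beta * len(x) is a power of 2.
    n = len(x)
    N = beta * n
    rng = random.Random(seed)
    cols = rng.sample(range(N), n)      # surviving Hadamard columns
    z = [0.0] * N
    for c, v in zip(cols, x):
        z[c] = v                        # embed x into the selected coordinates
    return fwht(z)                      # H @ z == S @ x via FWHT
```

Since the unnormalized Hadamard matrix satisfies Hᵀ H = N·I, the coded vector's squared norm is N times that of the input, which gives a quick sanity check on any implementation of this sketch.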