Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms

Authors: Yu-Xiang Wang, Veeranjaneyulu Sadhanala, Wei Dai, Willie Neiswanger, Suvrit Sra, Eric Xing

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experiments on structural SVM and Group Fused Lasso, and observe significant speedups over competing state-of-the-art (and synchronous) methods.
Researcher Affiliation | Academia | Yu-Xiang Wang (YUXIANGW@CS.CMU.EDU), Veeranjaneyulu Sadhanala (VSADHANA@CS.CMU.EDU), Wei Dai (WDAI@CS.CMU.EDU), Willie Neiswanger (WILLIE@CS.CMU.EDU), Suvrit Sra (SUVRIT@MIT.EDU), Eric P. Xing (EPXING@CS.CMU.EDU); Carnegie Mellon University, 5000 Forbes Ave, PA 15213, USA; Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA
Pseudocode | Yes | Pseudocode of our scheme is given in Algorithm 1. Algorithm 1 AP-BCFW: Asynchronous Parallel Block Coordinate Frank-Wolfe (distributed)
Open Source Code | No | The paper does not provide any statement about making the source code available or a link to a code repository.
Open Datasets | Yes | In our simulation, we re-use the structural SVM setup from Lacoste-Julien et al. (2013) for a sequence labeling task on a subset of the OCR dataset (Taskar et al., 2004) (n = 6251, d = 4082).
Dataset Splits | No | The paper mentions using a 'subset of the OCR dataset' and a 'synthetic dataset' but does not provide specific details on how the data was split into training, validation, or test sets (e.g., percentages or sample counts).
Hardware Specification | Yes | All shared-memory experiments were implemented in C++ and conducted on a 16-core machine with Intel(R) Xeon(R) CPU E5-2450 2.10GHz processors and 128G RAM.
Software Dependencies | No | The paper mentions implementation in C++ but does not provide version numbers for any specific software libraries, frameworks, or dependencies.
Experiment Setup | Yes | We use λ = 1 with weighted averaging and line-search throughout (no delay is allowed). We use λ = 0.01 and a primal suboptimality threshold as our convergence criterion. We first fix the number of workers at T = 8 and vary the mini-batch size τ.
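
Since the paper releases no code, the following is a minimal illustrative sketch of the mini-batch idea behind the setup above: each iteration picks τ blocks at random, solves a linear subproblem on each, and takes one line-search step. It is not the authors' implementation and not Algorithm 1 itself. It runs serially with no workers and no delay, and everything in it is an assumption made for illustration: the toy objective 0.5 * ||x - c||^2 over a product of probability simplices, the names `objective` and `minibatch_bcfw`, and the use of a single exact line-search step shared by the whole mini-batch.

```python
import random


def objective(x, c):
    """Toy objective (an assumption, not from the paper): 0.5 * ||x - c||^2."""
    return 0.5 * sum(
        (xj - cj) ** 2 for row_x, row_c in zip(x, c) for xj, cj in zip(row_x, row_c)
    )


def minibatch_bcfw(c, tau, iters, seed=0):
    """Serial mini-batch block-coordinate Frank-Wolfe on a product of simplices.

    Each row of x is one block constrained to the probability simplex.
    Hypothetical helper for illustration; no asynchrony or delay is modelled.
    """
    rng = random.Random(seed)
    n, m = len(c), len(c[0])
    x = [[1.0 / m] * m for _ in range(n)]  # feasible start: uniform rows
    history = [objective(x, c)]
    for _ in range(iters):
        batch = rng.sample(range(n), tau)  # tau distinct blocks
        directions = {}
        num = 0.0  # accumulates <grad, x - s> over the batch (a partial FW gap)
        den = 0.0  # accumulates ||s - x||^2 over the batch
        for i in batch:
            grad = [x[i][j] - c[i][j] for j in range(m)]
            # Linear oracle on the simplex: the vertex with the smallest gradient entry.
            j_star = min(range(m), key=lambda j: grad[j])
            d = [(1.0 if j == j_star else 0.0) - x[i][j] for j in range(m)]
            directions[i] = d
            num -= sum(g * dj for g, dj in zip(grad, d))
            den += sum(dj * dj for dj in d)
        # Exact line search for this quadratic, clipped to [0, 1] to stay feasible.
        gamma = 0.0 if den == 0.0 else min(1.0, max(0.0, num / den))
        for i, d in directions.items():
            x[i] = [xj + gamma * dj for xj, dj in zip(x[i], d)]
        history.append(objective(x, c))
    return x, history


if __name__ == "__main__":
    targets = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.3, 0.4, 0.3], [0.0, 1.0, 0.0]]
    x, history = minibatch_bcfw(targets, tau=2, iters=200)
    print("initial objective:", history[0])
    print("final objective:", history[-1])
```

Because the step is a convex combination of the current point and simplex vertices, every row stays on the simplex, and the exact line search keeps the objective from increasing. A faithful reproduction would replace the toy objective with the structural SVM or Group Fused Lasso dual and add the asynchronous worker and server logic described in the paper.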