Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms
Authors: Yu-Xiang Wang, Veeranjaneyulu Sadhanala, Wei Dai, Willie Neiswanger, Suvrit Sra, Eric Xing
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experiments on structural SVM and Group Fused Lasso, and observe significant speedups over competing state-of-the-art (and synchronous) methods. |
| Researcher Affiliation | Academia | Yu-Xiang Wang (YUXIANGW@CS.CMU.EDU); Veeranjaneyulu Sadhanala (VSADHANA@CS.CMU.EDU); Wei Dai (WDAI@CS.CMU.EDU); Willie Neiswanger (WILLIE@CS.CMU.EDU); Suvrit Sra (SUVRIT@MIT.EDU); Eric P. Xing (EPXING@CS.CMU.EDU). Carnegie Mellon University, 5000 Forbes Ave, PA 15213, USA; Massachusetts Institute of Technology, 77 Massachusetts Ave, Cambridge, MA 02139, USA |
| Pseudocode | Yes | Pseudocode of our scheme is given in Algorithm 1. (Caption: "Algorithm 1 AP-BCFW: Asynchronous Parallel Block Coordinate Frank-Wolfe (distributed)".) |
| Open Source Code | No | The paper does not provide any statement about making the source code available or a link to a code repository. |
| Open Datasets | Yes | In our simulation, we re-use the structural SVM setup from Lacoste-Julien et al. (2013) for a sequence labeling task on a subset of the OCR dataset (Taskar et al., 2004) (n = 6251, d = 4082). |
| Dataset Splits | No | The paper mentions using a 'subset of the OCR dataset' and a 'synthetic dataset' but does not provide specific details on how the data was split into training, validation, or test sets (e.g., percentages or sample counts). |
| Hardware Specification | Yes | All shared-memory experiments were implemented in C++ and conducted on a 16-core machine with Intel(R) Xeon(R) CPU E5-2450 2.10GHz processors and 128G RAM. |
| Software Dependencies | No | The paper mentions implementation in C++ but does not provide version numbers for any specific software libraries, frameworks, or dependencies. |
| Experiment Setup | Yes | We use λ = 1 with weighted averaging and line-search throughout (no delay is allowed). We use λ = 0.01 and a primal suboptimality threshold as our convergence criterion. We first fix the number of workers at T = 8 and vary the mini-batch size τ. |
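
The setup row above mentions a mini-batch size τ and line-search. As a rough illustration of what those two knobs control, here is a minimal serial sketch of a mini-batch block-coordinate Frank-Wolfe loop. It is not the authors' AP-BCFW code: there are no workers, no asynchrony and no delays, and the toy objective (least squares over a product of probability simplices) together with the names `minibatch_bcfw`, `n_blocks` and `block_dim` are assumptions made only for this example.

```python
import numpy as np


def minibatch_bcfw(A, b, n_blocks, block_dim, tau, n_iters, seed=0):
    """Minimize 0.5 * ||A x - b||^2 with x a stack of n_blocks simplex blocks.

    Toy stand-in problem (an assumption), solved serially: each iteration
    updates tau randomly chosen blocks and uses exact line search.
    """
    rng = np.random.default_rng(seed)
    # Feasible start: every block is the uniform distribution.
    x = np.full((n_blocks, block_dim), 1.0 / block_dim)
    history = []
    for _ in range(n_iters):
        residual = A @ x.ravel() - b
        history.append(0.5 * residual @ residual)
        grad = (A.T @ residual).reshape(n_blocks, block_dim)

        # Sample a mini-batch of tau distinct blocks.
        blocks = rng.choice(n_blocks, size=tau, replace=False)

        # Linear minimization oracle per block: the simplex vertex with
        # the smallest gradient coordinate.
        d = np.zeros_like(x)
        for i in blocks:
            s = np.zeros(block_dim)
            s[np.argmin(grad[i])] = 1.0
            d[i] = s - x[i]

        # Exact line search for a quadratic, clipped to [0, 1] so that
        # every updated block stays inside its simplex.
        Ad = A @ d.ravel()
        denom = Ad @ Ad
        if denom > 0:
            gamma = float(np.clip(-(grad.ravel() @ d.ravel()) / denom, 0.0, 1.0))
        else:
            gamma = 0.0
        x += gamma * d

    residual = A @ x.ravel() - b
    history.append(0.5 * residual @ residual)
    return x, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((30, 12))
    b = rng.standard_normal(30)
    x, history = minibatch_bcfw(A, b, n_blocks=4, block_dim=3, tau=2, n_iters=50)
    print("first objective:", history[0], "last objective:", history[-1])
```

With exact line search on a convex quadratic the recorded objective never increases, which makes the sketch easy to sanity-check. In this serial version, varying `tau` only changes how many blocks move per iteration; it says nothing about the parallel speedups the paper measures.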