SySCD: A System-Aware Parallel Coordinate Descent Algorithm
Authors: Nikolas Ioannou, Celestine Mendler-Dünner, Thomas Parnell
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of SySCD on diverse datasets and across different CPU architectures, and we show that SySCD drastically improves the implementation efficiency and the scalability when compared to state-of-the-art GLM solvers (scikit-learn (Pedregosa et al., 2011), Vowpal Wabbit (Langford, 2007), and H2O (The H2O.ai team, 2015)), resulting in 12× faster training on average. |
| Researcher Affiliation | Collaboration | Nikolas Ioannou, IBM Research Zurich, Switzerland (nio@zurich.ibm.com); Celestine Mendler-Dünner, UC Berkeley, Berkeley, California (mendler@berkeley.edu); Thomas Parnell, IBM Research Zurich, Switzerland (tpa@zurich.ibm.com) |
| Pseudocode | Yes | Algorithm 1 SySCD for minimizing (1). 1: Input: Training data matrix $A = [x_1, \dots, x_n] \in \mathbb{R}^{d \times n}$. 2: Initialize model $\alpha$ and shared vector $v = \sum_{i=1}^n \alpha_i x_i$. 3: Partition coordinates into buckets of size $B$. 4: Partition buckets across NUMA nodes according to $\{\mathcal{P}_k\}_{k=1}^K$. 5: for $t = 1, 2, \dots, T_1$ do ... (an illustrative sketch of this update structure is given after the table) |
| Open Source Code | No | The paper does not provide concrete access to the source code for the Sy SCD methodology. Footnote 3 links to code for a baseline method (mini-batch SDCA), not the authors' own implementation. |
| Open Datasets | Yes | We evaluate on three datasets: (i) the sparse dataset released by Criteo Labs as part of their 2014 Kaggle competition (Criteo-Labs, 2013) (criteo-kaggle), (ii) the dense HIGGS dataset (Baldi et al., 2014) (higgs), and (iii) the dense epsilon dataset from the PASCAL Large Scale Learning Challenge (Epsilon, 2008) (epsilon). |
| Dataset Splits | No | The paper mentions using datasets for training and evaluation but does not provide specific details on how the datasets were split into training, validation, or test sets (e.g., percentages or sample counts). |
| Hardware Specification | Yes | We use two systems with different CPU architectures and NUMA topologies: a 4-node Intel Xeon (E5-4620) with 8 cores and 128 GiB of RAM in each node, and a 2-node IBM POWER9 with 20 cores and 512 GiB in each node, 1 TiB total. |
| Software Dependencies | Yes | We compare with scikit-learn (Pedregosa et al., 2011) (v0.19.2), with H2O (The H2O.ai team, 2015) (v3.20.0.8), and with VW (Langford, 2007) (commit: 5b020c4). |
| Experiment Setup | Yes | Remark 1 (Hyperparameters). The hyperparameters $T_2, T_3, T_4$ in Alg. 1 can be used to optimally tune SySCD to different CPU architectures. However, a good default choice is $T_4 = B$, $T_3 = \frac{n}{PB}$, $T_2 = 1$ (4), such that one epoch ($n$ coordinate updates) is performed across the threads before each synchronization step. We will use these values for all our experiments and did not further tune our method. Further, recall that the bucket size $B$ is set to be equal to the cache line size of the CPU and the number of NUMA nodes $K$ as well as the number of threads $P$ is automatically detected (a sketch of these defaults follows the table). |
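
To make the quoted Algorithm 1 excerpt concrete, below is a minimal single-process Python sketch of the bucketed coordinate-update structure, using ridge regression as a stand-in GLM objective. All names (`ridge_cd_epoch`, `bucket_size`, the choice of objective) are illustrative assumptions rather than the authors' implementation; the real SySCD distributes the buckets across threads pinned to NUMA nodes and keeps per-node replicas of the shared vector.

```python
import numpy as np

def ridge_cd_epoch(A, b, alpha, v, lam, bucket_size=8, rng=None):
    """One epoch of bucketed coordinate descent on
    min_alpha 0.5*||A @ alpha - b||^2 + 0.5*lam*||alpha||^2.

    A: (d, n) data matrix with columns x_i; alpha: (n,) model;
    v = A @ alpha is the shared vector maintained incrementally.
    Sequential stand-in: SySCD assigns buckets to threads/NUMA nodes
    and periodically synchronizes replicas of v.
    """
    rng = rng or np.random.default_rng()
    n = A.shape[1]
    col_sq = np.einsum("ij,ij->j", A, A)          # precomputed ||x_i||^2

    # Partition coordinates into contiguous buckets of size B (Alg. 1, line 3).
    buckets = [np.arange(s, min(s + bucket_size, n))
               for s in range(0, n, bucket_size)]
    rng.shuffle(buckets)                          # randomize bucket order

    for bucket in buckets:                        # each thread would grab buckets
        for i in rng.permutation(bucket):         # T4 = B updates inside a bucket
            grad_i = A[:, i] @ (v - b) + lam * alpha[i]
            delta = -grad_i / (col_sq[i] + lam)   # exact coordinate minimizer
            alpha[i] += delta
            v += delta * A[:, i]                  # keep v = A @ alpha consistent
    return alpha, v
```

Repeating this epoch (the outer $T_1$ loop of Algorithm 1) until convergence, with buckets spread over threads and the shared vector $v$ replicated per NUMA node and periodically reduced, yields the parallel structure the paper describes.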
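
The default hyperparameter choice from Remark 1 can likewise be written down directly. The helper below is an assumed illustration, not code from the paper: it derives $B$ from the cache-line size (the paper only states that $B$ matches the CPU cache line), auto-detects the thread count $P$, and applies $T_4 = B$, $T_3 = n/(PB)$, $T_2 = 1$.

```python
import os

def syscd_defaults(n, cache_line_bytes=64, dtype_bytes=8, num_threads=None):
    """Default SySCD hyperparameters per Remark 1 (eq. 4), assuming the
    bucket size B is the number of model entries fitting in one cache line."""
    P = num_threads or os.cpu_count()      # number of worker threads (auto-detected)
    B = cache_line_bytes // dtype_bytes    # bucket size B, e.g. 64 B / 8 B = 8
    T4 = B                                 # coordinate updates per bucket
    T3 = max(1, n // (P * B))              # buckets per thread between synchronizations
    T2 = 1                                 # local passes between synchronizations
    return {"B": B, "T2": T2, "T3": T3, "T4": T4, "P": P}
```

With these values one outer iteration performs roughly $P \cdot T_3 \cdot T_2 \cdot B = n$ coordinate updates, i.e. one epoch before each synchronization step, matching the remark.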