SySCD: A System-Aware Parallel Coordinate Descent Algorithm
Authors: Nikolas Ioannou, Celestine Mendler-Dünner, Thomas Parnell
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of SySCD on diverse datasets and across different CPU architectures, and we show that SySCD drastically improves the implementation efficiency and the scalability when compared to state-of-the-art GLM solvers (scikit-learn (Pedregosa et al., 2011), Vowpal Wabbit (Langford, 2007), and H2O (The H2O.ai team, 2015)), resulting in 12× faster training on average. |
| Researcher Affiliation | Collaboration | Nikolas Ioannou, IBM Research Zurich, Switzerland (nio@zurich.ibm.com); Celestine Mendler-Dünner, UC Berkeley, Berkeley, California (mendler@berkeley.edu); Thomas Parnell, IBM Research Zurich, Switzerland (tpa@zurich.ibm.com) |
| Pseudocode | Yes | Algorithm 1 SySCD for minimizing (1). 1: Input: Training data matrix $A = [x_1, \dots, x_n] \in \mathbb{R}^{d \times n}$. 2: Initialize model $\alpha$ and shared vector $v = \sum_{i=1}^n \alpha_i x_i$. 3: Partition coordinates into buckets of size $B$. 4: Partition buckets across NUMA nodes according to $\{\mathcal{P}_k\}_{k=1}^K$. 5: for $t = 1, 2, \dots, T_1$ do ... (an illustrative sketch of this update structure is given after the table) |
| Open Source Code | No | The paper does not provide concrete access to the source code for the Sy SCD methodology. Footnote 3 links to code for a baseline method (mini-batch SDCA), not the authors' own implementation. |
| Open Datasets | Yes | We evaluate on three datasets: (i) the sparse dataset released by Criteo Labs as part of their 2014 Kaggle competition (Criteo-Labs, 2013) (criteo-kaggle), (ii) the dense HIGGS dataset (Baldi et al., 2014) (higgs), and (iii) the dense epsilon dataset from the PASCAL Large Scale Learning Challenge (Epsilon, 2008) (epsilon). |
| Dataset Splits | No | The paper mentions using datasets for training and evaluation but does not provide specific details on how the datasets were split into training, validation, or test sets (e.g., percentages or sample counts). |
| Hardware Specification | Yes | We use two systems with different CPU architectures and NUMA topologies: a 4-node Intel Xeon (E5-4620) with 8 cores and 128 GiB of RAM in each node, and a 2-node IBM POWER9 with 20 cores and 512 GiB in each node, 1 TiB total. |
| Software Dependencies | Yes | We compare with scikit-learn (Pedregosa et al., 2011) (v0.19.2), with H2O (The H2O.ai team, 2015) (v3.20.0.8), and with VW (Langford, 2007) (commit: 5b020c4). |
| Experiment Setup | Yes | Remark 1 (Hyperparameters). The hyperparameters $T_2, T_3, T_4$ in Alg. 1 can be used to optimally tune SySCD to different CPU architectures. However, a good default choice is $T_4 = B$, $T_3 = \frac{n}{PB}$, $T_2 = 1$ (4), such that one epoch ($n$ coordinate updates) is performed across the threads before each synchronization step. We will use these values for all our experiments and did not further tune our method. Further, recall that the bucket size $B$ is set to be equal to the cache line size of the CPU and the number of NUMA nodes $K$ as well as the number of threads $P$ is automatically detected (a sketch of these defaults follows the table). |
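
To make the quoted Algorithm 1 excerpt concrete, below is a minimal single-process Python sketch of the bucketed coordinate-update structure, using ridge regression as a stand-in GLM objective. All names (`ridge_cd_epoch`, `bucket_size`, the choice of objective) are illustrative assumptions rather than the authors' implementation; the real SySCD distributes the buckets across threads pinned to NUMA nodes and keeps per-node replicas of the shared vector.

```python
import numpy as np

def ridge_cd_epoch(A, b, alpha, v, lam, bucket_size=8, rng=None):
    """One epoch of bucketed coordinate descent on
    min_alpha 0.5*||A @ alpha - b||^2 + 0.5*lam*||alpha||^2.

    A: (d, n) data matrix with columns x_i; alpha: (n,) model;
    v = A @ alpha is the shared vector maintained incrementally.
    Sequential stand-in: SySCD assigns buckets to threads/NUMA nodes
    and periodically synchronizes replicas of v.
    """
    rng = rng or np.random.default_rng()
    n = A.shape[1]
    col_sq = np.einsum("ij,ij->j", A, A)          # precomputed ||x_i||^2

    # Partition coordinates into contiguous buckets of size B (Alg. 1, line 3).
    buckets = [np.arange(s, min(s + bucket_size, n))
               for s in range(0, n, bucket_size)]
    rng.shuffle(buckets)                          # randomize bucket order

    for bucket in buckets:                        # each thread would grab buckets
        for i in rng.permutation(bucket):         # T4 = B updates inside a bucket
            grad_i = A[:, i] @ (v - b) + lam * alpha[i]
            delta = -grad_i / (col_sq[i] + lam)   # exact coordinate minimizer
            alpha[i] += delta
            v += delta * A[:, i]                  # keep v = A @ alpha consistent
    return alpha, v
```

Repeating this epoch (the outer $T_1$ loop of Algorithm 1) until convergence, with buckets spread over threads and the shared vector $v$ replicated per NUMA node and periodically reduced, yields the parallel structure the paper describes.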
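
The default hyperparameter choice from Remark 1 can likewise be written down directly. The helper below is an assumed illustration, not code from the paper: it derives $B$ from the cache-line size (the paper only states that $B$ matches the CPU cache line), auto-detects the thread count $P$, and applies $T_4 = B$, $T_3 = n/(PB)$, $T_2 = 1$.

```python
import os

def syscd_defaults(n, cache_line_bytes=64, dtype_bytes=8, num_threads=None):
    """Default SySCD hyperparameters per Remark 1 (eq. 4), assuming the
    bucket size B is the number of model entries fitting in one cache line."""
    P = num_threads or os.cpu_count()      # number of worker threads (auto-detected)
    B = cache_line_bytes // dtype_bytes    # bucket size B, e.g. 64 B / 8 B = 8
    T4 = B                                 # coordinate updates per bucket
    T3 = max(1, n // (P * B))              # buckets per thread between synchronizations
    T2 = 1                                 # local passes between synchronizations
    return {"B": B, "T2": T2, "T3": T3, "T4": T4, "P": P}
```

With these values one outer iteration performs roughly $P \cdot T_3 \cdot T_2 \cdot B = n$ coordinate updates, i.e. one epoch before each synchronization step, matching the remark.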