GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification

Authors: John T. Halloran, David M. Rocke

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | All experiments were run on a dual Intel Xeon Gold 5118 compute node with 48 computational threads, an NVIDIA Tesla V100 GPU, and 768 GB of memory. ... The TRON-LR GPU-optimized and mixed-architecture solvers (described in Section 4.2) are referred to as TRON-LR-GPU and TRON-LR-MIX, respectively.
Researcher Affiliation | Academia | John T. Halloran, Department of Public Health Sciences, University of California, Davis (jthalloran@ucdavis.edu); David M. Rocke, Department of Public Health Sciences, University of California, Davis (dmrocke@ucdavis.edu)
Pseudocode | Yes | Algorithm 1: The TRON algorithm
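The Algorithm 1 referenced here is the trust-region Newton (TRON) method that LIBLINEAR applies to primal L2-regularized logistic regression. For readers unfamiliar with it, below is a minimal dense NumPy sketch of the standard formulation (conjugate gradient for the trust-region subproblem, a relative-gradient stopping rule in the spirit of LIBLINEAR's `-e` flag). The radius-update constants and acceptance threshold are common textbook choices, not values taken from the paper:

```python
import numpy as np

def conjugate_gradient(Hv, b, delta, tol=1e-8, max_iter=100):
    """Conjugate gradient for H s = b, truncated at the trust-region
    boundary ||s|| <= delta (the inner solver used by TRON)."""
    s = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Hp = Hv(p)
        alpha = rs / (p @ Hp)
        s_next = s + alpha * p
        if np.linalg.norm(s_next) >= delta:
            # follow p only as far as the trust-region boundary
            a, bq, c = p @ p, 2.0 * (s @ p), s @ s - delta ** 2
            tau = (-bq + np.sqrt(bq * bq - 4.0 * a * c)) / (2.0 * a)
            return s + tau * p
        s = s_next
        r = r - alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return s

def tron_logreg(X, y, C=1.0, eps=0.1, max_iter=100):
    """Trust-region Newton sketch for primal L2-regularized logistic
    regression: min_w 0.5*w'w + C*sum_i log(1 + exp(-y_i x_i'w)),
    with labels y_i in {-1, +1}."""
    w = np.zeros(X.shape[1])

    def fun(w):
        return 0.5 * (w @ w) + C * np.sum(np.logaddexp(0.0, -y * (X @ w)))

    def grad(w):
        sig = 1.0 / (1.0 + np.exp(y * (X @ w)))   # per-example loss slope
        return w - C * (X.T @ (sig * y))

    def hess_vec(w, s):                           # Hessian-vector product
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        return s + C * (X.T @ ((p * (1.0 - p)) * (X @ s)))

    g0 = np.linalg.norm(grad(w))
    delta = g0                                    # illustrative initial radius
    for _ in range(max_iter):
        g = grad(w)
        if np.linalg.norm(g) <= eps * g0:         # -e style relative tolerance
            break
        s = conjugate_gradient(lambda v: hess_vec(w, v), -g, delta)
        pred = -(g @ s) - 0.5 * (s @ hess_vec(w, s))  # model's predicted decrease
        rho = (fun(w) - fun(w + s)) / max(pred, 1e-12)
        if rho > 1e-4:                            # accept the step
            w = w + s
        if rho < 0.25:                            # poor agreement: shrink radius
            delta *= 0.25
        elif rho > 0.75:                          # strong agreement: allow growth
            delta = max(delta, 2.0 * np.linalg.norm(s))
    return w
```

The GPU-optimized variants benchmarked in the paper accelerate exactly the expensive pieces visible here: the gradient and the repeated Hessian-vector products inside the conjugate-gradient loop, which are dominated by (sparse) matrix-vector multiplies with `X` and `X.T`.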
Open Source Code | No | The authors state that their implementations were "developed based on LIBLINEAR v2.30," but the paper provides no explicit statement or link confirming that their GPU-optimized code is open source or publicly available.
Open Datasets | Yes | Six datasets of varying statistics (i.e., number of features, instances, and nonzero elements) were downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ and used to benchmark the TRON-LR solvers (statistics for each dataset are listed in [23]).
Dataset Splits | Yes | Furthermore, to prevent overfitting and improve generalizability within each iteration, three-fold cross-validation is carried out over three disjoint partitions of the original dataset, followed by further nested cross-validation within each fold [16].
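The quoted protocol (three disjoint outer folds, with a further nested cross-validation inside each outer training set for hyperparameter selection) amounts to index bookkeeping that can be sketched directly. The fold counts match the quote, but the seeding, interleaved slicing, and function name are illustrative assumptions, not Percolator's implementation:

```python
import random

def three_fold_nested_splits(n, seed=0):
    """Return [(outer_train, outer_test, inner_splits), ...] for n examples:
    three disjoint outer partitions, each paired with a nested 3-fold
    split of its training indices (e.g., for selecting the SVM's C)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    outer = [idx[k::3] for k in range(3)]          # three disjoint partitions
    splits = []
    for k in range(3):
        test = outer[k]
        train = [i for j in range(3) if j != k for i in outer[j]]
        # nested 3-fold cross-validation within the outer training set
        inner = [
            ([i for j2 in range(3) if j2 != j for i in train[j2::3]],  # inner train
             train[j::3])                                              # inner validation
            for j in range(3)
        ]
        splits.append((train, test, inner))
    return splits
```

After the inner folds pick a hyperparameter, the model would be refit on the full outer training set and scored once on the held-out outer fold, so no outer test example ever influences model selection.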
Hardware Specification | Yes | All experiments were run on a dual Intel Xeon Gold 5118 compute node with 48 computational threads, an NVIDIA Tesla V100 GPU, and 768 GB of memory.
Software Dependencies | Yes | TRON-LR-GPU, TRON-LR-MIX, and TRON-LR-GPU0 were all developed based on LIBLINEAR v2.30. Single-threaded LIBLINEAR tests were run using v2.30. The multithread-optimized version of TRON-LR described in [36], referred to herein as TRON-LR-CPU, was tested using multi-core LIBLINEAR v2.30. The original Percolator SVM learning runtimes (collected using Percolator v3.04.0) were 14.4 hours and 4.4 days for the Kim and Wilhelm datasets, respectively.
Experiment Setup | Yes | All single-threaded TRON-LR implementations (i.e., TRON-LR-GPU0, TRON-LR-GPU, and the single-threaded optimized version of TRON-LR in standard LIBLINEAR) were run with the same command line parameters: -c 4 -e 0.1 -s 0. Multithreaded implementations were run with the additional flag -nr i, specifying the use of i compute threads.
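For concreteness, here is what an invocation of LIBLINEAR's `train` tool with the quoted flags would look like; the dataset and model file names are placeholders, and the thread-count example reuses the `-nr` flag exactly as reported above:

```shell
# -s 0 : L2-regularized logistic regression, primal (solved by TRON)
# -c 4 : regularization parameter C = 4
# -e 0.1 : stopping tolerance on the solver's (relative) gradient criterion
./train -c 4 -e 0.1 -s 0 dataset.libsvm model.out

# Multithreaded runs (multi-core LIBLINEAR) add the thread-count flag
# quoted in the paper, e.g. for 12 compute threads:
./train -c 4 -e 0.1 -s 0 -nr 12 dataset.libsvm model.out
```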