Kernel Methods Through the Roof: Handling Billions of Points Efficiently
Authors: Giacomo Meanti, Luigi Carratino, Lorenzo Rosasco, Alessandro Rudi
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We ran a series of tests to evaluate the relative importance of the computational solutions we introduced, and then performed extensive comparisons on real-world datasets. The outcome of the first tests is given in Table 1 and is discussed in Appendix A.1 for brevity. In summary, it shows a 20× improvement over the base implementation of [43] which runs only partially on the GPU. Such improvement is visible in equal parts for the preconditioner computations, and for the iterative CG algorithm. For the second series of experiments we compared our implementation against three other software packages for GPU-accelerated kernel methods on several large scale datasets. |
| Researcher Affiliation | Academia | Giacomo Meanti (MaLGa, DIBRIS, Università degli Studi di Genova, giacomo.meanti@edu.unige.it); Luigi Carratino (MaLGa, DIBRIS, Università degli Studi di Genova, luigi.carratino@dibris.unige.it); Lorenzo Rosasco (MaLGa, DIBRIS, Università degli Studi di Genova, IIT & MIT, lrosasco@mit.edu); Alessandro Rudi (INRIA, École Normale Supérieure, PSL Research University, alessandro.rudi@inria.fr) |
| Pseudocode | Yes | Algorithm 1 Pseudocode for the Falkon algorithm. |
| Open Source Code | Yes | Additionally, we make our software available as an easy to use library: https://github.com/FalkonML/falkon |
| Open Datasets | Yes | We used eight datasets which we believe represent a broad set of possible scenarios for kernel learning in terms of data size, data type and task, ranging from MSD with 5×10^5 points up to TAXI with 10^9 points and YELP with 10^7 sparse features. The characteristics of the datasets are shown in Table 2 while a full description, along with details about preprocessing and relevant data splits, is available in Appendix A.3. |
| Dataset Splits | Yes | Each experiment was run 5 times, varying the random train/test data split and the inducing points. Out of all possible experiments, we failed to run GPyTorch on TIMIT due to difficulties in setting up a multi-class benchmark (this is not a limitation of the software). Other experiments, such as EigenPro on several larger datasets, failed due to memory errors and others yet due to software limitations in handling sparse inputs (none of the examined implementations could run the sparse YELP dataset). Finally, LogFalkon only makes sense on binary classification datasets. |
| Hardware Specification | Yes | All experiments were run on a Dell Power Edge server with 2 Intel Xeon 4116 CPUs, 2 Titan Xp GPUs and 256GB of RAM. |
| Software Dependencies | No | The paper mentions software like PyTorch, TensorFlow, cuBLAS, cuSOLVER, cuSPARSE, and KeOps, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | Hyperparameters were optimized manually by training on a small data subset, to provide a sensible trade-off between performance and accuracy: we increased the complexity of the different algorithms until they reached high GPU utilization, since this is often the knee in the time-accuracy curve. Details on the GP likelihoods, optimization details and other settings used to run and tune the algorithms are in Appendix A.4. |
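For context on what the Falkon pseudocode referenced above computes, the method combines Nyström approximation with kernel ridge regression: pick M inducing points, then solve the regularized system (K_nm^T K_nm + λ n K_mm) α = K_nm^T y. The sketch below is an illustrative NumPy reconstruction of that linear system with a direct dense solver, not the paper's implementation (which uses a Cholesky-based preconditioner, conjugate gradients, and GPU kernels); the function names (`nystrom_krr`, `gaussian_kernel`) and the small jitter term are choices made here for the example.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise squared distances via ||a-b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def nystrom_krr(X, y, M=100, lam=1e-8, sigma=1.0, seed=0):
    """Solve the Nystrom KRR system
        (K_nm^T K_nm + lam * n * K_mm) alpha = K_nm^T y
    directly. Falkon solves the same system iteratively with a
    preconditioned CG solver so that M can be very large."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(n, size=M, replace=False)]
    K_nm = gaussian_kernel(X, centers, sigma)
    K_mm = gaussian_kernel(centers, centers, sigma)
    A = K_nm.T @ K_nm + lam * n * K_mm
    # Small jitter keeps the dense solve numerically stable.
    alpha = np.linalg.solve(A + 1e-10 * np.eye(M), K_nm.T @ y)
    return centers, alpha

def predict(X_test, centers, alpha, sigma=1.0):
    return gaussian_kernel(X_test, centers, sigma) @ alpha

# Toy regression problem: recover y = sin(x_0) from 3-d inputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 3))
y = np.sin(X[:, 0])
centers, alpha = nystrom_krr(X, y, M=200, lam=1e-8, sigma=1.0)
mse = float(np.mean((predict(X, centers, alpha) - y) ** 2))
```

With only M = 200 inducing points out of n = 2000, the training MSE on this smooth target is already small; the actual library linked in the table exposes a similar estimator-style interface over the same mathematical objects.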