Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
# Optimal Learning Rates for Localized SVMs
Authors: Mona Meister, Ingo Steinwart
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present a few larger scale experiments for our localized SVM showing that it achieves essentially the same test error as a global SVM for a fraction of the computational requirements. In addition, it turns out that the computational requirements for the local SVMs are similar to those of a vanilla random chunk approach, while the achieved test errors are significantly better. |
| Researcher Affiliation | Collaboration | Mona Meister, Corporate Research, Robert Bosch GmbH, 70465 Stuttgart, Germany; Ingo Steinwart, Institute for Stochastics and Applications, University of Stuttgart, 70569 Stuttgart, Germany |
| Pseudocode | Yes | Algorithm 1: Determine a Voronoi partition of the input data. Require: input data set D_X = {x_1, ..., x_n} with sample size n ∈ ℕ and some radius r > 0. Ensure: working sets indicating a Voronoi partition of D_X. |
| Open Source Code | No | The code we used was an early version of Steinwart (2016), which provides highly efficient SVM solvers for different loss functions based on the ideas developed by (Steinwart et al., 2011). In particular, it is easy to repeat every experiment by the current version of the code. In order to prepare the data set for the experiments, we first merged the split raw data sets so that we obtained one data set. |
| Open Datasets | Yes | In the experiments we report here, we consider the classical covtype data set, which contains 581,012 samples of dimension 54. |
| Dataset Splits | Yes | Finally, we generated random subsets that were afterwards randomly split into a training and a test data set. In this manner, we obtained training sets consisting of n = 1 000, 2 500, 5 000, 10 000, 25 000, 50 000, 100 000, 250 000, and 500 000 samples. The test data sets associated to the various training sets consist of n_test = 50 000 random samples, apart from the training sets with n_train ≤ 5 000, for which we took n_test = 10 000 test samples. ... For each working set, we randomly split the respective training data set of size n_train into five folds to apply 5-fold cross-validation in order to deal with the hyper-parameters λ and γ, taken from a 10 × 10 grid geometrically generated in [0.001·n_train^(-1), 0.1] × [0.5·n_train^(-1/d), 10]. |
| Hardware Specification | Yes | To train the global SVM for sufficiently large data sets we used a professional compute server equipped with four INTEL XEON E7-4830 (2.13 GHz) 8-core processors and 256 GB RAM. |
| Software Dependencies | No | The code we used was an early version of Steinwart (2016), which provides highly efficient SVM solvers for different loss functions based on the ideas developed by (Steinwart et al., 2011). |
| Experiment Setup | Yes | For each working set, we randomly split the respective training data set of size n_train into five folds to apply 5-fold cross-validation in order to deal with the hyper-parameters λ and γ, taken from a 10 × 10 grid geometrically generated in [0.001·n_train^(-1), 0.1] × [0.5·n_train^(-1/d), 10]. |
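The Pseudocode row quotes Algorithm 1, which builds a Voronoi partition of the input data from a radius r. A minimal sketch of that idea, assuming a greedy center-selection rule (centers chosen at least r apart); the paper's exact rule may differ, and `voronoi_partition` is a hypothetical name, not the authors' code:

```python
import math

def voronoi_partition(points, r):
    """Greedily pick centers pairwise more than r apart, then assign each
    point to its nearest center, yielding Voronoi cells (working sets)."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > r for c in centers):
            centers.append(p)
    # Assign every point index to the cell of its nearest center.
    cells = [[] for _ in centers]
    for i, p in enumerate(points):
        j = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        cells[j].append(i)
    return centers, cells
```

Each cell can then be trained as an independent local SVM, which is what makes the approach cheaper than one global SVM.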
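The Dataset Splits row describes randomly splitting each working set into five folds for cross-validation. A minimal sketch of such a split, assuming a simple shuffle-and-stride scheme; `five_fold_indices` and the fixed seed are illustrative assumptions, not the paper's procedure:

```python
import random

def five_fold_indices(n_train, seed=0):
    """Randomly partition indices 0..n_train-1 into 5 folds of
    near-equal size for 5-fold cross-validation."""
    idx = list(range(n_train))
    random.Random(seed).shuffle(idx)
    return [idx[i::5] for i in range(5)]
```

Each fold serves once as the validation set while the remaining four are used for training, for every (λ, γ) candidate.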
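The Experiment Setup row specifies a 10 × 10 geometrically generated grid for λ and γ over [0.001·n_train^(-1), 0.1] × [0.5·n_train^(-1/d), 10]. A sketch of how such a grid can be built; `geometric_grid` and the example n_train, d values are illustrative, not taken from the paper's code:

```python
def geometric_grid(lo, hi, num=10):
    """num values geometrically (log-uniformly) spaced from lo to hi."""
    ratio = (hi / lo) ** (1.0 / (num - 1))
    return [lo * ratio**i for i in range(num)]

# Example: n_train = 10 000 training samples, input dimension d = 54 (covtype).
n_train, d = 10_000, 54
lambdas = geometric_grid(0.001 / n_train, 0.1)
gammas = geometric_grid(0.5 * n_train ** (-1.0 / d), 10.0)
grid = [(lam, gam) for lam in lambdas for gam in gammas]  # 100 (λ, γ) pairs
```

Geometric spacing keeps the candidates evenly spread on a log scale, which suits regularization and kernel-width parameters that vary over orders of magnitude.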