Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
# Optimal Learning Rates for Localized SVMs
Authors: Mona Meister, Ingo Steinwart
JMLR 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present a few larger scale experiments for our localized SVM showing that it achieves essentially the same test error as a global SVM for a fraction of the computational requirements. In addition, it turns out that the computational requirements for the local SVMs are similar to those of a vanilla random chunk approach, while the achieved test errors are significantly better. |
| Researcher Affiliation | Collaboration | Mona Meister, Corporate Research, Robert Bosch GmbH, 70465 Stuttgart, Germany; Ingo Steinwart, Institute for Stochastics and Applications, University of Stuttgart, 70569 Stuttgart, Germany |
| Pseudocode | Yes | Algorithm 1: Determine a Voronoi partition of the input data. Require: input data set D_X = {x_1, ..., x_n} with sample size n ∈ ℕ and some radius r > 0. Ensure: working sets indicating a Voronoi partition of D_X. |
| Open Source Code | No | The code we used was an early version of Steinwart (2016), which provides highly efficient SVM solvers for different loss functions based on the ideas developed by (Steinwart et al., 2011). In particular, it is easy to repeat every experiment by the current version of the code. In order to prepare the data set for the experiments, we first merged the split raw data sets so that we obtained one data set. |
| Open Datasets | Yes | In the experiments we report here, we consider the classical covtype data set, which contains 581,012 samples of dimension 54. |
| Dataset Splits | Yes | Finally, we generated random subsets that were afterwards randomly split into a training and a test data set. In this manner, we obtained training sets consisting of n = 1 000, 2 500, 5 000, 10 000, 25 000, 50 000, 100 000, 250 000, and 500 000 samples. The test data sets associated to the various training sets consist of n_test = 50 000 random samples, apart from the training sets with n_train ≤ 5 000, for which we took n_test = 10 000 test samples. ... For each working set, we randomly split the respective training data set of size n_train into five folds to apply 5-fold cross-validation in order to deal with the hyper-parameters λ and γ, taken from a 10 × 10 grid geometrically generated in [0.001·n_train^(-1), 0.1] × [0.5·n_train^(-1/d), 10]. |
| Hardware Specification | Yes | To train the global SVM for sufficiently large data sets we used a professional compute server equipped with four INTEL XEON E7-4830 (2.13 GHz) 8-core processors and 256 GB RAM. |
| Software Dependencies | No | The code we used was an early version of Steinwart (2016), which provides highly efficient SVM solvers for different loss functions based on the ideas developed by (Steinwart et al., 2011). |
| Experiment Setup | Yes | For each working set, we randomly split the respective training data set of size n_train into five folds to apply 5-fold cross-validation in order to deal with the hyper-parameters λ and γ, taken from a 10 × 10 grid geometrically generated in [0.001·n_train^(-1), 0.1] × [0.5·n_train^(-1/d), 10]. |
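The Pseudocode row quotes Algorithm 1, which builds a Voronoi partition of the input data from a radius r. A minimal sketch of that idea, assuming a greedy center-selection rule (centers chosen at least r apart); the paper's exact rule may differ, and `voronoi_partition` is a hypothetical name, not the authors' code:

```python
import math

def voronoi_partition(points, r):
    """Greedily pick centers pairwise more than r apart, then assign each
    point to its nearest center, yielding Voronoi cells (working sets)."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > r for c in centers):
            centers.append(p)
    # Assign every point index to the cell of its nearest center.
    cells = [[] for _ in centers]
    for i, p in enumerate(points):
        j = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        cells[j].append(i)
    return centers, cells
```

Each cell can then be trained as an independent local SVM, which is what makes the approach cheaper than one global SVM.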
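The Dataset Splits row describes randomly splitting each working set into five folds for cross-validation. A minimal sketch of such a split, assuming a simple shuffle-and-stride scheme; `five_fold_indices` and the fixed seed are illustrative assumptions, not the paper's procedure:

```python
import random

def five_fold_indices(n_train, seed=0):
    """Randomly partition indices 0..n_train-1 into 5 folds of
    near-equal size for 5-fold cross-validation."""
    idx = list(range(n_train))
    random.Random(seed).shuffle(idx)
    return [idx[i::5] for i in range(5)]
```

Each fold serves once as the validation set while the remaining four are used for training, for every (λ, γ) candidate.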
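The Experiment Setup row specifies a 10 × 10 geometrically generated grid for λ and γ over [0.001·n_train^(-1), 0.1] × [0.5·n_train^(-1/d), 10]. A sketch of how such a grid can be built; `geometric_grid` and the example n_train, d values are illustrative, not taken from the paper's code:

```python
def geometric_grid(lo, hi, num=10):
    """num values geometrically (log-uniformly) spaced from lo to hi."""
    ratio = (hi / lo) ** (1.0 / (num - 1))
    return [lo * ratio**i for i in range(num)]

# Example: n_train = 10 000 training samples, input dimension d = 54 (covtype).
n_train, d = 10_000, 54
lambdas = geometric_grid(0.001 / n_train, 0.1)
gammas = geometric_grid(0.5 * n_train ** (-1.0 / d), 10.0)
grid = [(lam, gam) for lam in lambdas for gam in gammas]  # 100 (λ, γ) pairs
```

Geometric spacing keeps the candidates evenly spread on a log scale, which suits regularization and kernel-width parameters that vary over orders of magnitude.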