Regularization Path of Cross-Validation Error Lower Bounds
Authors: Atsushi Shibagaki, Yoshiki Suzuki, Masayuki Karasuyama, Ichiro Takeuchi
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments demonstrate that a theoretically guaranteed choice of a regularization parameter in the above sense is possible with reasonable computational costs. (Section 5, Experiments) In this section we present experiments for illustrating the proposed methods. Table 2 summarizes the datasets used in the experiments. They are taken from the libsvm dataset repository [23]. All the input features except D9 and D10 were standardized to [−1, 1]. For illustrative results, the instances were randomly divided into a training set and a validation set of roughly equal size. For quantitative results, we used 10-fold CV. We used the Huber hinge loss (e.g., [24]), which is convex and subdifferentiable with respect to the second argument. The proposed methods are free from the choice of optimization solvers. In the experiments, we used the optimization solver described in [25], which is also implemented in the well-known liblinear software [26]. Our slightly modified code (adapted to the Huber hinge loss) is provided as supplementary material, and is also available on https://github.com/takeuchi-lab/RPCVELB. Whenever possible, we used a warm-start approach, i.e., when we trained a new solution, we used the closest solution trained so far (either approximate or optimal) as the initial starting point of the optimizer. All the computations were conducted using a single core of an HP workstation Z800 (Xeon(R) CPU X5675 (3.07 GHz), 48 GB MEM). In all the experiments, we set Cℓ = 10⁻³ and Cu = 10³. Results on problem 1: We applied Algorithm 1 in Section 4 to a set of solutions obtained by 1) grid search, 2) Bayesian optimization (BO) with the expected-improvement acquisition function, and 3) adaptive search with our framework, which sequentially computes the solution whose validation lower bound is smallest based on the information obtained so far. Figure 2 illustrates the results on three datasets, showing how the approximation level ε (vertical axis) changes as the number of solutions (T in our notation) increases. In grid search, as we increase the number of grid points, the approximation level ε tends to improve. Since BO tends to focus on a small region of the regularization parameter range, it was difficult to tightly bound the approximation level. The adaptive search, using our framework straightforwardly, seems to offer a slight improvement over grid search. Results on problem 2: We applied Algorithm 2 to benchmark datasets to demonstrate that a theoretically guaranteed choice of a regularization parameter is possible with reasonable computational costs. Besides the algorithm presented in Section 4, we also tested a variant described in supplementary Appendix B. Specifically, we have three algorithm options. In the first option (op1), we used optimal solutions {w*_Ct}, t ∈ [T], for computing CV error lower bounds. In the second option (op2), we instead used approximate solutions {ŵ_Ct}, t ∈ [T]. In the last option (op3), we additionally used speed-up tricks described in supplementary Appendix B. We considered four different choices of ε ∈ {0.1, 0.05, 0.01, 0}. Note that ε = 0 indicates the task of finding the exactly optimal regularization parameter. Table 1 (Computational costs): for each of the three options and ε ∈ {0.10, 0.05, 0.01, 0}, the number of optimization problems solved (denoted T) and the total computational cost (denoted time) are listed. |
| Researcher Affiliation | Academia | Atsushi Shibagaki, Yoshiki Suzuki, Masayuki Karasuyama, and Ichiro Takeuchi, Nagoya Institute of Technology, Nagoya 466-8555, Japan; {shibagaki.a.mllab.nit,suzuki.mllab.nit}@gmail.com; {karasuyama,takeuchi.ichiro}@nitech.ac.jp |
| Pseudocode | Yes | Algorithm 1: Computing the approximation level ε from the given set of solutions. Algorithm 2: Finding an ε-approximate regularization parameter with approximate solutions. |
| Open Source Code | Yes | Our slightly modified code (adapted to the Huber hinge loss) is provided as supplementary material, and is also available on https://github.com/takeuchi-lab/RPCVELB. |
| Open Datasets | Yes | Table 2 (Benchmark datasets used in the experiments): D1 heart (270 samples, 13 features); D2 liver-disorders (345, 6); D3 ionosphere (351, 34); D4 australian (690, 14); D5 diabetes (768, 8); D6 german.numer (1000, 24); D7 svmguide3 (1284, 21); D8 svmguide1 (7089, 4); D9 a1a (32561, 123); D10 w8a (64700, 300). They are taken from the libsvm dataset repository [23]. |
| Dataset Splits | Yes | For illustrative results, the instances were randomly divided into a training set and a validation set of roughly equal size. For quantitative results, we used 10-fold CV. |
| Hardware Specification | Yes | All the computations were conducted by using a single core of an HP workstation Z800 (Xeon(R) CPU X5675 (3.07GHz), 48GB MEM). |
| Software Dependencies | No | The paper mentions 'well-known liblinear software [26]' but does not specify a version number for it or any other software. |
| Experiment Setup | Yes | In all the experiments, we set Cℓ = 10⁻³ and Cu = 10³. We considered four different choices of ε ∈ {0.1, 0.05, 0.01, 0}. In trick1, we initially computed solutions at four different regularization parameter values evenly allocated in [10⁻³, 10³] on the logarithmic scale. In trick2, the next regularization parameter Ct+1 was set by replacing ε in (10) with 1.5ε (see supplementary Appendix B). |
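
The Pseudocode row above refers to Algorithm 1, which computes the approximation level ε for a given set of (approximate or optimal) solutions. Below is a minimal sketch of that computation, assuming hypothetical bound functions `val_error_ub` (validation error upper bound of a trained solution) and `val_error_lb` (lower bound on the validation error over an interval of regularization parameter values); the paper derives these bounds, and their exact form is not reproduced here.

```python
def approximation_level(solutions, C_grid, val_error_ub, val_error_lb):
    """Sketch of Algorithm 1: given solutions trained at the values in
    C_grid (sorted ascending), return epsilon such that the best trained
    solution's validation error is within epsilon of the smallest
    validation error achievable anywhere in [C_grid[0], C_grid[-1]]."""
    # Smallest guaranteed validation error among the trained solutions.
    best_ub = min(val_error_ub(w, C) for w, C in zip(solutions, C_grid))
    # Smallest lower bound over the intervals between consecutive C values;
    # no C inside [C_t, C_{t+1}] can do better than this bound.
    best_possible_lb = min(
        val_error_lb(solutions[t], solutions[t + 1], C_grid[t], C_grid[t + 1])
        for t in range(len(C_grid) - 1)
    )
    return best_ub - best_possible_lb
```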
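
The experiments use the Huber hinge loss, which is convex and subdifferentiable in its second argument. The sketch below shows one common parameterization (a quadratically smoothed hinge); the exact smoothing used in the paper follows its reference [24] and may differ, and the width parameter `delta` is an assumption here.

```python
import numpy as np

def huber_hinge(z, delta=0.5):
    """One common parameterization of the Huberized hinge loss on the
    margin z = y * f(x): linear for z < 1 - delta, quadratic near the
    hinge point, and zero for z >= 1."""
    z = np.asarray(z, dtype=float)
    return np.where(
        z >= 1.0, 0.0,
        np.where(z <= 1.0 - delta,
                 1.0 - z - delta / 2.0,           # linear regime
                 (1.0 - z) ** 2 / (2.0 * delta))  # quadratic smoothing
    )
```

As `delta → 0` this recovers the standard hinge loss max(0, 1 − z), while any `delta > 0` keeps the loss differentiable at the hinge point.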
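
The warm-start strategy quoted in the Research Type row (initialize the optimizer at the closest solution trained so far) can be sketched as follows; `solver` and the `trained` cache are hypothetical stand-ins for the liblinear-based solver and the set of previously computed solutions.

```python
import math

def train_warmstart(C_new, trained, solver):
    """Train at C_new, warm-starting from the cached solution whose
    regularization parameter is closest to C_new on the log scale.
    `trained` maps C values to solutions (approximate or optimal)."""
    w_init = None  # cold start for the very first problem
    if trained:
        C_closest = min(trained, key=lambda C: abs(math.log(C) - math.log(C_new)))
        w_init = trained[C_closest]
    w_new = solver(C_new, w_init=w_init)
    trained[C_new] = w_new
    return w_new
```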
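
The quoted experiment setup (Cℓ = 10⁻³, Cu = 10³, values evenly allocated on the logarithmic scale) translates directly into the candidate grids below; only the grid size `T` is an arbitrary choice here, as the paper varies it.

```python
import numpy as np

C_l, C_u = 1e-3, 1e3  # search range used in all experiments

# Grid search: T candidates evenly allocated on the logarithmic scale.
T = 10
C_grid = np.logspace(np.log10(C_l), np.log10(C_u), num=T)

# trick1 (supplementary Appendix B): warm up with four values evenly
# allocated in [1e-3, 1e3] on the logarithmic scale.
C_init = np.logspace(np.log10(C_l), np.log10(C_u), num=4)
```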
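
Finally, the protocol in the Dataset Splits row (a random roughly equal train/validation split for the illustrative results, 10-fold CV for the quantitative ones) is a standard setup; a sketch using scikit-learn, which the paper itself does not use, with toy stand-in data and an assumed seed:

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold

rng = np.random.RandomState(0)  # assumed seed; the paper does not report one
X, y = rng.randn(100, 5), np.sign(rng.randn(100))  # toy stand-in data

# Illustrative results: random split into roughly equal halves.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

# Quantitative results: 10-fold cross-validation.
for train_idx, val_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_val, y_val = X[val_idx], y[val_idx]
```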