Regularization Path of Cross-Validation Error Lower Bounds
Authors: Atsushi Shibagaki, Yoshiki Suzuki, Masayuki Karasuyama, Ichiro Takeuchi
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments demonstrate that a theoretically guaranteed choice of a regularization parameter in the above sense is possible with reasonable computational costs. (Section 5, Experiments) In this section we present experiments for illustrating the proposed methods. Table 2 summarizes the datasets used in the experiments. They are taken from the libsvm dataset repository [23]. All the input features except D9 and D10 were standardized to [−1, 1]. For illustrative results, the instances were randomly divided into a training set and a validation set of roughly equal size. For quantitative results, we used 10-fold CV. We used the Huber hinge loss (e.g., [24]), which is convex and subdifferentiable with respect to the second argument. The proposed methods are free from the choice of optimization solvers. In the experiments, we used the optimization solver described in [25], which is also implemented in the well-known liblinear software [26]. Our slightly modified code (adapted to the Huber hinge loss) is provided as supplementary material, and is also available on https://github.com/takeuchi-lab/RPCVELB. Whenever possible, we used a warm-start approach, i.e., when we trained a new solution, we used the closest solution trained so far (either approximate or optimal) as the initial starting point of the optimizer. All the computations were conducted using a single core of an HP workstation Z800 (Xeon(R) CPU X5675 (3.07 GHz), 48 GB MEM). In all the experiments, we set Cℓ = 10⁻³ and Cu = 10³. Results on problem 1: We applied Algorithm 1 in Section 4 to a set of solutions obtained by 1) grid search, 2) Bayesian optimization (BO) with the expected-improvement acquisition function, and 3) adaptive search with our framework, which sequentially computes the solution whose validation lower bound is smallest based on the information obtained so far. Figure 2 illustrates the results on three datasets, showing how the approximation level ε (vertical axis) changes as the number of solutions (T in our notation) increases. In grid search, as we increase the number of grid points, the approximation level ε tends to improve. Since BO tends to focus on a small region of the regularization parameter range, it was difficult to tightly bound the approximation level. The adaptive search, using our framework straightforwardly, seems to offer a slight improvement over grid search. Results on problem 2: We applied Algorithm 2 to benchmark datasets to demonstrate that a theoretically guaranteed choice of a regularization parameter is possible with reasonable computational costs. Besides the algorithm presented in Section 4, we also tested a variant described in supplementary Appendix B. Specifically, we have three algorithm options. In the first option (op1), we used optimal solutions {w*_Ct}, t ∈ [T], for computing CV error lower bounds. In the second option (op2), we instead used approximate solutions {ŵ_Ct}, t ∈ [T]. In the last option (op3), we additionally used speed-up tricks described in supplementary Appendix B. We considered four different choices of ε ∈ {0.1, 0.05, 0.01, 0}. Note that ε = 0 indicates the task of finding the exactly optimal regularization parameter. Table 1 (Computational costs): for each of the three options and ε ∈ {0.10, 0.05, 0.01, 0}, the number of optimization problems solved (denoted T) and the total computational cost (denoted time) are listed. |
| Researcher Affiliation | Academia | Atsushi Shibagaki, Yoshiki Suzuki, Masayuki Karasuyama, and Ichiro Takeuchi, Nagoya Institute of Technology, Nagoya 466-8555, Japan; {shibagaki.a.mllab.nit,suzuki.mllab.nit}@gmail.com; {karasuyama,takeuchi.ichiro}@nitech.ac.jp |
| Pseudocode | Yes | Algorithm 1: Computing the approximation level ε from the given set of solutions. Algorithm 2: Finding an ε-approximate regularization parameter with approximate solutions. |
| Open Source Code | Yes | Our slightly modified code (adapted to the Huber hinge loss) is provided as supplementary material, and is also available on https://github.com/takeuchi-lab/RPCVELB. |
| Open Datasets | Yes | Table 2 (Benchmark datasets used in the experiments): D1 heart (270 samples, 13 features); D2 liver-disorders (345, 6); D3 ionosphere (351, 34); D4 australian (690, 14); D5 diabetes (768, 8); D6 german.numer (1000, 24); D7 svmguide3 (1284, 21); D8 svmguide1 (7089, 4); D9 a1a (32561, 123); D10 w8a (64700, 300). They are taken from the libsvm dataset repository [23]. |
| Dataset Splits | Yes | For illustrative results, the instances were randomly divided into a training set and a validation set of roughly equal size. For quantitative results, we used 10-fold CV. |
| Hardware Specification | Yes | All the computations were conducted by using a single core of an HP workstation Z800 (Xeon(R) CPU X5675 (3.07GHz), 48GB MEM). |
| Software Dependencies | No | The paper mentions 'well-known liblinear software [26]' but does not specify a version number for it or any other software. |
| Experiment Setup | Yes | In all the experiments, we set Cℓ = 10⁻³ and Cu = 10³. We considered four different choices of ε ∈ {0.1, 0.05, 0.01, 0}. In trick1, we initially computed solutions at four different regularization parameter values evenly allocated in [10⁻³, 10³] on the logarithmic scale. In trick2, the next regularization parameter Ct+1 was set by replacing ε in (10) with 1.5ε (see supplementary Appendix B). |
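
The Pseudocode row above refers to Algorithm 1, which computes the approximation level ε for a given set of (approximate or optimal) solutions. Below is a minimal sketch of that computation, assuming hypothetical bound functions `val_error_ub` (validation error upper bound of a trained solution) and `val_error_lb` (lower bound on the validation error over an interval of regularization parameter values); the paper derives these bounds, and their exact form is not reproduced here.

```python
def approximation_level(solutions, C_grid, val_error_ub, val_error_lb):
    """Sketch of Algorithm 1: given solutions trained at the values in
    C_grid (sorted ascending), return epsilon such that the best trained
    solution's validation error is within epsilon of the smallest
    validation error achievable anywhere in [C_grid[0], C_grid[-1]]."""
    # Smallest guaranteed validation error among the trained solutions.
    best_ub = min(val_error_ub(w, C) for w, C in zip(solutions, C_grid))
    # Smallest lower bound over the intervals between consecutive C values;
    # no C inside [C_t, C_{t+1}] can do better than this bound.
    best_possible_lb = min(
        val_error_lb(solutions[t], solutions[t + 1], C_grid[t], C_grid[t + 1])
        for t in range(len(C_grid) - 1)
    )
    return best_ub - best_possible_lb
```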
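
The experiments use the Huber hinge loss, which is convex and subdifferentiable in its second argument. The sketch below shows one common parameterization (a quadratically smoothed hinge); the exact smoothing used in the paper follows its reference [24] and may differ, and the width parameter `delta` is an assumption here.

```python
import numpy as np

def huber_hinge(z, delta=0.5):
    """One common parameterization of the Huberized hinge loss on the
    margin z = y * f(x): linear for z < 1 - delta, quadratic near the
    hinge point, and zero for z >= 1."""
    z = np.asarray(z, dtype=float)
    return np.where(
        z >= 1.0, 0.0,
        np.where(z <= 1.0 - delta,
                 1.0 - z - delta / 2.0,           # linear regime
                 (1.0 - z) ** 2 / (2.0 * delta))  # quadratic smoothing
    )
```

As `delta → 0` this recovers the standard hinge loss max(0, 1 − z), while any `delta > 0` keeps the loss differentiable at the hinge point.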
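
The warm-start strategy quoted in the Research Type row (initialize the optimizer at the closest solution trained so far) can be sketched as follows; `solver` and the `trained` cache are hypothetical stand-ins for the liblinear-based solver and the set of previously computed solutions.

```python
import math

def train_warmstart(C_new, trained, solver):
    """Train at C_new, warm-starting from the cached solution whose
    regularization parameter is closest to C_new on the log scale.
    `trained` maps C values to solutions (approximate or optimal)."""
    w_init = None  # cold start for the very first problem
    if trained:
        C_closest = min(trained, key=lambda C: abs(math.log(C) - math.log(C_new)))
        w_init = trained[C_closest]
    w_new = solver(C_new, w_init=w_init)
    trained[C_new] = w_new
    return w_new
```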
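
The quoted experiment setup (Cℓ = 10⁻³, Cu = 10³, values evenly allocated on the logarithmic scale) translates directly into the candidate grids below; only the grid size `T` is an arbitrary choice here, as the paper varies it.

```python
import numpy as np

C_l, C_u = 1e-3, 1e3  # search range used in all experiments

# Grid search: T candidates evenly allocated on the logarithmic scale.
T = 10
C_grid = np.logspace(np.log10(C_l), np.log10(C_u), num=T)

# trick1 (supplementary Appendix B): warm up with four values evenly
# allocated in [1e-3, 1e3] on the logarithmic scale.
C_init = np.logspace(np.log10(C_l), np.log10(C_u), num=4)
```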
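
Finally, the protocol in the Dataset Splits row (a random roughly equal train/validation split for the illustrative results, 10-fold CV for the quantitative ones) is a standard setup; a sketch using scikit-learn, which the paper itself does not use, with toy stand-in data and an assumed seed:

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold

rng = np.random.RandomState(0)  # assumed seed; the paper does not report one
X, y = rng.randn(100, 5), np.sign(rng.randn(100))  # toy stand-in data

# Illustrative results: random split into roughly equal halves.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

# Quantitative results: 10-fold cross-validation.
for train_idx, val_idx in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_val, y_val = X[val_idx], y[val_idx]
```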