Hyperparameter optimization with approximate gradient
Authors: Fabian Pedregosa
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we validate the empirical performance of this method on the estimation of regularization constants of ℓ2-regularized logistic regression and kernel Ridge regression. Empirical benchmarks indicate that our approach is highly competitive with respect to state-of-the-art methods. |
| Researcher Affiliation | Academia | Fabian Pedregosa F@BIANP.NET Chaire Havas-Dauphine Économie des Nouvelles Données, CEREMADE, CNRS UMR 7534, Université Paris-Dauphine, PSL Research University, Département Informatique de l'École Normale Supérieure, Paris |
| Pseudocode | Yes | Algorithm 1 (HOAG). At iteration k = 1, 2, ... perform the following: (a hedged sketch of one such iteration is given after this table) |
| Open Source Code | Yes | A Python implementation is made freely available at https://github.com/fabianp/hoag. |
| Open Datasets | Yes | The datasets 20news and real-sim are studied with an ℓ2-regularized logistic regression model (1 hyperparameter), while the Parkinson dataset is studied with a kernel ridge regression model (2 hyperparameters). The MNIST dataset is investigated in a high-dimensional hyperparameter space using a similar setting to (Maclaurin et al., 2015, 3.2) and reported in Appendix B. |
| Dataset Splits | Yes | In all cases, the dataset is randomly split in three equally sized parts: a train set, test set and a third validation set that we will use to measure the generalization performance of the different approaches. (A minimal split sketch also follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions software like L-BFGS, scikit-learn, and Bayesian Optimization, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | For all methods, the number of iterations used in the inner optimization algorithm (L-BFGS or GD) is set to 100... The initialization of regularization parameters is set to 0 and the width of an RBF kernel is initialized to log(n_feat)... The initialization of the tolerance decrease sequence is set to ε₁ = 0.1. To avoid numerical instabilities, the maximum precision is limited to 10⁻¹²... The constants that we used in the experiments are M = 1, α = 0.5, β = 1.05. |
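
To give a concrete picture of what the quoted Algorithm 1 (HOAG) and experiment setup amount to, here is a minimal sketch of one outer iteration for ℓ2-regularized logistic regression with a single hyperparameter α = log(λ). This is not the released `hoag` package; it only illustrates the approximate-hypergradient idea, i.e. both the inner problem and the linear system are solved to a finite tolerance that shrinks over outer iterations. The function names, the geometric tolerance schedule and the unit outer step size are assumptions made for illustration; only ε₁ = 0.1, the 10⁻¹² floor and the 100-iteration inner cap come from the quoted setup.

```python
# Minimal sketch of a HOAG-style outer iteration for l2-regularized logistic
# regression with hyperparameter alpha = log(lam). NOT the released
# implementation at https://github.com/fabianp/hoag; names, the tolerance
# schedule and the outer step size are illustrative assumptions.
import numpy as np
from scipy.optimize import fmin_l_bfgs_b
from scipy.sparse.linalg import LinearOperator, cg


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def inner_obj_grad(w, X, y, lam):
    """l2-regularized logistic loss and gradient (labels y in {-1, +1})."""
    z = y * X.dot(w)
    loss = np.log1p(np.exp(-z)).sum() + 0.5 * lam * w.dot(w)
    grad = -X.T.dot(y * sigmoid(-z)) + lam * w
    return loss, grad


def hoag_step(alpha, w0, Xtr, ytr, Xval, yval, tol):
    """Inexact inner solve + inexact linear solve + approximate hypergradient."""
    lam = np.exp(alpha)

    # (i) Solve the inner problem only up to tolerance `tol` with L-BFGS
    #     (the quoted setup caps the inner solver at 100 iterations).
    w, _, _ = fmin_l_bfgs_b(inner_obj_grad, w0, args=(Xtr, ytr, lam),
                            pgtol=tol, maxiter=100)

    # (ii) Approximately solve H q = dg/dw, where H is the Hessian of the
    #      inner objective and g is the (unregularized) validation loss.
    g_val = -Xval.T.dot(yval * sigmoid(-yval * Xval.dot(w)))
    s = sigmoid(Xtr.dot(w))
    d = s * (1.0 - s)                      # logistic curvature weights
    H = LinearOperator((w.size, w.size),
                       matvec=lambda v: Xtr.T.dot(d * Xtr.dot(v)) + lam * v)
    q, _ = cg(H, g_val, atol=tol, maxiter=100)

    # (iii) Approximate hypergradient: the derivative of the inner gradient
    #       w.r.t. lam is simply w, and alpha = log(lam) adds a factor lam.
    hypergrad = -lam * w.dot(q)
    return w, hypergrad


def hoag(Xtr, ytr, Xval, yval, alpha=0.0, n_outer=20, step=1.0):
    """Illustrative outer loop: tolerance starts at 0.1, floored at 1e-12."""
    w = np.zeros(Xtr.shape[1])
    tol = 0.1
    for _ in range(n_outer):
        w, hypergrad = hoag_step(alpha, w, Xtr, ytr, Xval, yval, tol)
        alpha -= step * hypergrad          # gradient step on log-hyperparameter
        tol = max(tol * 0.5, 1e-12)        # illustrative geometric decrease
    return np.exp(alpha), w
```

The design point this sketch tries to convey is the one the paper argues for: neither the inner solver nor the linear system needs to be solved exactly at every outer step, as long as the tolerance is driven to zero over the outer iterations.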
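The three-way split quoted in the "Dataset Splits" row is simple to reproduce; the following is a minimal sketch under the stated "equally sized parts" assumption, with a hypothetical helper name and seed rather than the paper's own code.

```python
# Minimal sketch of the equal three-way split (train / test / validation);
# the helper name, seed and NumPy-only approach are illustrative.
import numpy as np


def three_way_split(X, y, seed=0):
    """Randomly split (X, y) into three equally sized parts."""
    idx = np.random.RandomState(seed).permutation(len(y))
    a, b = len(y) // 3, 2 * len(y) // 3
    train, test, val = idx[:a], idx[a:b], idx[b:]
    return [(X[i], y[i]) for i in (train, test, val)]
```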