Hyperparameter optimization with approximate gradient

Author: Fabian Pedregosa

Venue: ICML 2016

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental — "Finally, we validate the empirical performance of this method on the estimation of regularization constants of ℓ2-regularized logistic regression and kernel ridge regression. Empirical benchmarks indicate that our approach is highly competitive with respect to state-of-the-art methods."
Researcher Affiliation: Academia — "Fabian Pedregosa (f@bianp.net), Chaire Havas-Dauphine 'Économie des Nouvelles Données', CEREMADE, CNRS UMR 7534, Université Paris-Dauphine, PSL Research University; Département Informatique de l'École Normale Supérieure, Paris."
Pseudocode: Yes — "Algorithm 1 (HOAG). At iteration k = 1, 2, . . . perform the following:" The three steps are: solve the inner optimization problem up to tolerance εk; solve a linear system involving the inner Hessian up to the same tolerance; and update the hyperparameters with the resulting approximate gradient (see the sketch below).
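For orientation, here is a minimal Python sketch of one such iteration. It is not the author's implementation: the callable names, the L-BFGS/CG solvers, and the plain (unprojected) gradient step are assumptions.

```python
from scipy.optimize import minimize
from scipy.sparse.linalg import LinearOperator, cg

def hoag_step(lam, x0, eps_k, step_size, inner_loss, inner_grad,
              hessian_vec, cross_vec, outer_grad):
    """One approximate-gradient iteration in the spirit of HOAG (sketch).

    inner_loss(x, lam), inner_grad(x, lam): inner (training) objective/gradient
    hessian_vec(x, lam, v): product of the inner Hessian (w.r.t. x) with v
    cross_vec(x, lam, v):   product of the cross derivative (d2h/dx dlam)^T with v
    outer_grad(x):          gradient of the outer (validation) loss w.r.t. x
    """
    # (i) Solve the inner problem approximately; the gradient tolerance is a
    # proxy for the paper's parameter-distance tolerance eps_k.
    res = minimize(inner_loss, x0, args=(lam,), jac=inner_grad,
                   method="L-BFGS-B",
                   options={"gtol": eps_k, "maxiter": 100})
    x_k = res.x

    # (ii) Approximately solve  H q = grad_f(x_k)  with conjugate gradient.
    n = x_k.size
    H = LinearOperator((n, n), matvec=lambda v: hessian_vec(x_k, lam, v))
    q_k, _ = cg(H, outer_grad(x_k), atol=eps_k)

    # (iii) Approximate hypergradient; in the paper's experiments the
    # validation loss does not depend on lam directly, so only the
    # cross-derivative term appears.
    p_k = -cross_vec(x_k, lam, q_k)
    return lam - step_size * p_k, x_k
```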
Open Source Code: Yes — "A Python implementation is made freely available at https://github.com/fabianp/hoag."
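The repository exposes a scikit-learn-style estimator; usage is roughly as follows, though the exact class name and fit signature are assumptions that should be checked against the repository's README.

```python
import numpy as np
from hoag import LogisticRegressionCV  # interface assumed from the repo README

rng = np.random.RandomState(0)
X, y = rng.randn(200, 10), rng.randint(2, size=200) * 2 - 1  # stand-in data
X_train, y_train, X_val, y_val = X[:100], y[:100], X[100:], y[100:]

# Fit an l2-regularized logistic regression whose regularization constant is
# tuned by HOAG against the held-out pair (X_val, y_val).
clf = LogisticRegressionCV()
clf.fit(X_train, y_train, X_val, y_val)
```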
Open Datasets: Yes — "The 20news and real-sim datasets are studied with an ℓ2-regularized logistic regression model (1 hyperparameter), while the Parkinson dataset uses a kernel ridge regression model (2 hyperparameters). The MNIST dataset is investigated in a high-dimensional hyperparameter space using a similar setting to (Maclaurin et al., 2015, §3.2) and is reported in Appendix B."
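For reference, 20news is bundled with scikit-learn; a minimal loading sketch follows (the other datasets come from LIBSVM/UCI mirrors, and this loading path is an assumption, not taken from the paper).

```python
from sklearn.datasets import fetch_20newsgroups_vectorized

# Load the vectorized 20 newsgroups data (sparse tf-idf features, all splits).
news = fetch_20newsgroups_vectorized(subset="all")
X, y = news.data, news.target
print(X.shape)  # roughly (18846, 130107) for current scikit-learn versions
```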
Dataset Splits: Yes — "In all cases, the dataset is randomly split into three equally sized parts: a train set, a test set, and a third validation set that we will use to measure the generalization performance of the different approaches."
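A minimal sketch of such a three-way split, using scikit-learn (the random data here is just a stand-in):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X, y = rng.randn(999, 20), rng.randint(2, size=999)  # stand-in data

# Carve off one third for training, then halve the rest into test/validation.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=2 / 3, random_state=0)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)

assert len(X_train) == len(X_test) == len(X_val) == 333
```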
Hardware Specification: No — The paper does not provide specific details about the hardware (e.g., CPU or GPU model, memory) used to run the experiments.
Software Dependencies: No — The paper mentions software such as L-BFGS, scikit-learn, and Bayesian optimization tools, but does not provide version numbers for these dependencies.
Experiment Setup: Yes — "For all methods, the number of iterations used in the inner optimization algorithm (L-BFGS or GD) is set to 100... The initialization of regularization parameters is set to 0 and the width of an RBF kernel is initialized to log(n_features)... The initialization of the tolerance decrease sequence is set to ε1 = 0.1. We also limit the maximum precision to avoid numerical instabilities to 10^-12... The constants that we used in the experiments are M = 1, α = 0.5, β = 1.05."
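For concreteness, the quoted settings can be collected as a configuration sketch. The excerpt states the constants M, α, β but not the exact adaptive update rule, so the `next_tolerance` helper below is hypothetical.

```python
import math

# Experimental settings quoted above.
config = {
    "max_inner_iter": 100,   # L-BFGS / GD iterations for the inner problem
    "eps_init": 0.1,         # initial tolerance eps_1
    "eps_floor": 1e-12,      # maximum precision, to avoid numerical issues
    "reg_init": 0.0,         # initial regularization parameters
    "M": 1.0, "alpha": 0.5, "beta": 1.05,
}

def rbf_width_init(n_features):
    # RBF kernel width initialized to log(n_features), per the quoted setup.
    return math.log(n_features)

def next_tolerance(eps, made_progress):
    # Hypothetical rule: tighten the tolerance by alpha when progress stalls,
    # relax it by beta otherwise, clipped to [eps_floor, eps_init].
    factor = config["alpha"] if not made_progress else config["beta"]
    return min(max(eps * factor, config["eps_floor"]), config["eps_init"])
```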