Reshuffling Resampling Splits Can Improve Generalization of Hyperparameter Optimization

Authors: Thomas Nagler, Lennart Schneider, Bernd Bischl, Matthias Feurer

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our theoretical analysis explains how reshuffling affects the asymptotic behavior of the validation loss surface and provides a bound on the expected regret in the limiting regime. This bound connects the potential benefits of reshuffling to the signal and noise characteristics of the underlying optimization problem. We confirm our theoretical results in a controlled simulation study and demonstrate the practical usefulness of reshuffling in a large-scale, realistic hyperparameter optimization experiment.
Researcher Affiliation | Academia | Department of Statistics, LMU Munich; Munich Center for Machine Learning (MCML)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | We provide code to reproduce our results under an open source license at https://github.com/slds-lmu/paper_2024_reshuffling.
Open Datasets | Yes | We used a subset of the datasets defined by the AutoML benchmark (Gijsbers et al., 2024), treating these as data generating processes (DGPs; Hothorn et al., 2005). Example entry from the dataset table: OpenML dataset ID 23517, dataset numerai28.6, size (n x p) = 96320 x 21.
Dataset Splits | Yes | We always use 80/20 train-validation splits for holdout and 5-fold CV, so that the training set size (and negative estimation bias) is the same. Additionally, for the random search, the 500 HPCs evaluated for a given learning algorithm are also fixed over different dataset and train-validation size combinations.
Hardware Specification | Yes | Benchmark experiments were run on an internal HPC cluster equipped with a mix of Intel Xeon E5-2670, Intel Xeon E5-2683, and Intel Xeon Gold 6330 instances.
Software Dependencies | No | I could not find specific version numbers for the software dependencies (e.g., Python, PyTorch, scikit-learn, XGBoost, CatBoost, HEBO, SMAC3) used in the experiments.
Experiment Setup | Yes | We always use 80/20 train-validation splits for holdout and 5-fold CV, so that the training set size (and negative estimation bias) is the same. We conduct a random search with 500 HPC evaluations for every resampling strategy described in Table 1, for both fixed and reshuffled splits. We provide details regarding training pipelines and search spaces in Appendix F.2. (A minimal illustrative sketch of fixed versus reshuffled splits follows this table.)
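
To make the fixed-versus-reshuffled distinction concrete, here is a minimal sketch, not the authors' released pipeline, of a random search over 80/20 holdout splits, assuming scikit-learn is available. The toy data, the toy search space, and helper names such as sample_hpc and validation_accuracy are purely illustrative; the paper's experiments use 500 HPC evaluations per learner on AutoML-benchmark datasets and also cover 5-fold CV variants.

# Minimal sketch (illustrative only): random search with fixed vs. reshuffled
# 80/20 holdout splits. Toy data and search space; not the paper's pipeline.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

def sample_hpc(rng):
    """Draw one hyperparameter configuration (HPC) from a toy search space."""
    return {
        "n_estimators": int(rng.integers(50, 500)),
        "max_depth": int(rng.integers(2, 16)),
        "min_samples_leaf": int(rng.integers(1, 20)),
    }

def validation_accuracy(hpc, seed):
    """Evaluate one HPC on an 80/20 train-validation holdout split.

    With fixed splits, `seed` is identical for every HPC; with reshuffling,
    a fresh seed (and hence a fresh split) is used per HPC evaluation.
    """
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestClassifier(random_state=0, **hpc).fit(X_tr, y_tr)
    return accuracy_score(y_val, model.predict(X_val))

n_evals = 50  # the paper uses 500 HPC evaluations per learner; 50 keeps the sketch fast
hpcs = [sample_hpc(rng) for _ in range(n_evals)]

# Fixed splits: every HPC is scored on the identical train-validation partition.
fixed_scores = [validation_accuracy(hpc, seed=0) for hpc in hpcs]

# Reshuffled splits: a new random partition is drawn for each HPC evaluation.
reshuffled_scores = [validation_accuracy(hpc, seed=i + 1) for i, hpc in enumerate(hpcs)]

print("incumbent (fixed splits):     ", hpcs[int(np.argmax(fixed_scores))])
print("incumbent (reshuffled splits):", hpcs[int(np.argmax(reshuffled_scores))])

The intended contrast: with fixed splits every configuration is scored against the same noisy partition, whereas reshuffling draws a new partition per evaluation; the paper's analysis relates the potential benefit of the latter to the signal and noise characteristics of the tuning problem.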