Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Explaining Hyperparameter Optimization via Partial Dependence Plots
Authors: Julia Moosbauer, Julia Herbinger, Giuseppe Casalicchio, Marius Lindauer, Bernd Bischl
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In an experimental study, we provide quantitative evidence for the increased quality of the PDPs within sub-regions. |
| Researcher Affiliation | Academia | Department of Statistics, Ludwig-Maximilians-University Munich, Munich, Germany; Institute of Information Processing, Leibniz University Hannover, Hannover, Germany |
| Pseudocode | Yes | The pseudo-code to partition a hyperparameter (sub-)space and corresponding sample $(\lambda_C^{(i)})_{i \in N} \in C$, $N \subseteq \{1, \dots, n\}$, into two child regions is shown in Algorithm 1. |
| Open Source Code | Yes | The implementation of the proposed methods as well as reproducible scripts for the experimental analysis are provided in a public git-repository (https://github.com/slds-lmu/paper_2021_xautoml). |
| Open Datasets | Yes | LCBench data [Zimmer et al., 2021]. For each of the 35 different OpenML [Vanschoren et al., 2013] classification tasks, LCBench provides access to evaluations of a deep neural network on 2000 configurations randomly drawn from the configuration space defined by Auto-PyTorch Tabular (see Table 5 in Appendix C.2). |
| Dataset Splits | No | The paper states 'For each task, we trained a random forest as an empirical performance model that predicts the balanced validation error of the neural network for a given configuration' but does not provide specific details on the dataset split (e.g., percentages, sample counts, or methodology for creating the validation set) for their experiments. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU, memory, or specific computing cluster types) used to run the experiments. |
| Software Dependencies | No | The paper mentions several software components, such as 'Scikit-learn', 'mlrmbo', 'pdp', 'Auto-PyTorch Tabular', and 'Python', but it does not specify their version numbers, which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | PDPs are computed with regards to single features for G = 20 equidistant grid points and n = 1000 Monte Carlo samples. We ran BO with a GP surrogate model with a Matérn-3/2 kernel and the LCB acquisition function $a(\lambda) = \hat{m}(\lambda) + \tau \hat{s}(\lambda)$ with different values $\tau \in \{0.1, 1, 5\}$ to control the sampling bias. All computations were repeated 30 times. Each BO run was allotted a budget of 200 objective function evaluations. |
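
To make the Experiment Setup row more concrete, the following is a minimal sketch of a Monte Carlo partial dependence estimate on an equidistant grid, together with an LCB-style acquisition value. It is not the authors' implementation (their code is in the repository linked above). The function names `partial_dependence`, `lcb`, and `toy_predict`, the trade-off parameter name `tau`, and the toy data are assumptions made for illustration. The sign in `lcb` follows the form quoted in the table; other references write the lower confidence bound with a minus sign.

```python
import numpy as np


def partial_dependence(predict, X, feature, grid_size=20):
    """Monte Carlo PDP estimate for a single feature on an equidistant grid.

    predict: callable mapping an (n, d) array to n predictions (hypothetical model).
    X: (n, d) array of Monte Carlo samples.
    """
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_size)
    pdp = np.empty(grid_size)
    for g, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value        # fix the feature of interest at the grid value
        pdp[g] = predict(X_mod).mean()   # average predictions over the remaining features
    return grid, pdp


def lcb(mean, std, tau):
    """Acquisition value in the form quoted in the table: m_hat + tau * s_hat."""
    return mean + tau * std


def toy_predict(X):
    """Toy stand-in for a surrogate model (illustrative only, not from the paper)."""
    return X[:, 0] ** 2 + X[:, 1]


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(1000, 2))   # n = 1000 Monte Carlo samples
grid, pdp = partial_dependence(toy_predict, X, feature=0, grid_size=20)  # G = 20
```

For this toy model the estimate at each grid point equals the squared grid value plus the sample mean of the second feature, which gives a quick sanity check on the averaging step.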