Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning
Authors: Matthias Feurer, Katharina Eggensperger, Stefan Falkner, Marius Lindauer, Frank Hutter
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify the improvements by these additions in an extensive experimental study on 39 AutoML benchmark datasets. |
| Researcher Affiliation | Collaboration | Matthias Feurer¹, Katharina Eggensperger¹, Stefan Falkner², Marius Lindauer³, Frank Hutter¹,². ¹Department of Computer Science, Albert-Ludwigs-Universität Freiburg; ²Bosch Center for Artificial Intelligence, Renningen, Germany; ³Institute of Information Processing, Leibniz University Hannover |
| Pseudocode | Yes | Appendix A. Additional pseudo-code We give pseudo-code for computing the estimated generalization error of P across all metadatasets Dmeta for K-fold cross-validation in Algorithm 2 and successive halving in Algorithm 3. Algorithm 2: Estimating the generalization error of a portfolio with K-Fold Cross Validation ... Algorithm 3: Estimating the generalization error of a portfolio with Successive Halving |
| Open Source Code | Yes | We provide scripts for reproducing all our experimental results at https://github.com/automl/ASKL2.0_experiments and provide a clean integration of our methods into the official Auto-sklearn repository. |
| Open Datasets | Yes | For Dtest, we rely on 39 datasets selected for the AutoML benchmark proposed by Gijsbers et al. (2019), which consists of datasets for comparing classifiers (Bischl et al., 2021) and datasets from the AutoML challenges (Guyon et al., 2019). We collected the meta datasets Dmeta based on OpenML (Vanschoren et al., 2014) using the OpenML-Python API (Feurer et al., 2021). |
| Dataset Splits | Yes | For all datasets, we use a single holdout test set of 33.33%, which is defined by the corresponding OpenML task. The remaining 66.66% are the training data of our AutoML systems, which handle further splits for model selection themselves based on the chosen model selection strategy. ... We used the pre-defined 1h8c setting, which divides each dataset into ten folds and gives each framework one hour on eight CPU cores to produce a final model. |
| Hardware Specification | Yes | All experiments were conducted on a compute cluster with machines equipped with 2 Intel Xeon Gold 6242 CPUs with 2.8GHz (32 cores) and 192 GB RAM, running Ubuntu 20.04.01. |
| Software Dependencies | Yes | We implemented the AutoML systems and experiments in the Python3 programming language, using numpy (Harris et al., 2020), scipy (Virtanen et al., 2020), scikit-learn (Pedregosa et al., 2011), pandas (Wes McKinney, 2010; Reback et al., 2021), and matplotlib (Hunter, 2007). We used version 0.12.6 of the Auto-sklearn Python package for the experiments and added Auto-sklearn 2.0 functionality in version 0.12.7, which we then used for the AutoML benchmark. ... Table 17: Package Versions: Auto-sklearn 2.0 0.12.7, Auto-sklearn 1.0 0.12.6, Auto-WEKA 2.6.3, TPOT 0.11.7, H2O AutoML 3.32.1.4, Tuned Random Forest 0.24.2, AutoML benchmark 973de79617e68a881dcc640842ea1d21dfd4b36c |
| Experiment Setup | Yes | We always report results averaged across 10 repetitions to account for randomness and report the mean and standard deviation over these repetitions. ... We conducted all experiments using ensemble selection, and we constructed ensembles of size 50 with replacement. ... We also limit the time and memory for each ML pipeline evaluation. For the time limit, we allow for at most 1/10 of the optimization budget, while for the memory, we allow the pipeline 4GB before forcefully terminating the execution. ... We used the same hyperparameters for all experiments. First, we set η = 4. Next, we had to choose the minimal and maximal budgets assigned to each algorithm. For the tree-based methods we chose to go from 32 to 512, while for the linear models (SGD and passive aggressive) we chose 64 as the minimal budget and 1024 as the maximal budget. ... Table 18: Configuration space for Auto-sklearn 2.0 using only iterative models and only preprocessing to transform data into a format that can be usefully employed by the different classification algorithms. The final column (log) states whether we actually search log10(λ). |
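The Experiment Setup row quotes the paper's use of ensemble selection with ensembles of size 50 built with replacement. This refers to the greedy forward-selection procedure of Caruana et al. (2004). The sketch below is not the Auto-sklearn implementation; it is a minimal illustration of the technique, assuming class-probability predictions on a validation set and classification error as the loss. The function name `ensemble_selection` and the array shapes are our own choices for illustration.

```python
import numpy as np

def ensemble_selection(val_probs, y_val, ensemble_size=50):
    """Greedy ensemble selection with replacement (Caruana et al., 2004).

    val_probs: array (n_models, n_samples, n_classes) of validation-set
               class probabilities, one slice per base model.
    y_val:     array (n_samples,) of true labels.
    Returns the list of chosen model indices (models may repeat).
    """
    n_models = val_probs.shape[0]
    chosen = []
    running = np.zeros_like(val_probs[0])  # sum of probs of chosen models
    for _ in range(ensemble_size):
        best_i, best_err = None, np.inf
        for i in range(n_models):
            # Error of the ensemble if model i were added next.
            avg = (running + val_probs[i]) / (len(chosen) + 1)
            err = np.mean(avg.argmax(axis=1) != y_val)
            if err < best_err:
                best_i, best_err = i, err
        chosen.append(best_i)
        running += val_probs[best_i]
    return chosen
```

Because selection is with replacement, a strong model can be picked many times, which effectively weights it more heavily in the final averaged ensemble.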
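The same row reports the successive-halving hyperparameters: η = 4, budgets 32 to 512 for tree-based methods and 64 to 1024 for the linear models. A minimal sketch of the resulting budget rungs, assuming budgets grow geometrically by a factor of η up to the maximum, as in standard successive halving; the helper name `successive_halving_budgets` is hypothetical:

```python
def successive_halving_budgets(b_min, b_max, eta=4):
    """Budget rungs from b_min to b_max, multiplying by eta each rung."""
    budgets = []
    b = b_min
    while b < b_max:
        budgets.append(b)
        b *= eta
    budgets.append(b_max)
    return budgets

# Tree-based methods (iterations of the learner per rung):
print(successive_halving_budgets(32, 512))    # [32, 128, 512]
# Linear models (SGD, passive aggressive):
print(successive_halving_budgets(64, 1024))   # [64, 256, 1024]
```

With η = 4, only the top quarter of configurations would survive each rung, so both families pass through exactly three budget levels.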