Efficient and Robust Automated Machine Learning

Authors: Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, Manuel Blum, Frank Hutter

NeurIPS 2015

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our system won the first phase of the ongoing ChaLearn AutoML challenge, and our comprehensive analysis on over 100 diverse datasets shows that it substantially outperforms the previous state of the art in AutoML. We also demonstrate the performance gains due to each of our contributions and derive insights into the effectiveness of the individual components of AUTO-SKLEARN. |
| Researcher Affiliation | Academia | Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, Manuel Blum, Frank Hutter. Department of Computer Science, University of Freiburg, Germany. {feurerm,kleinaa,eggenspk,springj,mblum,fh}@cs.uni-freiburg.de |
| Pseudocode | Yes | Procedure 1 in the supplementary material describes it in detail. |
| Open Source Code | Yes | The source code of AUTO-SKLEARN is available under an open source license at https://github.com/automl/auto-sklearn. |
| Open Datasets | Yes | In an offline phase, for each machine learning dataset in a dataset repository (in our case 140 datasets from the OpenML [18] repository)... |
| Dataset Splits | Yes | Further, let D_train = {(x_1, y_1), ..., (x_n, y_n)} be a training set which is split into K cross-validation folds {D_valid^(1), ..., D_valid^(K)} and {D_train^(1), ..., D_train^(K)} such that D_train^(i) = D_train \ D_valid^(i) for i = 1, ..., K. |
| Hardware Specification | No | The paper mentions 'CPU and/or wallclock time' for the computational budget and '10.7 CPU years' of total experiment time, but it does not specify any particular CPU or GPU models, memory, or other hardware components used. |
| Software Dependencies | No | The paper mentions software frameworks such as 'scikit-learn [7]', 'WEKA [8]', 'SMAC [9]', and 'OpenML [18]', but does not provide the specific version numbers of these dependencies that reproducibility would require. |
| Experiment Setup | Yes | To study their performance under rigid time constraints, and also due to computational resource constraints, we limited the CPU time for each run to 1 hour; we also limited the runtime for a single model to a tenth of this (6 minutes). |
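The Dataset Splits row quotes the paper's K-fold definition: each training portion D_train^(i) is the complement of the corresponding validation fold D_valid^(i), and the K validation folds partition the training set. A minimal plain-Python sketch of that partition (assigning fold members by index stride is an illustrative choice of mine, not something the paper specifies):

```python
def kfold_indices(n, k):
    """Partition indices 0..n-1 into k disjoint validation folds and
    return (train, valid) index pairs, one per fold."""
    # Illustrative fold assignment: index i goes to fold i mod k.
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i in range(k):
        valid = set(folds[i])
        # D_train^(i) = D_train \ D_valid^(i): everything not in fold i.
        train = [j for j in range(n) if j not in valid]
        splits.append((train, sorted(valid)))
    return splits
```

Each pair satisfies the paper's definition: train and valid are disjoint, and together they cover the full training set.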
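The Experiment Setup row fixes two budgets: 1 hour of CPU time per run, with any single model capped at a tenth of that (6 minutes). A sketch of that budget rule, with the constants taken from the quoted text and the helper names being my own:

```python
TOTAL_BUDGET_S = 3600                # 1 hour of CPU time per run (from the paper)
PER_MODEL_S = TOTAL_BUDGET_S // 10   # a single model gets a tenth: 360 s = 6 min

def remaining_budget(elapsed_s):
    """CPU seconds left in the overall 1-hour budget."""
    return max(0, TOTAL_BUDGET_S - elapsed_s)

def model_time_limit(elapsed_s):
    """Time allowed for the next model: never more than the per-model cap,
    and never more than what remains of the overall budget."""
    return min(PER_MODEL_S, remaining_budget(elapsed_s))
```

For reference, the released auto-sklearn package exposes, to my understanding, analogous knobs on `AutoSklearnClassifier` (`time_left_for_this_task` and `per_run_time_limit`), which would be set to 3600 and 360 to mirror this setup.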