Automated Machine Learning with Monte-Carlo Tree Search

Authors: Herilalaina Rakotoarison, Marc Schoenauer, Michèle Sebag

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical studies are conducted to independently assess and compare: i) the optimization processes based on Bayesian optimization or MCTS; ii) its warm-start initialization; iii) the ensembling of the solutions gathered along the search. MOSAIC is assessed on the OpenML 100 benchmark and the Scikit-learn portfolio, with statistically significant gains over AUTO-SKLEARN, winner of former international AutoML challenges.
Researcher Affiliation | Academia | Herilalaina Rakotoarison, Marc Schoenauer and Michèle Sebag, TAU, LRI-CNRS-INRIA, Université Paris-Saclay, France
Pseudocode | Yes | Algorithm 1: MOSAIC Vanilla
Open Source Code | Yes | MOSAIC is publicly available under an open source license at https://github.com/herilalaina/mosaic_ml.
Open Datasets | Yes | The compared AutoML systems are assessed on the OpenML repository [Vanschoren et al., 2013], including 100 classification problems.
Dataset Splits | Yes | For all systems, every considered configuration x is launched to learn a model from 70% of the training set with a cut-off time of 300 seconds, and performance F(x) is set to the model accuracy on the remaining 30%. (This protocol is sketched in code below the table.)
Hardware Specification | Yes | Computational times are measured on an AMD Athlon 64 X2, 5GB RAM.
Software Dependencies | No | The paper mentions using a "scikit-learn portfolio" and comparing against other systems such as AUTO-SKLEARN and TPOT, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | The overall computational budget is set to 1 hour for each dataset. ... MOSAIC involves 2 hyper-hyper-parameters ...: the number ns = 100 ... Cucb = 1.3 ... PW = 0.6. Shared hyper-hyper-parameters include: the number nr of uniformly sampled configurations and variance ε = 0.2 for the local search in the Playout phase (Section 3.3). ... every considered configuration x is launched to learn a model from 70% of the training set with a cut-off time of 300 seconds, and performance F(x) is set to the model accuracy on the remaining 30%. (Where Cucb and PW act in the search is sketched below the table.)
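
To make the Dataset Splits protocol concrete, the sketch below computes F(x) for a single configuration x: fit a pipeline on 70% of the training data and score accuracy on the held-out 30%. The dataset, pipeline, and configuration are illustrative assumptions, and the 300-second cut-off (which a real harness would enforce, e.g. by running each fit in a subprocess with a timeout) is only noted in a comment; this is not the authors' evaluation code.

    # Minimal sketch of the per-configuration evaluation F(x), assuming an
    # arbitrary dataset and a scikit-learn pipeline; not the paper's harness.
    from sklearn.datasets import load_digits
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_digits(return_X_y=True)
    # 70% for training, 30% held out for scoring, as quoted above.
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.30,
                                                random_state=0)

    def evaluate_configuration(config):
        """Return F(x): validation accuracy of one pipeline configuration x.

        A real harness would also kill the fit once the 300 s cut-off is
        reached, e.g. by running it in a subprocess with a timeout.
        """
        model = make_pipeline(StandardScaler(), SVC(**config))
        model.fit(X_tr, y_tr)
        return accuracy_score(y_val, model.predict(X_val))

    print(evaluate_configuration({"C": 1.0, "kernel": "rbf"}))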
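
The two MCTS hyper-hyper-parameters quoted in the Experiment Setup row, Cucb = 1.3 and PW = 0.6, enter a standard UCT search as follows: Cucb scales the exploration bonus in the child-selection rule, and PW caps how fast a node may grow new children (progressive widening). The sketch below shows where these constants act in a generic UCT tree; the Node class and function names are assumptions, not the authors' Algorithm 1.

    import math
    from dataclasses import dataclass, field
    from typing import List

    C_UCB = 1.3  # exploration constant Cucb from the Experiment Setup row
    PW = 0.6     # progressive-widening coefficient from the same row

    @dataclass
    class Node:
        visits: int = 0
        total_reward: float = 0.0
        children: List["Node"] = field(default_factory=list)

    def uct_score(parent: Node, child: Node) -> float:
        """Mean reward plus the C_UCB-scaled exploration bonus; an
        unvisited child scores infinity so each child is tried once."""
        if child.visits == 0:
            return float("inf")
        exploit = child.total_reward / child.visits
        explore = C_UCB * math.sqrt(math.log(parent.visits) / child.visits)
        return exploit + explore

    def select_child(parent: Node) -> Node:
        # Selection step: descend to the child with the best UCT score.
        return max(parent.children, key=lambda c: uct_score(parent, c))

    def may_add_child(parent: Node) -> bool:
        # Progressive widening: allow a new child only while the child
        # count stays below visits ** PW, so the branching factor grows
        # sublinearly with the number of visits.
        return len(parent.children) < max(1, parent.visits) ** PW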