Learning meta-features for AutoML

Authors: Herilalaina Rakotoarison, Louisot Milijaona, Andry RASOANAIVO, Michele Sebag, Marc Schoenauer

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the OpenML CC-18 benchmark demonstrate that using MetaBu meta-features boosts the performance of state-of-the-art AutoML systems, Auto-Sklearn (Feurer et al., 2015) and Probabilistic Matrix Factorization (Fusi et al., 2018). Furthermore, inspection of MetaBu meta-features gives some hints about when an ML algorithm performs well. Finally, the topology based on MetaBu meta-features makes it possible to estimate the intrinsic dimensionality of the OpenML benchmark w.r.t. a given ML algorithm or pipeline. |
| Researcher Affiliation | Academia | (1) TAU, LISN-CNRS & INRIA, Université Paris-Saclay, Orsay, France; (2) MISA, LMI, Université d'Antananarivo, Ankatso, Madagascar |
| Pseudocode | Yes | Algorithm 1: Learning METABU meta-features; Algorithm 2: Fit_density |
| Open Source Code | Yes | The source code is available at https://github.com/luxusg1/metabu. |
| Open Datasets | Yes | "The OpenML CC-18 (Bischl et al., 2019), to our knowledge the largest curated tabular dataset benchmark (that will be used in the experiments)" |
| Dataset Splits | Yes | Experiments use the train/validation/test splits given by OpenML; the validation score is estimated using a 5-CV strategy. |
| Hardware Specification | Yes | Runtimes are measured on an Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz. |
| Software Dependencies | No | The paper mentions software such as the scikit-learn implementation, the ConfigSpace library, and PyMFE, but it does not provide version numbers for any of these dependencies. |
| Experiment Setup | Yes | L = 20 in the experiments (for top-L known configuration performances); the number of iterations is set to 10; ADAM optimizer (Kingma & Ba, 2015) with learning rate 0.01, α = 0.5, and λ = 0.001. The intrinsic dimension d of the OpenML benchmark is circa 6 for Auto-Sklearn, 8 for AdaBoost, 9 for Random Forest, and 14 for Support Vector Machines. For Auto-Sklearn, the target representation is generated from scratch, running 500 configurations per training dataset and retaining the top-20. For PMF, the top-20 configurations are extracted from the collaborative filtering matrix for each training dataset (Fusi et al., 2018). |
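The Dataset Splits row reports that validation scores are estimated with a 5-CV (5-fold cross-validation) strategy. A minimal sketch of such a split, using only the Python standard library; the function name and the seeding scheme are illustrative, not from the paper:

```python
import random

def five_fold_cv_splits(n_samples, n_folds=5, seed=0):
    """Partition sample indices into n_folds disjoint validation folds.

    Illustrative sketch of a 5-CV scheme: each sample serves as a
    validation point exactly once, and as a training point in the
    other n_folds - 1 splits.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # fixed seed for reproducibility
    folds = [idx[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        val = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, val))
    return splits

splits = five_fold_cv_splits(100)
```

The validation score reported for a configuration would then be the mean score over the five `(train, val)` pairs.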