Learning meta-features for AutoML
Authors: Herilalaina Rakotoarison, Louisot Milijaona, Andry RASOANAIVO, Michele Sebag, Marc Schoenauer
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the OpenML CC-18 benchmark demonstrate that using METABU meta-features boosts the performance of state-of-the-art AutoML systems, Auto-Sklearn (Feurer et al., 2015) and Probabilistic Matrix Factorization (Fusi et al., 2018). Furthermore, the inspection of METABU meta-features gives some hints into when an ML algorithm does well. Finally, the topology based on METABU meta-features enables estimating the intrinsic dimensionality of the OpenML benchmark w.r.t. a given ML algorithm or pipeline. |
| Researcher Affiliation | Academia | 1 TAU, LISN-CNRS INRIA, Université Paris-Saclay, Orsay, France 2 MISA, LMI, Université d'Antananarivo, Ankatso, Madagascar |
| Pseudocode | Yes | Algorithm 1: Learning METABU meta-features; Algorithm 2: Fit_density |
| Open Source Code | Yes | The source code is available at https://github.com/luxusg1/metabu. |
| Open Datasets | Yes | The OpenML CC-18 (Bischl et al., 2019), to our knowledge the largest curated tabular dataset benchmark (that will be used in the experiments) |
| Dataset Splits | Yes | using the train/validation/test splits given by OpenML; the validation score is estimated using a 5-fold cross-validation (5-CV) strategy. |
| Hardware Specification | Yes | Runtimes are measured on an Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz. |
| Software Dependencies | No | The paper mentions software like 'scikit-learn implementation', 'ConfigSpace library', and 'PyMFE', but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | L = 20 in the experiments (for top-L known configuration performances). The number of iterations is set to 10 in the experiments. Adam optimizer (Kingma & Ba, 2015) with learning rate 0.01, α = 0.5 and λ = 0.001. The intrinsic dimension d of the OpenML benchmark is circa 6 for Auto-Sklearn, 8 for Adaboost, 9 for Random Forest and 14 for Support Vector Machines. For Auto-Sklearn, the target representation is generated from scratch, running 500 configurations per training dataset and retaining the top-20. For PMF, the top-20 configurations are extracted from the collaborative filtering matrix for each training dataset (Fusi et al., 2018). |
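The Experiment Setup row quotes the Adam optimizer with learning rate 0.01. As a minimal sketch of what that update rule computes, the standard Adam step (Kingma & Ba, 2015) can be written in plain NumPy; the quadratic loss and parameter vector below are hypothetical stand-ins for illustration, not the paper's actual METABU objective.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba, 2015); lr=0.01 matches the paper's setting."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment (variance) estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction for m
    v_hat = v / (1 - beta2 ** t)                # bias correction for v
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Hypothetical toy objective f(theta) = ||theta||^2 with gradient 2*theta,
# just to show the update converging toward the minimum at the origin.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 501):
    grad = 2.0 * theta
    theta, m, v = adam_step(theta, grad, m, v, t)
print(np.linalg.norm(theta))  # small residual norm near the optimum
```

The bias-correction terms matter early on (small t), when the moment estimates are still biased toward their zero initialization; with lr = 0.01 the effective per-step displacement is roughly the learning rate, which is why a few hundred steps suffice on this toy problem.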