Pipeline Combinators for Gradual AutoML

Authors: Guillaume Baudart, Martin Hirzel, Kiran Kate, Parikshit Ram, Avi Shinnar, Jason Tsay

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper introduces Lale, an open-source sklearn-compatible AutoML library, and evaluates it with a user study.
Researcher Affiliation | Collaboration | Guillaume Baudart (Inria, ENS PSL University, France, guillaume.baudart@inria.fr); Martin Hirzel (IBM Research, USA, hirzel@us.ibm.com); Kiran Kate (IBM Research, USA, kakate@us.ibm.com); Parikshit Ram (IBM Research, USA, parikshit.ram@ibm.com); Avraham Shinnar (IBM Research, USA, shinnar@us.ibm.com); Jason Tsay (IBM Research, USA, jason.tsay@ibm.com)
Pseudocode | No | The paper presents syntax definitions (Figure 1: pipeline syntax; Figure 3: schema syntax) and describes a translation scheme, but it does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | This paper introduces Lale, a Python AutoML library implementing the combinators and the translation scheme: "Lale enjoys active use both in the open-source community (https://github.com/ibm/lale/)"
Open Datasets | Yes | "We chose 14 datasets from OpenML [46] (CC-BY license) that allow for meaningful optimization (as opposed to just the initial few trials) within that 1-hour budget."
Dataset Splits | Yes | "We used a 66:33% train:test split with 5-fold cross validation on the train set during optimization."
Hardware Specification | Yes | "We used a 2.0GHz virtual machine with 32 cores and 128GB memory and gave each search a 1 hour time budget with a timeout of 6 minutes per trial, which corresponds to the default setting of auto-sklearn."
Software Dependencies | No | The paper mentions software such as Lale, Python, sklearn, Hyperopt, ADMM, SMAC, and Hyperband, but it does not specify exact version numbers for these dependencies, which are required for full reproducibility.
Experiment Setup | Yes | "We used a 66:33% train:test split with 5-fold cross validation on the train set during optimization. We used a 2.0GHz virtual machine with 32 cores and 128GB memory and gave each search a 1 hour time budget with a timeout of 6 minutes per trial, which corresponds to the default setting of auto-sklearn."
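The report above notes that the paper's contribution is a set of pipeline combinators. As a rough illustration of the idea, the following is a minimal, hypothetical Python sketch of a `>>` combinator that chains steps into a pipeline. It is not Lale's actual implementation (Lale additionally provides `&` and `|` combinators and schema-driven search spaces); the `Step` class and its methods are invented here purely for illustration.

```python
class Step:
    """Hypothetical wrapper around a transformation that supports the
    ">>" combinator for sequential pipeline composition."""

    def __init__(self, name, fn):
        self.name = name
        self.fn = fn  # a callable transforming its input

    def __rshift__(self, other):
        # Compose: feed this step's output into the next step.
        return Step(f"{self.name} >> {other.name}",
                    lambda x: other.fn(self.fn(x)))

    def run(self, x):
        return self.fn(x)


# Two toy steps composed into a pipeline with the >> combinator.
scale = Step("scale", lambda xs: [v / 10 for v in xs])
shift = Step("shift", lambda xs: [v + 1 for v in xs])

pipe = scale >> shift
print(pipe.name)            # scale >> shift
print(pipe.run([10, 20]))   # [2.0, 3.0]
```

The point of the combinator style, as the paper describes it, is that the same pipeline expression can be handed to an optimizer rather than run directly; this sketch only shows the composition mechanics.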
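To make the evaluation protocol quoted above concrete (a 66:33% train:test split, then 5-fold cross validation on the train portion), here is a pure-Python sketch of the index bookkeeping. The function name and the contiguous-stride fold assignment are assumptions for illustration; the paper's experiments would in practice use a library's split and cross-validation utilities.

```python
import random

def split_and_folds(n_samples, test_frac=0.33, n_folds=5, seed=0):
    """Shuffle sample indices, hold out test_frac for testing, and
    partition the remaining training indices into n_folds folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = int(n_samples * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    # Strided partition of training indices into n_folds folds.
    folds = [train[i::n_folds] for i in range(n_folds)]
    return train, test, folds

train, test, folds = split_and_folds(100)
print(len(train), len(test))      # 67 33
print([len(f) for f in folds])    # [14, 14, 13, 13, 13]
```

During optimization, each trial would train on four folds and validate on the fifth, cycling through all five; the held-out 33% is touched only for the final test score.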