Pipeline Combinators for Gradual AutoML
Authors: Guillaume Baudart, Martin Hirzel, Kiran Kate, Parikshit Ram, Avraham Shinnar, Jason Tsay
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper introduces Lale, an open-source sklearn-compatible AutoML library, and evaluates it with a user study. |
| Researcher Affiliation | Collaboration | Guillaume Baudart (Inria, ENS PSL University, France, guillaume.baudart@inria.fr); Martin Hirzel (IBM Research, USA, hirzel@us.ibm.com); Kiran Kate (IBM Research, USA, kakate@us.ibm.com); Parikshit Ram (IBM Research, USA, parikshit.ram@ibm.com); Avraham Shinnar (IBM Research, USA, shinnar@us.ibm.com); Jason Tsay (IBM Research, USA, jason.tsay@ibm.com) |
| Pseudocode | No | The paper presents syntax definitions (Figure 1: Pipeline syntax, Figure 3: Schema syntax) and describes a translation scheme, but it does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | This paper introduces Lale, a Python AutoML library implementing the combinators and the translation scheme. Lale enjoys active use both in the open-source community (https://github.com/ibm/lale/) |
| Open Datasets | Yes | We chose 14 datasets from OpenML [46] (CC-BY license) that allow for meaningful optimization (as opposed to just the initial few trials) within that 1-hour budget. |
| Dataset Splits | Yes | We used a 66:33% train:test split with 5-fold cross validation on the train set during optimization. |
| Hardware Specification | Yes | We used a 2.0GHz virtual machine with 32 cores and 128GB memory and gave each search a 1 hour time budget with a timeout of 6 minutes per trial, which corresponds to the default setting of auto-sklearn. |
| Software Dependencies | No | The paper mentions software like 'Lale', 'Python', 'sklearn', 'Hyperopt', 'ADMM', 'SMAC', and 'Hyperband', but it does not specify exact version numbers for these software dependencies, which are required for full reproducibility. |
| Experiment Setup | Yes | We used a 66:33% train:test split with 5-fold cross validation on the train set during optimization. We used a 2.0GHz virtual machine with 32 cores and 128GB memory and gave each search a 1 hour time budget with a timeout of 6 minutes per trial, which corresponds to the default setting of auto-sklearn. |
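The experiment setup quoted above (a 66:33% train:test split, with 5-fold cross validation on the train set during optimization) can be sketched in plain sklearn. This is a minimal illustration of the protocol only: the synthetic dataset, the `LogisticRegression` estimator, and the random seeds are stand-ins, not the paper's actual OpenML tasks or Lale search configuration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Illustrative stand-in for one of the paper's OpenML datasets.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# 66:33% train:test split, as described in the setup.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

# 5-fold cross validation on the train set, standing in for the
# per-trial scoring done during the 1-hour optimization budget.
cv_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_train, y_train, cv=5
)
print(f"mean CV accuracy on train split: {cv_scores.mean():.3f}")

# Final evaluation of the chosen model on the held-out test set.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```

In the paper's actual runs, the cross-validation step would be repeated for each candidate pipeline proposed by the optimizer (e.g. Hyperopt) within the 1-hour budget, with the test set touched only once at the end.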