Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Pipeline Combinators for Gradual AutoML
Authors: Guillaume Baudart, Martin Hirzel, Kiran Kate, Parikshit Ram, Avi Shinnar, Jason Tsay
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper introduces Lale, an open-source, sklearn-compatible AutoML library, and evaluates it with a user study. |
| Researcher Affiliation | Collaboration | Guillaume Baudart (Inria, ENS PSL University, France); Martin Hirzel (IBM Research, USA); Kiran Kate (IBM Research, USA); Parikshit Ram (IBM Research, USA); Avraham Shinnar (IBM Research, USA); Jason Tsay (IBM Research, USA) |
| Pseudocode | No | The paper presents syntax definitions (Figure 1: Pipeline syntax, Figure 3: Schema syntax) and describes a translation scheme, but it does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | This paper introduces Lale, a Python AutoML library implementing the combinators and the translation scheme. Lale enjoys active use in the open-source community (https://github.com/ibm/lale/). |
| Open Datasets | Yes | We chose 14 datasets from Open ML [46] (CC-BY license) that allow for meaningful optimization (as opposed to just the initial few trials) within that 1-hour budget. |
| Dataset Splits | Yes | We used a 66:33% train:test split with 5-fold cross validation on the train set during optimization. |
| Hardware Specification | Yes | We used a 2.0GHz virtual machine with 32 cores and 128GB memory and gave each search a 1 hour time budget with a timeout of 6 minutes per trial, which corresponds to the default setting of auto-sklearn. |
| Software Dependencies | No | The paper mentions software like 'Lale', 'Python', 'sklearn', 'Hyperopt', 'ADMM', 'SMAC', and 'Hyperband', but it does not specify exact version numbers for these software dependencies, which are required for full reproducibility. |
| Experiment Setup | Yes | We used a 66:33% train:test split with 5-fold cross validation on the train set during optimization. We used a 2.0GHz virtual machine with 32 cores and 128GB memory and gave each search a 1 hour time budget with a timeout of 6 minutes per trial, which corresponds to the default setting of auto-sklearn. |
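The table above repeatedly refers to the pipeline combinators that Lale introduces. To make that idea concrete, here is a minimal, self-contained sketch of a `>>` composition combinator in plain Python. This is illustrative only: the `Step` class, `scale`, and `shift` are hypothetical names invented for this sketch, not Lale's actual API or implementation (Lale's real combinators operate on sklearn-compatible operators and support search-space generation, which is omitted here).

```python
# Conceptual sketch (NOT Lale's implementation) of a ">>" pipeline
# combinator: "a >> b" builds a step that applies a, then feeds its
# output to b.
class Step:
    """Hypothetical wrapper around a transform function (illustrative)."""
    def __init__(self, fn):
        self.fn = fn

    def __rshift__(self, other):
        # Compose: the new step runs self first, then other.
        return Step(lambda x: other.fn(self.fn(x)))

    def __call__(self, x):
        return self.fn(x)

# Two toy steps, composed with the combinator.
scale = Step(lambda xs: [v / 10 for v in xs])
shift = Step(lambda xs: [v + 1 for v in xs])
pipeline = scale >> shift

print(pipeline([10, 20, 30]))  # [2.0, 3.0, 4.0]
```

In Lale itself, composition is written analogously (e.g. chaining operators with `>>`), but over trainable sklearn-style operators rather than bare functions.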