Fast, Accurate, and Simple Models for Tabular Data via Augmented Distillation

Authors: Rasool Fakoor, Jonas W. Mueller, Nick Erickson, Pratik Chaudhari, Alexander J. Smola

NeurIPS 2020

Reproducibility variables, each listed with its assessed result and the supporting LLM response:
Research Type: Experimental. "We evaluate various methods on 30 datasets (Table S2) spanning regression tasks from the UCI ML Repository and binary/multiclass classification tasks from OpenML, which are included in popular deep learning and AutoML benchmarks [1, 36–40]. To facilitate comparisons on a meaningful scale across datasets, we evaluate methods on the provided test data based on either their accuracy in classification, or percentage of variation explained (= R² · 100) in regression."
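As a concrete reading of that metric, here is a minimal sketch of the score the evaluation describes (classification accuracy, or R² scaled to a percentage for regression); scikit-learn is an assumed tool here, not one the paper names:

```python
from sklearn.metrics import accuracy_score, r2_score

def evaluation_score(y_true, y_pred, task):
    """Score on the scale described above: accuracy for classification,
    or percentage of variation explained (R^2 * 100) for regression."""
    if task == "regression":
        return 100.0 * r2_score(y_true, y_pred)
    return accuracy_score(y_true, y_pred)
```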
Researcher Affiliation: Collaboration. Rasool Fakoor (Amazon Web Services, fakoor@amazon.com); Jonas Mueller (Amazon Web Services, jonasmue@amazon.com); Nick Erickson (Amazon Web Services, neerick@amazon.com); Pratik Chaudhari (University of Pennsylvania, pratikac@seas.upenn.edu); Alexander J. Smola (Amazon Web Services, smola@amazon.com).
Pseudocode: No. The paper describes procedures in prose (e.g., "We adopt the following procedure to draw Gibbs samples") but does not include any explicitly labeled pseudocode or algorithm blocks.
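For intuition only, a generic Gibbs-style resampling sweep over tabular features might look like the sketch below. This is not the paper's procedure (the paper draws Gibbs samples from its own learned conditional model); the per-feature random-forest regressors and the assumption of numeric features are illustrative stand-ins:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def gibbs_augment(X, n_sweeps=1, seed=0):
    """Generic Gibbs-style sweep: re-draw each column in turn from a
    simple conditional model fit on the remaining columns.
    Illustrative only; assumes numeric features."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).copy()
    n, d = X.shape
    for _ in range(n_sweeps):
        for j in range(d):
            # Condition on all other columns to resample column j.
            others = np.delete(X, j, axis=1)
            model = RandomForestRegressor(n_estimators=50, random_state=seed)
            model.fit(others, X[:, j])
            mean = model.predict(others)
            resid_std = (X[:, j] - mean).std()
            X[:, j] = mean + resid_std * rng.standard_normal(n)
    return X
```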
Open Source Code: No. The paper does not contain any explicit statement about releasing source code for the methodology, nor does it link to a code repository.
Open Datasets: Yes. "We evaluate various methods on 30 datasets (Table S2) spanning regression tasks from the UCI ML Repository and binary/multiclass classification tasks from OpenML, which are included in popular deep learning and AutoML benchmarks [1, 36–40]."
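Since the datasets come from OpenML and the UCI repository, they can be pulled programmatically; a sketch using scikit-learn's OpenML fetcher follows, where the dataset name is a placeholder rather than necessarily one of the paper's 30 tasks:

```python
from sklearn.datasets import fetch_openml

# "adult" is a placeholder OpenML dataset; substitute any of the
# benchmark tasks listed in the paper's Table S2.
X, y = fetch_openml("adult", version=2, as_frame=True, return_X_y=True)
print(X.shape)
```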
Dataset Splits: Yes. "The training data are split into training/validation folds (90-10), and only the training fold is used for augmentation (validation data keep their original labels for use in model/hyper-parameter selection and early-stopping)."
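A minimal sketch of that split, assuming a scikit-learn-style workflow (the paper does not specify tooling); the augmentation call is hypothetical:

```python
from sklearn.model_selection import train_test_split

# 90/10 training/validation split as quoted above; only the training fold
# is passed to the augmentation step, while the validation fold keeps its
# original labels for model selection and early stopping.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.10, random_state=0
)
# X_aug = augment_with_teacher(X_train)  # hypothetical augmentation step
```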
Hardware Specification: No. The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used to run the experiments.
Software Dependencies: No. The paper mentions software tools like AutoGluon, H2O AutoML, and auto-sklearn, and model types such as neural networks, CatBoost, LightGBM, and random forests, but does not provide specific version numbers for any of these software dependencies.
Experiment Setup: No. While the paper mentions that "AutoGluon is fit to each training dataset for up to 4 hours with the auto_stack option" and that student models "share the same hyper-parameters", it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) for the models trained in the experiments.
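For reference, the quoted teacher-fitting setup maps onto AutoGluon's tabular API roughly as below. This is a sketch against the current TabularPredictor interface (the paper used an earlier AutoGluon release), and the file path and label name are placeholders:

```python
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")          # placeholder path
predictor = TabularPredictor(label="class").fit(  # "class" is a placeholder label
    train_data,
    time_limit=4 * 3600,  # "up to 4 hours", as quoted above
    auto_stack=True,      # the auto_stack option mentioned in the paper
)
```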