Position: Amazing Things Come From Having Many Good Models

Authors: Cynthia Rudin, Chudi Zhong, Lesia Semenova, Margo Seltzer, Ronald Parr, Jiachang Liu, Srikar Katta, Jon Donnelly, Harry Chen, Zachery Boner

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We applied a variety of machine learning methods to the data, including boosted decision trees, random forest, multi-layer perceptrons, support vector machines, logistic regression, and a 2-layer additive risk model. All of these models have completely different functional forms, from linear models to kernel-based nonparametric models with smooth decision boundaries, to tree-based nonparametric models with sharp decision boundaries, yet most of these models perform comparably, as shown in Table 1." (A sketch of this comparison appears after the table.)
Researcher Affiliation | Academia | Department of Computer Science, Duke University, Durham, North Carolina, USA; Department of Computer Science, University of British Columbia, Vancouver, Canada.
Pseudocode | No | The paper describes algorithms such as TreeFARMS, the GAM Rashomon set, and FasterRisk, but does not include any explicit pseudocode blocks or labeled algorithm sections. (A toy Rashomon-set sketch appears after the table.)
Open Source Code | No | The paper discusses and cites previously published algorithms and tools (e.g., TreeFARMS, GAM Changer, FastSparse) but does not provide a statement or link for source code from this specific paper.
Open Datasets | Yes | "Let us work with a dataset, the FICO dataset from the Explainable ML Challenge (FICO et al., 2018), though extremely similar results hold for an astounding number of other datasets (Semenova et al., 2022)."
Dataset Splits | Yes | "Table 1. Performance of different machine learning models on the 23-feature FICO dataset (Chen et al., 2022) over 10 test folds. They perform similarly."
Hardware Specification | No | The paper is a perspective piece that refers to experiments performed in other works; it therefore does not specify hardware details for its own content.
Software Dependencies | No | The paper refers to various algorithms and tools (e.g., GOSDT, FastSparse, TreeFARMS, GAM Changer) but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | No | The paper provides execution times for some models (e.g., "obtained in 8.1 seconds by the GOSDT algorithm") but does not detail hyperparameter values, training configurations, or other experimental setup specifics for its own content.
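
The Research Type and Dataset Splits rows above quote the paper's Table 1 experiment: several model families with very different functional forms, each evaluated over 10 test folds on the FICO data. The following is a minimal sketch of that kind of comparison, assuming scikit-learn stand-ins for the paper's models (the 2-layer additive risk model has no standard library equivalent and is omitted here); the CSV path and label column name are placeholders, not taken from the paper.

```python
# Minimal sketch: compare several model families over 10 test folds,
# in the spirit of the paper's Table 1. "fico.csv" and "RiskPerformance"
# are assumed placeholders for a local export of the FICO HELOC data.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("fico.csv")  # placeholder path
X, y = df.drop(columns="RiskPerformance"), df["RiskPerformance"]

models = {
    "boosted trees": GradientBoostingClassifier(),
    "random forest": RandomForestClassifier(),
    "MLP": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Accuracy over 10 folds; the paper reports that such models perform comparably.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```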
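The Pseudocode row mentions TreeFARMS and FasterRisk, which enumerate the paper's central object, the Rashomon set: all models whose loss is within a small slack of the best achievable (both additive and multiplicative slack conventions appear in the literature). The toy sketch below is not the paper's algorithm; it only illustrates the definition by brute-forcing a pool of decision trees and keeping those within a multiplicative factor (1 + eps) of the best validation loss.

```python
# Toy illustration of an epsilon-Rashomon set (NOT the TreeFARMS algorithm):
# keep every candidate whose validation loss is within (1 + eps) of the best.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=23, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Vary depth, feature subsampling, and seed to get a diverse candidate pool.
candidates = [
    DecisionTreeClassifier(max_depth=d, max_features="sqrt", random_state=s)
    .fit(X_tr, y_tr)
    for d in (2, 3, 4, 5)
    for s in range(20)
]
losses = np.array([1.0 - m.score(X_val, y_val) for m in candidates])

eps = 0.05
rashomon = [m for m, l in zip(candidates, losses) if l <= (1 + eps) * losses.min()]
print(f"{len(rashomon)} of {len(candidates)} candidate trees are within "
      f"{eps:.0%} of the best validation loss")
```

Randomized hyperparameter search is just a cheap way to populate the candidate pool here; TreeFARMS, by contrast, enumerates the full Rashomon set exactly for sparse decision trees.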