Position: Amazing Things Come From Having Many Good Models

Authors: Cynthia Rudin, Chudi Zhong, Lesia Semenova, Margo Seltzer, Ronald Parr, Jiachang Liu, Srikar Katta, Jon Donnelly, Harry Chen, Zachery Boner

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We applied a variety of machine learning methods to the data, including boosted decision trees, random forest, multi-layer perceptrons, support vector machines, logistic regression, and a 2-layer additive risk model. All of these models have completely different functional forms, from linear models to kernel-based nonparametric models with smooth decision boundaries, to tree-based nonparametric models with sharp decision boundaries, yet most of these models perform comparably, as shown in Table 1." (A sketch of this comparison appears after the table.)
Researcher Affiliation | Academia | Department of Computer Science, Duke University, Durham, North Carolina, USA; Department of Computer Science, University of British Columbia, Vancouver, Canada.
Pseudocode | No | The paper describes algorithms such as TreeFARMS, the GAM Rashomon set, and FasterRisk, but does not include any explicit pseudocode blocks or labeled algorithm sections. (A toy Rashomon-set sketch appears after the table.)
Open Source Code | No | The paper discusses and cites previously published algorithms and tools (e.g., TreeFARMS, GAM Changer, FastSparse) but does not provide a statement or link for source code from this specific paper.
Open Datasets | Yes | "Let us work with a dataset, the FICO dataset from the Explainable ML Challenge (FICO et al., 2018), though extremely similar results hold for an astounding number of other datasets (Semenova et al., 2022)."
Dataset Splits | Yes | "Table 1. Performance of different machine learning models on the 23-feature FICO dataset (Chen et al., 2022) over 10 test folds. They perform similarly."
Hardware Specification | No | The paper is a perspective piece that refers to experiments performed in other works; it therefore does not specify hardware details for its own content.
Software Dependencies | No | The paper refers to various algorithms and tools (e.g., GOSDT, FastSparse, TreeFARMS, GAM Changer) but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup | No | The paper provides execution times for some models (e.g., "obtained in 8.1 seconds by the GOSDT algorithm") but does not detail hyperparameter values, training configurations, or other experimental setup specifics for its own content.
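
The Research Type and Dataset Splits rows above quote the paper's Table 1 experiment: several model families with very different functional forms, each evaluated over 10 test folds on the FICO data. The following is a minimal sketch of that kind of comparison, assuming scikit-learn stand-ins for the paper's models (the 2-layer additive risk model has no standard library equivalent and is omitted here); the CSV path and label column name are placeholders, not taken from the paper.

```python
# Minimal sketch: compare several model families over 10 test folds,
# in the spirit of the paper's Table 1. "fico.csv" and "RiskPerformance"
# are assumed placeholders for a local export of the FICO HELOC data.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("fico.csv")  # placeholder path
X, y = df.drop(columns="RiskPerformance"), df["RiskPerformance"]

models = {
    "boosted trees": GradientBoostingClassifier(),
    "random forest": RandomForestClassifier(),
    "MLP": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Accuracy over 10 folds; the paper reports that such models perform comparably.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```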
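The Pseudocode row mentions TreeFARMS and FasterRisk, which enumerate the paper's central object, the Rashomon set: all models whose loss is within a small slack of the best achievable (both additive and multiplicative slack conventions appear in the literature). The toy sketch below is not the paper's algorithm; it only illustrates the definition by brute-forcing a pool of decision trees and keeping those within a multiplicative factor (1 + eps) of the best validation loss.

```python
# Toy illustration of an epsilon-Rashomon set (NOT the TreeFARMS algorithm):
# keep every candidate whose validation loss is within (1 + eps) of the best.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=23, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Vary depth, feature subsampling, and seed to get a diverse candidate pool.
candidates = [
    DecisionTreeClassifier(max_depth=d, max_features="sqrt", random_state=s)
    .fit(X_tr, y_tr)
    for d in (2, 3, 4, 5)
    for s in range(20)
]
losses = np.array([1.0 - m.score(X_val, y_val) for m in candidates])

eps = 0.05
rashomon = [m for m, l in zip(candidates, losses) if l <= (1 + eps) * losses.min()]
print(f"{len(rashomon)} of {len(candidates)} candidate trees are within "
      f"{eps:.0%} of the best validation loss")
```

Randomized hyperparameter search is just a cheap way to populate the candidate pool here; TreeFARMS, by contrast, enumerates the full Rashomon set exactly for sparse decision trees.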