Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Path to Simpler Models Starts With Noise

Authors: Lesia Semenova, Harry Chen, Ronald Parr, Cynthia Rudin

NeurIPS 2023

Reproducibility Variable Result LLM Response
Research Type Experimental To demonstrate our point, we computed the Rashomon ratio and pattern Rashomon ratio for 19 different datasets for hypothesis spaces of decision trees and linear models of different complexity (see Figure 2). Additionally, we introduce a measure called pattern diversity, which captures the average difference in predictions between distinct classification patterns in the Rashomon set, and motivate why it tends to increase with label noise. Our results explain a key aspect of why simpler models often tend to perform as well as black box models on complex, noisier datasets.
Researcher Affiliation Academia Lesia Semenova, Harry Chen, Ronald Parr, Cynthia Rudin; Department of Computer Science, Duke University
Pseudocode Yes Algorithm 1 Branch and bound approach to find the pattern Rashomon set
Open Source Code No The paper mentions using a third-party tool, Tree FARMS [52], but does not provide access to its own source code for the methodology described.
Open Datasets Yes Table 1: Preprocessed datasets
Dataset Splits Yes For each dataset, we performed five random splits into a train set and a validation set, where the validation set size is 20% of the number of samples. Then we performed 5-fold cross-validation on the training data to choose the best depth for CART.
Hardware Specification No We performed experiments on Duke University's Computer Science Department cluster.
Software Dependencies No To compute the numerator of the Rashomon ratio, we used Tree FARMS [52]. No version number is provided for Tree FARMS or any other software dependencies.
Experiment Setup Yes For the tree depth of CART, we considered the values d ∈ {1, ..., m}, where m is the number of features for a given dataset. We considered six different noise levels, drawn from {0, 0.03, 0.05, 0.10, 0.15, 0.20, 0.25}. For every level, we performed 25 draws of the noisy dataset. Then we performed 5-fold cross-validation on the training data to choose the best depth for CART.
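The protocol described in the Dataset Splits and Experiment Setup rows (80/20 random split, label-noise injection at several levels, and 5-fold cross-validation over tree depths 1..m to pick the best CART depth) can be sketched as follows. This is a minimal illustration, not the authors' code: the synthetic dataset, random seeds, single split, and single noisy draw per level are all assumptions made to keep the example self-contained; the paper uses 19 preprocessed datasets, 5 random splits, and 25 noisy draws per level.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for one of the paper's 19 preprocessed datasets (Table 1).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
m = X.shape[1]  # number of features; candidate depths are d in {1, ..., m}

rng = np.random.default_rng(0)
noise_levels = [0, 0.03, 0.05, 0.10, 0.15, 0.20, 0.25]

for noise in noise_levels:
    # One random train/validation split; validation set is 20% of the samples.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=0)

    # One noisy draw (the paper uses 25 per level): flip each training
    # label independently with probability `noise`.
    flip = rng.random(len(y_train)) < noise
    y_noisy = np.where(flip, 1 - y_train, y_train)

    # 5-fold cross-validation on the (noisy) training data to choose
    # the best CART depth.
    search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                          {"max_depth": range(1, m + 1)}, cv=5)
    search.fit(X_train, y_noisy)
```

Computing the Rashomon ratio itself additionally requires enumerating the set of near-optimal trees, which the paper delegates to the third-party TreeFARMS tool [52]; that step is not reproduced here.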