reproducibilityindex.ai

Exploring the Whole Rashomon Set of Sparse Decision Trees

Authors: Rui Xin, Chudi Zhong, Zhi Chen, Takuya Takagi, Margo Seltzer, Cynthia Rudin

NeurIPS 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our evaluation answers the following questions: 1. How does TreeFARMS compare to baseline methods for searching the hypothesis space? (§6.1), 2. How quickly can we find the entire Rashomon set? (§6.1), 3. What does the Rashomon set look like? What can we learn about its structure? (§G.2), 4. What does MCR look like for real datasets? (§6.2), 5. How do balanced accuracy and F1-score Rashomon sets compare to the accuracy Rashomon set? (§6.3), and 6. How does removing samples affect the Rashomon set? (§6.4).
Researcher Affiliation	Collaboration	1 Duke University 2 Fujitsu Laboratories Ltd. 3 The University of British Columbia
Pseudocode	Yes	Algorithm 1 TreeFARMS(x, y, λ, ϵ) → Rset // Given a dataset (x, y), λ, and ϵ, return the set, Rset, of all trees whose objective is in θϵ. Algorithm 2 extract(G, sub, scope) (Detailed algorithm in Appendix B)
Open Source Code	Yes	Code Availability: Implementations of TreeFARMS are available at https://github.com/ubc-systopia/treeFarms.
Open Datasets	Yes	We use datasets from the UCI Machine Learning Repository [Car Evaluation, Congressional Voting Records, Monk2, and Iris, see 43], a penguin dataset [44], a criminal recidivism dataset [COMPAS, shared by 40], the Fair Isaac (FICO) credit risk dataset [45] used for the Explainable ML Challenge, and four coupon datasets (Bar, Coffee House, Cheap Restaurant, and Expensive Restaurant) [46] that come from surveys. More details are in Appendix F.
Dataset Splits	No	We denote the training dataset as $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \{0, 1\}^p$ are binary features.
Hardware Specification	No	The paper states that information on the type of resources used is available in Appendices F and G, but these appendices are not provided in the main paper. The main text does not include specific hardware details such as GPU/CPU models or memory.
Software Dependencies	No	We used the R package BART [37].
Experiment Setup	Yes	Figure 1: Comparison of trees in the Rashomon set (λ = 0.01, ϵ = 0.1) and trees generated by baselines. Figure 3: Variable Importance: Model class reliance on the COMPAS and Bar (λ = 0.01, ϵ = 0.05).
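
To make the Pseudocode and Experiment Setup rows concrete, the following is a minimal sketch of what "all trees whose objective is in θϵ" could mean in practice. It assumes, as a reading of the paper rather than something stated in the table above, that the objective is the training misclassification rate plus λ times the number of leaves, and that θϵ is (1 + ϵ) times the best objective found. The `CandidateTree` class, the function names, and the four candidate trees are hypothetical and exist only for illustration. The sketch filters an explicit list of candidates; it does not reproduce how TreeFARMS itself searches the hypothesis space.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTree:
    """Hypothetical stand-in for a fitted tree: only what the objective needs."""
    name: str
    misclassification_rate: float  # fraction of training samples misclassified
    num_leaves: int


def objective(tree: CandidateTree, lam: float) -> float:
    # Assumed form: training loss plus a per-leaf sparsity penalty.
    return tree.misclassification_rate + lam * tree.num_leaves


def rashomon_set(trees, lam, eps):
    """Return the candidates whose objective is at most theta_eps."""
    best = min(objective(t, lam) for t in trees)
    theta_eps = (1.0 + eps) * best  # assumed multiplicative threshold
    return [t for t in trees if objective(t, lam) <= theta_eps]


# Made-up candidates, evaluated with the Figure 1 settings (lambda = 0.01, eps = 0.1).
candidates = [
    CandidateTree("A", 0.20, 4),  # objective 0.24 (the best)
    CandidateTree("B", 0.21, 4),  # objective 0.25
    CandidateTree("C", 0.18, 8),  # objective 0.26
    CandidateTree("D", 0.25, 5),  # objective 0.30
]
selected = rashomon_set(candidates, lam=0.01, eps=0.1)
print([t.name for t in selected])  # ['A', 'B', 'C']
```

With these numbers the threshold is 1.1 × 0.24 = 0.264, so tree D (objective 0.30) is excluded while the other three are kept. A smaller ϵ, such as the 0.05 used for Figure 3, tightens the threshold to 0.252 and would drop tree C as well.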