Exploring the Whole Rashomon Set of Sparse Decision Trees
Authors: Rui Xin, Chudi Zhong, Zhi Chen, Takuya Takagi, Margo Seltzer, Cynthia Rudin
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation answers the following questions: 1. How does TreeFARMS compare to baseline methods for searching the hypothesis space? (Sec. 6.1), 2. How quickly can we find the entire Rashomon set? (Sec. 6.1), 3. What does the Rashomon set look like? What can we learn about its structure? (App. G.2), 4. What does MCR look like for real datasets? (Sec. 6.2), 5. How do balanced accuracy and F1-score Rashomon sets compare to the accuracy Rashomon set? (Sec. 6.3), and 6. How does removing samples affect the Rashomon set? (Sec. 6.4). |
| Researcher Affiliation | Collaboration | 1 Duke University 2 Fujitsu Laboratories Ltd. 3 The University of British Columbia |
| Pseudocode | Yes | Algorithm 1 TreeFARMS(x, y, λ, ϵ) → Rset // Given a dataset (x, y), λ, and ϵ, return the set, Rset, of all trees whose objective is at most θϵ. Algorithm 2 extract(G, sub, scope) (detailed algorithm in Appendix B). A hedged sketch of the θϵ selection rule appears below the table. |
| Open Source Code | Yes | Code Availability: An implementation of TreeFARMS is available at https://github.com/ubc-systopia/treeFarms. |
| Open Datasets | Yes | We use datasets from the UCI Machine Learning Repository [Car Evaluation, Congressional Voting Records, Monk2, and Iris, see 43], a penguin dataset [44], a criminal recidivism dataset [COMPAS, shared by 40], the Fair Isaac (FICO) credit risk dataset [45] used for the Explainable ML Challenge, and four coupon datasets (Bar, Coffee House, Cheap Restaurant, and Expensive Restaurant) [46] that come from surveys. More details are in Appendix F. |
| Dataset Splits | No | We denote the training dataset as {(x_i, y_i)}_{i=1}^n, where x_i ∈ {0, 1}^p are binary features. A brief feature-binarization sketch appears below the table. |
| Hardware Specification | No | The paper states that information on the type of resources used is available in Appendices F and G, but those appendices are not included in the main paper. The main text does not give specific hardware details such as GPU/CPU models or memory. |
| Software Dependencies | No | We used the R package BART [37]. |
| Experiment Setup | Yes | Figure 1: Comparison of trees in the Rashomon set (λ = 0.01, ϵ = 0.1) and trees generated by baselines. Figure 3: Variable importance: model class reliance on the COMPAS and Bar datasets (λ = 0.01, ϵ = 0.05). An MCR-style importance-range sketch appears below the table. |
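
A hedged sketch of the selection rule quoted in the Pseudocode and Experiment Setup rows: Algorithm 1 returns the set Rset of trees whose regularized objective stays within the Rashomon threshold θϵ. The code below illustrates only that filtering step, not the paper's branch-and-bound search; the CandidateTree container, the candidate trees themselves, and the multiplicative threshold θϵ = (1 + ϵ) · (best objective) are assumptions for illustration.

```python
# Hedged sketch: filter candidate sparse trees into a Rashomon set.
# This is NOT the paper's TreeFARMS search; it only illustrates keeping
# trees whose objective (loss + lambda * #leaves) is within theta_eps.
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateTree:
    name: str
    misclassified: int  # number of training errors
    num_leaves: int     # sparsity term penalized by lambda


def rashomon_set(trees: List[CandidateTree], n: int,
                 lam: float = 0.01, eps: float = 0.1) -> List[CandidateTree]:
    """Keep trees whose objective is at most theta_eps.

    theta_eps = (1 + eps) * best objective is an assumed (multiplicative)
    convention for this sketch.
    """
    def objective(t: CandidateTree) -> float:
        return t.misclassified / n + lam * t.num_leaves

    best = min(objective(t) for t in trees)
    theta_eps = (1.0 + eps) * best
    return [t for t in trees if objective(t) <= theta_eps]


if __name__ == "__main__":
    candidates = [
        CandidateTree("t1", misclassified=120, num_leaves=4),
        CandidateTree("t2", misclassified=118, num_leaves=6),
        CandidateTree("t3", misclassified=150, num_leaves=3),
    ]
    kept = rashomon_set(candidates, n=1000, lam=0.01, eps=0.1)
    print([t.name for t in kept])
```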
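
The Figure 3 entry refers to model class reliance (MCR), the range a feature's importance spans across all members of the Rashomon set. The sketch below estimates such a range by computing scikit-learn permutation importance for each model and taking the per-feature minimum and maximum; using permutation importance as the reliance measure and a handful of independently fitted shallow trees as a stand-in "Rashomon set" are both simplifying assumptions, not the paper's MCR computation.

```python
# Hedged sketch: per-feature [min, max] importance range over a set of
# fitted models, with permutation importance as a stand-in reliance measure.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Stand-in for a Rashomon set: a few shallow trees with different settings.
models = [DecisionTreeClassifier(max_depth=d, random_state=s).fit(X, y)
          for d in (2, 3) for s in (0, 1)]

# Permutation importance per model, then per-feature min/max across models.
imps = np.array([
    permutation_importance(m, X, y, n_repeats=10, random_state=0).importances_mean
    for m in models
])
mcr_lo, mcr_hi = imps.min(axis=0), imps.max(axis=0)
for j, (lo, hi) in enumerate(zip(mcr_lo, mcr_hi)):
    print(f"feature {j}: reliance range [{lo:.3f}, {hi:.3f}]")
```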
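
The Dataset Splits row quotes the paper's setting of binary features, x_i ∈ {0, 1}^p. One common way to reach that form from mixed-type tabular data is one-hot encoding of categorical columns plus threshold indicators for numeric ones; the column names and thresholds below are invented for illustration and are not the paper's preprocessing.

```python
# Hedged sketch: turn mixed-type tabular data into binary features in {0, 1}^p.
import pandas as pd

df = pd.DataFrame({
    "priors_count": [0, 3, 7, 1],
    "age": [22, 35, 51, 29],
    "charge_degree": ["F", "M", "F", "M"],
})

# One-hot encode the categorical column; binarize numerics at chosen thresholds.
binary = pd.get_dummies(df["charge_degree"], prefix="charge").astype(int)
binary["priors_count>2"] = (df["priors_count"] > 2).astype(int)
binary["age<30"] = (df["age"] < 30).astype(int)

print(binary)  # every column is now 0/1
```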