Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Exploring the Whole Rashomon Set of Sparse Decision Trees

Authors: Rui Xin, Chudi Zhong, Zhi Chen, Takuya Takagi, Margo Seltzer, Cynthia Rudin

NeurIPS 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our evaluation answers the following questions: 1. How does TreeFARMS compare to baseline methods for searching the hypothesis space? (§6.1), 2. How quickly can we find the entire Rashomon set? (§6.1), 3. What does the Rashomon set look like? What can we learn about its structure? (§G.2), 4. What does MCR look like for real datasets? (§6.2), 5. How do balanced accuracy and F1-score Rashomon sets compare to the accuracy Rashomon set? (§6.3), and 6. How does removing samples affect the Rashomon set? (§6.4).
Researcher Affiliation | Collaboration | 1 Duke University; 2 Fujitsu Laboratories Ltd.; 3 The University of British Columbia
Pseudocode | Yes | Algorithm 1 TreeFARMS(x, y, λ, ϵ) → Rset // Given a dataset (x, y), λ, and ϵ, return the set, Rset, of all trees whose objective is in θϵ. Algorithm 2 extract(G, sub, scope) (Detailed algorithm in Appendix B)
Open Source Code | Yes | Code Availability: Implementations of TreeFARMS are available at https://github.com/ubc-systopia/treeFarms.
Open Datasets | Yes | We use datasets from the UCI Machine Learning Repository [Car Evaluation, Congressional Voting Records, Monk2, and Iris, see 43], a penguin dataset [44], a criminal recidivism dataset [COMPAS, shared by 40], the Fair Isaac (FICO) credit risk dataset [45] used for the Explainable ML Challenge, and four coupon datasets (Bar, Coffee House, Cheap Restaurant, and Expensive Restaurant) [46] that come from surveys. More details are in Appendix F.
Dataset Splits | No | We denote the training dataset as $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i \in \{0, 1\}^p$ are binary features.
Hardware Specification | No | The paper states that information on the type of resources used is available in Appendices F and G, but these appendices are not provided in the main paper. The main text does not include specific hardware details such as GPU/CPU models or memory.
Software Dependencies | No | We used the R package BART [37].
Experiment Setup | Yes | Figure 1: Comparison of trees in the Rashomon set (λ = 0.01, ϵ = 0.1) and trees generated by baselines. Figure 3: Variable Importance: Model class reliance on the COMPAS and Bar (λ = 0.01, ϵ = 0.05).
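
The Pseudocode and Experiment Setup entries refer to a Rashomon set controlled by λ and ϵ without spelling out the threshold θϵ. The sketch below illustrates one common reading, and it rests on assumptions not stated in this summary: that a tree's objective is its misclassification rate plus λ times its number of leaves, and that θϵ = (1 + ϵ) times the best objective found. The `CandidateTree` class, the candidate list, and the function names are hypothetical. Filtering a hand-made list by brute force is only an illustration of the set's definition, not the TreeFARMS algorithm itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateTree:
    """Hypothetical summary of a tree: only the two numbers the objective needs."""
    name: str
    misclassification_rate: float  # fraction of training points misclassified
    num_leaves: int


def objective(tree: CandidateTree, lam: float) -> float:
    # Assumed objective: training loss plus a per-leaf sparsity penalty.
    return tree.misclassification_rate + lam * tree.num_leaves


def rashomon_set(trees, lam: float, eps: float):
    """Return (members, theta): all trees whose objective is at most theta."""
    best = min(objective(t, lam) for t in trees)
    theta = (1.0 + eps) * best  # assumed definition of the threshold
    members = [t for t in trees if objective(t, lam) <= theta]
    return members, theta


# Made-up candidates, evaluated with the Figure 1 settings (lambda = 0.01, eps = 0.1).
candidates = [
    CandidateTree("t_a", 0.20, 3),
    CandidateTree("t_b", 0.21, 3),
    CandidateTree("t_c", 0.18, 6),
    CandidateTree("t_d", 0.30, 2),
]
rset, theta = rashomon_set(candidates, lam=0.01, eps=0.1)
print([t.name for t in rset], round(theta, 3))
```

Under these assumptions, a larger ϵ admits more near-optimal trees, while a larger λ penalizes trees with many leaves more heavily, which is why both values are reported alongside each figure.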