Optimal Sparse Decision Trees
Authors: Xiyang Hu, Cynthia Rudin, Margo Seltzer
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our method on benchmark data sets, as well as criminal recidivism and credit risk data sets; these are two of the high-stakes decision problems where interpretability is needed most in AI systems. We provide ablation experiments to show which of our techniques is most influential at reducing computation for various datasets. ... We used 7 datasets: Five of them are from the UCI Machine Learning Repository [8], (Tic Tac Toe, Car Evaluation, Monk1, Monk2, Monk3). The other two datasets are the ProPublica recidivism data set [12] and the Fair Isaac (FICO) credit risk dataset [9]. |
| Researcher Affiliation | Academia | Xiyang Hu (Carnegie Mellon University, xiyanghu@cmu.edu); Cynthia Rudin (Duke University, cynthia@cs.duke.edu); Margo Seltzer (The University of British Columbia, mseltzer@cs.ubc.ca) |
| Pseudocode | Yes | We illustrate our framework in Algorithm 1 in Supplement A. |
| Open Source Code | Yes | The code and the supplementary materials are available at https://github.com/xiyanghu/OSDT. |
| Open Datasets | Yes | We used 7 datasets: Five of them are from the UCI Machine Learning Repository [8], (Tic Tac Toe, Car Evaluation, Monk1, Monk2, Monk3). The other two datasets are the ProPublica recidivism data set [12] and the Fair Isaac (FICO) credit risk dataset [9]. |
| Dataset Splits | No | The paper mentions 'training data' and 'cross-validation experiments' in the conclusion, but does not provide specific details on dataset splits (percentages, counts, or explicit splitting methodology) for training, validation, and testing. |
| Hardware Specification | Yes | The results of the per-bound performance and memory improvement experiment (Table 2 in the supplement) were run on a m5a.4xlarge instance of AWS's Elastic Compute Cloud (EC2). The instance has 16 2.5GHz virtual CPUs (although we run single-threaded on a single core) and 64 GB of RAM. All other results were run on a personal laptop with a 2.4GHz i5-8259U processor and 16GB of RAM. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., 'Python 3.x', 'PyTorch 1.x', 'scikit-learn x.x'). |
| Experiment Setup | Yes | The time limits for both BinOCT and our algorithm are set to be 30 minutes. (Figure 2) Example OSDT execution traces (COMPAS data, λ = 0.005). |