Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Empowering Decision Trees via Shape Function Branching
Authors: Nakul Upadhya, Eldan Cohen
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on various datasets show that SGTs achieve superior performance with reduced model size compared to traditional axis-aligned linear trees. 5 Experimental Evaluation |
| Researcher Affiliation | Academia | Nakul Upadhya, Eldan Cohen Department of Mechanical and Industrial Engineering University of Toronto, Toronto, Canada EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Selecting Shape Function (Node Level Outer Problem) Algorithm 2: Fit Shape Function Algorithm 3: Shape CART (TDIDT Pseudocode) Algorithm 4: Coordinate Descent |
| Open Source Code | Yes | Source code for Shape CART, Shape TAO, and its variants can be found at https://github.com/optimal-uoft/Empowering-DTs-via-Shape-Functions. |
| Open Datasets | Yes | We evaluate the performance of trees induced by Shape CART (SGT-C), Shape CART3 (SGT3-C), Shape2CART (S2GT-C) and Shape2CART3 (S2GT3-C) against various benchmark approaches on a range of 26 real-world classification datasets (details in Appendix E). We utilize open source datasets in our experiments and provide code to replicate our experiments. |
| Dataset Splits | Yes | Each dataset is split into three folds using a 70/30 train/validation test split, with the non-training data further divided in the same ratio. |
| Hardware Specification | Yes | All runs are executed on GCP N2 instances (8 vCPUs, 32 GB RAM). |
| Software Dependencies | No | For CART, we utilize the implementation found in the Scikit Learn [49]. For Axis-Aligned TAO and HSTree, we utilize the implementation found in the imodels [53] package. For SERDT, we utilize the implementation provided by the original authors [13] found at https://github.com/user-anonymous-researcher/interpretable-dts. For DPDT, we utilize the official implementation provided by the original authors [29] found at https://github.com/KohlerHECTOR/DPDTreeEstimator. For SPLIT, we utilize the official implementation provided by the original authors found at https://github.com/VarunBabbar/SPLIT-ICML. The paper mentions various software packages and provides links or citations to them, but it does not explicitly state the version numbers of these software dependencies, such as specific versions of Scikit-learn or Python. |
| Experiment Setup | Yes | Hyperparameters are optimized via Bayesian search with Optuna [40] using 50 trials per model and depth to ensure fairness across differing search spaces. Each configuration is scored by mean validation accuracy, and we retain the best per depth and overall. Training time is limited to 15 minutes for univariate and 30 minutes for bivariate models. Appendix F.1 lists hyperparameter search spaces for all evaluated approaches and hyperparameter importance analyses for all Shape CART variants appear in Appendix F.1. |
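
The Dataset Splits row describes a 70/30 split whose held-out portion is divided again in the same ratio. The following is a minimal sketch of one reading of that description, not the authors' code: it assumes the held-out 30% is split 70/30 into validation and test, and that a "fold" corresponds to repeating the split with a different seed. The function name `three_way_split` and its parameters are hypothetical.

```python
import random


def three_way_split(n_samples, train_frac=0.7, seed=0):
    """Return (train, validation, test) index lists.

    Assumption: the non-training indices are split again with the same
    fraction, with the larger share going to validation.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for repeatability

    n_train = round(train_frac * n_samples)
    held_out = indices[n_train:]

    n_val = round(train_frac * len(held_out))
    return indices[:n_train], held_out[:n_val], held_out[n_val:]


# Three repetitions with different seeds (hypothetical reading of "three folds").
folds = [three_way_split(1000, seed=s) for s in range(3)]
for train, val, test in folds:
    print(len(train), len(val), len(test))
```

Under these assumptions the overall proportions come out to roughly 70% train, 21% validation, and 9% test; the paper's repository remains the authoritative source for the actual procedure.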