Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Differentiable Decision Tree via "ReLU+Argmin" Reformulation

Authors: Qiangqiang Mao, Jiayang Ren, Yixiu Wang, Chenxuanyin Zou, Jingjing Zheng, Yankai Cao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate that our optimized tree achieves a superior testing accuracy against 14 baselines, including an average improvement of 7.54% over CART.
Researcher Affiliation	Academia	Qiangqiang Mao, Jiayang Ren, Yixiu Wang, Chenxuanyin Zou, Jingjing Zheng, Yankai Cao University of British Columbia, Vancouver, Canada. Corresponding author: EMAIL.
Pseudocode	Yes	Algorithm 1 The entire tree optimization framework for RADDT.
Open Source Code	Yes	The source code is available in https://github.com/YankaiGroup/RADDT.
Open Datasets	Yes	Unless otherwise specified, these real-world datasets used for our regression experiments are collected from the UCI repository [Dua and Graff, 2019] and OpenML [Vanschoren et al., 2014].
Dataset Splits	Yes	In our experiments, typically, we allocate 75% of the samples for training purposes and the remaining 25% for testing, following the train-test split ratio as used in [Bertsimas and Dunn, 2017]. If an experiment requires cross validation for hyperparameter tuning like tree depth, we then subdivide the training datasets into training and validation subsets in a 2:1 ratio. The dataset setting accordingly changes to 50% samples as training set, 25% samples as validation set, and 25% samples as testing set.
Hardware Specification	Yes	Experiments necessitating CPU computation were executed on the HPC Cluster, specifically utilizing Dell EMC R440 CPU configuration. Each CPU job is allocated 32G memory with a Time Limit of 7 days. Experiments for ORT-MIP requiring larger memory resources were carried out on the Oracle HPC Cluster, specifically with 2T memory and 128 cores. Concurrently, experiments requiring GPU resources were conducted on the Narval server, with an NVIDIA A100 GPU equipped.
Software Dependencies	No	The paper mentions software like PyTorch, Scikit-learn, XGBoost, Julia, and Gurobi, but does not provide specific version numbers for these key components.
Experiment Setup	Yes	Our method is configured with N_epoch = 3,000 and N_start = 10, unless otherwise specified. ... This predetermined range is used to sample a set of scaled factors α for the strategy of multi-run warm start annealing. The principal aim is to explore a broader range of scale factors, ranging from smaller to larger values. ... In the implementation of our experiments, we simply sample α within the range [α_min, α_max] = [2, 200] to meet our requirements. ... The learning rate is a common parameter in gradient-based optimization, and has garnered significant attention in the literature. To simplify its usage, we adopt the well-established learning rate scheduler, termed Cosine Annealing Warm Restarts (with initial linear warm up) in PyTorch, which decreases the learning rate from an initial value of 0.01, thus minimizing the need for additional tuning.
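
The Dataset Splits row can be illustrated with a minimal sketch. It is not the authors' code: the function name `split_indices`, the seed handling, and the use of a simple shuffle are assumptions made for illustration. It holds out 25% of the samples for testing and, when validation is requested, divides the remaining 75% in a 2:1 ratio, which gives the 50% / 25% / 25% setting quoted above.

```python
import random


def split_indices(n_samples, seed=0, with_validation=False):
    """Hypothetical helper: shuffle sample indices and split them 75/25,
    optionally subdividing the training part 2:1 into train/validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    n_test = n_samples // 4            # 25% of all samples for testing
    test = indices[:n_test]
    train_pool = indices[n_test:]      # remaining 75%

    if not with_validation:
        return {"train": train_pool, "test": test}

    n_val = len(train_pool) // 3       # 2:1 train:validation within the 75%
    return {
        "train": train_pool[n_val:],
        "validation": train_pool[:n_val],
        "test": test,
    }


splits = split_indices(1000, with_validation=True)
print({name: len(idx) for name, idx in splits.items()})
# prints {'train': 500, 'validation': 250, 'test': 250}
```

The paper excerpt does not state whether the split is stratified or how the random seed is chosen, so those details are left out here.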
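
The Experiment Setup row names a cosine annealing schedule with warm restarts and an initial linear warm-up, starting from a learning rate of 0.01, plus scale factors α sampled from [2, 200] for N_start = 10 runs. The sketch below writes the schedule out in plain Python so the shape is visible without PyTorch. Several values are assumptions not given in the excerpt: the warm-up length (`warmup_epochs`), the restart period (`restart_period`), the minimum learning rate, and the geometric spacing used to pick α. The function names are hypothetical as well.

```python
import math


def learning_rate(epoch, base_lr=0.01, warmup_epochs=10,
                  restart_period=100, min_lr=0.0):
    """Learning rate at a given epoch: linear warm-up, then cosine
    annealing that restarts every `restart_period` epochs.
    warmup_epochs, restart_period and min_lr are assumed values."""
    if epoch < warmup_epochs:
        # Linear ramp up to base_lr over the warm-up phase.
        return base_lr * (epoch + 1) / warmup_epochs
    t_cur = (epoch - warmup_epochs) % restart_period
    cosine = 1 + math.cos(math.pi * t_cur / restart_period)
    return min_lr + 0.5 * (base_lr - min_lr) * cosine


def sample_scale_factors(n_start=10, alpha_min=2.0, alpha_max=200.0):
    """Scale factors from small to large within [alpha_min, alpha_max].
    Geometric spacing is an assumption; the excerpt only gives the range."""
    if n_start == 1:
        return [alpha_min]
    ratio = (alpha_max / alpha_min) ** (1 / (n_start - 1))
    return [alpha_min * ratio ** k for k in range(n_start)]


for epoch in (0, 9, 10, 60, 109, 110):
    print(epoch, round(learning_rate(epoch), 6))
print([round(a, 2) for a in sample_scale_factors()])
```

With a fixed restart period, the cosine part follows the same formula as PyTorch's `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` with `T_mult=1`; how the authors combine it with the warm-up phase is not described in the excerpt.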