Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Optimal Survival Trees: A Dynamic Programming Approach

Authors: Tim Huisman, Jacobus G. M. van der Linden, Emir Demirović

AAAI 2024 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "The experiments show that our method's run time even outperforms some heuristics for realistic cases while obtaining similar out-of-sample performance to the state of the art. Our experiments show that SurTree's out-of-sample performance on average is better than CTree and similar to OST, while outperforming OST in run time for realistic cases." |
| Researcher Affiliation | Academia | Tim Huisman, Jacobus G. M. van der Linden, Emir Demirović, Delft University of Technology, EMAIL, EMAIL |
| Pseudocode | Yes | Pseudocode is provided in the appendix. |
| Open Source Code | Yes | "We implemented SurTree in C++ with a Python interface using the STreeD framework (Van der Linden, De Weerdt, and Demirović 2023)": https://github.com/AlgTUDelft/pystreed — experiment setup: https://github.com/TimHuisman1703/streed-sa-pipeline |
| Open Datasets | Yes | "The real data sets are taken from the SurvSet repository (Drysdale 2022)." |
| Dataset Splits | Yes | "Each method is tuned using ten-fold cross-validation. We evaluate out-of-sample performance on the real data sets using five-fold cross-validation. The synthetic data is generated according to the procedure described in (Bertsimas et al. 2022)... each with a corresponding test set of 50,000 instances." |
| Hardware Specification | Yes | "All experiments were run on an Intel i7-6600U CPU with 4 GB RAM with a time-out of 10 minutes." |
| Software Dependencies | No | The paper mentions a C++ implementation with a Python interface, as well as Julia and R implementations for other methods, but it does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | "For SurTree, we tune the depth and node budget. For CTree, we tune the confidence criterion. For OST, we tune the depth and, simultaneously, OST automatically tunes the cost-complexity parameter as part of its training. We evaluate each method with a depth limit of four on five generated data sets for each combination of n ∈ {100, 200, 500, 1000, 2000, 5000} and c ∈ {0.1, 0.5, 0.8}, each with a corresponding test set of 50,000 instances." |
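The Dataset Splits and Experiment Setup rows describe the evaluation protocol: ten-fold cross-validation for hyperparameter tuning, five-fold cross-validation for out-of-sample evaluation on the real data sets, and a synthetic grid of six sample sizes by three censoring rates with five generated data sets each. A minimal sketch of that bookkeeping, with a hand-rolled fold splitter; all names here are illustrative and not taken from the authors' pipeline:

```python
from itertools import product

def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold cross-validation.

    Illustrative helper, not the authors' code: folds are contiguous
    and differ in size by at most one element.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

# Synthetic grid from the paper: 6 sample sizes x 3 censoring rates,
# with five generated data sets per combination.
ns = [100, 200, 500, 1000, 2000, 5000]
cs = [0.1, 0.5, 0.8]
grid = [(n, c, rep) for n, c in product(ns, cs) for rep in range(5)]

print(len(grid))                           # 90 synthetic training sets
print(len(list(kfold_indices(100, 10))))   # 10 tuning folds
```

Each `(n, c, rep)` triple would be paired with its own 50,000-instance test set; the same `kfold_indices` helper with `k=5` would cover the out-of-sample evaluation on the real data sets.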