Piecewise Constant and Linear Regression Trees: An Optimal Dynamic Programming Approach

Authors: Mim van den Bos, Jacobus G. M. van der Linden, Emir Demirović

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "The experimental results show that our methods improve scalability by one or more orders of magnitude over the state-of-the-art optimal methods while performing similarly or better in out-of-sample performance." |
| Researcher Affiliation | Academia | "Department of Software Technology, Delft University of Technology, Delft, The Netherlands. Correspondence to: Jacobus G. M. van der Linden <J.G.M.vanderLinden@tudelft.nl>." |
| Pseudocode | Yes | "Algorithm 1: Depth-two tree search for a data set D and a feature set Fb." and "Algorithm 2: Pseudo-code for SRT given a data set D, feature sets F and Fb, a depth budget d, a node budget n, an upper bound ub, a complexity cost λ and a minimum leaf node size M." (see the depth-two search sketch after the table) |
| Open Source Code | Yes | "We have implemented our methods in C++ using the STreeD framework (Van der Linden et al., 2023) and provide it as a Python package." and "https://github.com/AlgTUDelft/pystreed" (see the usage sketch after the table) |
| Open Datasets | Yes | "All data sets were obtained from the UCI Repository (Dua & Graff, 2017) and split into five folds." |
| Dataset Splits | Yes | "All data sets were obtained from the UCI Repository (Dua & Graff, 2017) and split into five folds. For out-of-sample analysis, each method is tested on each of the five folds of the data sets with the other folds as training data. For CART, OSRT, SRT-C, SRT-SL, and SRT-L we hyper-tune the regularization parameter λ using five-fold cross-validation." (see the cross-validation sketch after the table) |
| Hardware Specification | Yes | "Experiments were run with one thread on an Intel Xeon E5-6248R 3.0GHz with 100GB RAM." and "Instead, we compared IAI with SRT-C separately on a local machine with an Intel i7-6600U CPU with 8GB RAM." |
| Software Dependencies | Yes | "The MIP methods optimize the mean absolute error and are solved with Gurobi 9.5." (see the MAE sketch after the table) |
| Experiment Setup | Yes | "The piecewise constant and linear methods are trained with a maximum depth of five and four respectively." and "for normalized λ = 0.1, 0.01, 0.001, and 0.0001." and "We stop the computation after 10,000 iterations or when the change in SSE is less than 0.1%." (see the stopping-rule sketch after the table) |
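
To make the role of Algorithm 1 concrete, here is a minimal brute-force sketch of a depth-two search for a piecewise constant regression tree over binary features. It enumerates root/child feature combinations and scores them by sum of squared errors (SSE) with mean-valued leaves; the paper's algorithm computes the same optimum far more efficiently via precomputed statistics per feature pair, so this is illustrative only, not the authors' implementation.

```python
# Illustrative brute-force depth-two search for a piecewise constant
# regression tree over binary features, scored by SSE with mean leaves.
import numpy as np

def leaf_sse(y):
    """SSE of the best constant prediction (the mean) for labels y."""
    return 0.0 if len(y) == 0 else float(np.sum((y - y.mean()) ** 2))

def best_depth_one(X, y, features):
    """Best single split (or no split at all) for one subtree, by SSE."""
    best = (leaf_sse(y), None)  # the no-split option
    for f in features:
        mask = X[:, f] == 1
        sse = leaf_sse(y[mask]) + leaf_sse(y[~mask])
        if sse < best[0]:
            best = (sse, f)
    return best

def best_depth_two(X, y, features):
    """Exhaustive depth-two search: root feature plus optimal subtrees."""
    best = (leaf_sse(y), None, None, None)
    for f in features:
        mask = X[:, f] == 1
        sse_l, f_l = best_depth_one(X[mask], y[mask], features)
        sse_r, f_r = best_depth_one(X[~mask], y[~mask], features)
        if sse_l + sse_r < best[0]:
            best = (sse_l + sse_r, f, f_l, f_r)
    return best  # (SSE, root feature, left child feature, right child feature)

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 8))
y = np.where(X[:, 2] == 1, 3.0, -1.0) + 0.1 * rng.standard_normal(200)
print(best_depth_two(X, y, range(8)))  # should pick feature 2 at the root
```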
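The pystreed package advertises an sklearn-style interface. The usage sketch below assumes an estimator class named STreeDRegressor with a max_depth parameter; both names are assumptions that should be verified against the repository README before use.

```python
# Hypothetical usage of the authors' pystreed package; the class name
# STreeDRegressor and its parameters are assumed, not confirmed here.
import numpy as np
from pystreed import STreeDRegressor

rng = np.random.default_rng(0)
X, y = rng.random((200, 5)), rng.random(200)

model = STreeDRegressor(max_depth=5)  # depth budget used in the experiments
model.fit(X[:150], y[:150])
print(model.predict(X[150:])[:5])
```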
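The evaluation protocol (outer five-fold testing, inner five-fold cross-validation to tune λ over the quoted grid) can be reproduced with standard tooling. In this sketch, sklearn's DecisionTreeRegressor with its ccp_alpha complexity penalty stands in for the paper's methods, since λ plays an analogous cost-complexity role; it is a protocol illustration, not a reproduction of the paper's results.

```python
# Sketch of the paper's split protocol: outer 5-fold out-of-sample
# testing with inner 5-fold cross-validation over the complexity grid.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, random_state=0)
outer = KFold(n_splits=5, shuffle=True, random_state=0)
grid = {"ccp_alpha": [0.1, 0.01, 0.001, 0.0001]}  # λ grid from the paper

scores = []
for train_idx, test_idx in outer.split(X):
    search = GridSearchCV(DecisionTreeRegressor(max_depth=5), grid, cv=5)
    search.fit(X[train_idx], y[train_idx])          # tune λ on training folds
    scores.append(search.score(X[test_idx], y[test_idx]))
print(f"mean out-of-sample R^2: {np.mean(scores):.3f}")
```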
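For context on the MIP baselines: mean absolute error is typically linearized with one auxiliary variable per sample bounding the absolute residual. The sketch below shows that standard trick in gurobipy for a plain linear model (a Gurobi license is required); the paper's baselines embed this objective inside a much larger tree-structure formulation.

```python
# Standard LP/MIP linearization of mean absolute error in gurobipy,
# shown for a plain linear model rather than a tree formulation.
import gurobipy as gp
from gurobipy import GRB
import numpy as np

n, p = 50, 3
rng = np.random.default_rng(0)
X = rng.random((n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

m = gp.Model("mae_regression")
w = m.addVars(p, lb=-GRB.INFINITY, name="w")   # model coefficients
e = m.addVars(n, lb=0.0, name="e")             # bounds on |residual|
for i in range(n):
    pred = gp.quicksum(w[j] * X[i, j] for j in range(p))
    m.addConstr(e[i] >= y[i] - pred)           # e_i >= residual
    m.addConstr(e[i] >= pred - y[i])           # e_i >= -residual
m.setObjective((1.0 / n) * e.sum(), GRB.MINIMIZE)  # mean absolute error
m.optimize()
```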
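The quoted stopping rule for the iterative baseline (at most 10,000 iterations, or a relative change in SSE below 0.1%) amounts to a simple convergence loop. In this sketch, improve_tree is a hypothetical placeholder for one refinement pass of such a method.

```python
# Convergence loop matching the quoted stopping rule: stop after 10,000
# iterations or when the relative change in SSE falls below 0.1%.
def run_until_converged(improve_tree, initial_sse, max_iter=10_000, tol=1e-3):
    sse = initial_sse
    for _ in range(max_iter):
        new_sse = improve_tree()                       # one refinement pass
        if abs(sse - new_sse) / max(sse, 1e-12) < tol:  # change in SSE < 0.1%
            return new_sse
        sse = new_sse
    return sse
```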