Smaller, more accurate regression forests using tree alternating optimization

Authors: Arman Zharmagambetov, Miguel Á. Carreira-Perpiñán

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In a wide range of datasets, we show that the resulting forests exceed the accuracy of state-of-the-art algorithms such as random forests, AdaBoost or gradient boosting, often considerably, while yielding forests that have usually fewer and shallower trees and hence fewer parameters and faster inference overall.
Researcher Affiliation | Academia | Dept. of Computer Science & Engineering, University of California, Merced, USA. Correspondence to: Arman Zharmagambetov <azharmagambetov@ucmerced.edu>, Miguel Á. Carreira-Perpiñán <mcarreira-perpinan@ucmerced.edu>.
Pseudocode | Yes | Algorithm 1: TAO regression tree algorithm (BFS order)
    input: training set; initial tree T(·; Θ) of depth ∆, N_0, ..., N_∆ nodes at depth 0, ..., ∆, respectively
    R_1 ← {1, ..., N}
    repeat
        for d = 0 to ∆ do
            parfor i ∈ N_d do
                if i is a leaf then
                    θ_i ← train regressor g_i on reduced set R_i
                else
                    θ_i ← train decision function f_i on R_i
                    compute the reduced sets of each child of i
                end if
            end parfor
        end for
    until stop
    prune dead subtrees of T
    return T
    (A minimal Python sketch of this procedure is given after the table.)
Open Source Code | No | The paper states "We implemented TAO in Python" and mentions a C implementation, but it does not provide any concrete access information (e.g., a specific repository link or an explicit code-release statement) for the source code.
Open Datasets | Yes | Datasets: abalone, ailerons, cpuact, CT slice; for each, we give (N, D, K) = sample size and input and output dimensionality. [...] We compare TAO with the state-of-the-art tree ensembling algorithms: Random Forests (RF) (Breiman, 2001), Extra Trees (ET) (Geurts et al., 2006), AdaBoost (Freund & Schapire, 1997) (all using the Python scikit-learn implementation; Pedregosa et al., 2011); and gradient boosting (Friedman, 2001) (using the highly optimized XGBoost implementation; Chen & Guestrin, 2016). [...] Tables 1–3 provide details for "abalone", "ailerons", "cpuact", "CT slice", "Year Prediction MSD", "SARCOS", "MNIST". (A baseline-setup sketch follows the table.)
Dataset Splits | No | The paper mentions training each tree on a "90% random sample of the training data" and that hyperparameters "could be determined by cross-validation", but it does not explicitly state the training/validation/test splits or the cross-validation setup used for the overall model evaluation in a reproducible manner.
Hardware Specification | No | The paper does not describe the hardware (e.g., CPU/GPU models, memory) used to run its experiments.
Software Dependencies | No | The paper mentions using the "Python scikit-learn implementation" and the "XGBoost implementation" for baseline comparisons, and LIBLINEAR for solving the logistic regression, but it does not provide version numbers for these software components.
Experiment Setup | Yes | As for TAO, we train each tree on a 90% random sample of the training data using 40 iterations. [...] We initialize each TAO tree from a complete tree of depth ∆ and random node parameters (each node's weight vector has Gaussian N(0,1) entries, and then we normalize the vector to unit length). [...] We train each tree with an ℓ1 regularizer but set its hyperparameter α to a small value (0.01). [...] Most importantly, the forest size should be as big as possible (depth ∆, number of trees T) but also avoiding overfitting; practically, ∆ and T could be determined by cross-validation. (A forest-construction sketch follows the table.)
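
To make the Algorithm 1 row concrete, the following is a minimal Python sketch of a single TAO regression tree under stated assumptions: constant leaves, squared error, and an ℓ1-regularized logistic regression (scikit-learn's LIBLINEAR solver) as a stand-in for the paper's weighted decision-node subproblem. All names (TAONode, reduced_sets, optimize_node, fit_tao_tree) are ours, not taken from the paper's implementation.

import numpy as np
from sklearn.linear_model import LogisticRegression


class TAONode:
    """One node of a complete oblique binary tree of fixed depth."""
    def __init__(self, depth, max_depth, D, rng):
        self.is_leaf = depth == max_depth
        if self.is_leaf:
            self.value = 0.0                              # constant-leaf regressor (our choice)
        else:
            w = rng.normal(size=D)                        # Gaussian N(0,1) entries...
            self.w = w / np.linalg.norm(w)                # ...normalized to unit length
            self.b = 0.0
            self.left = TAONode(depth + 1, max_depth, D, rng)
            self.right = TAONode(depth + 1, max_depth, D, rng)

    def predict(self, x):
        if self.is_leaf:
            return self.value
        child = self.right if x @ self.w + self.b > 0 else self.left
        return child.predict(x)


def reduced_sets(root, X):
    """Map each node to the indices of the training points that reach it."""
    sets, stack = {id(root): np.arange(len(X))}, [root]
    while stack:
        node = stack.pop()
        if node.is_leaf:
            continue
        idx = sets[id(node)]
        go_right = X[idx] @ node.w + node.b > 0
        sets[id(node.left)], sets[id(node.right)] = idx[~go_right], idx[go_right]
        stack += [node.left, node.right]
    return sets


def optimize_node(node, X, y, alpha=0.01):
    """One TAO step on a single node, keeping the rest of the tree fixed."""
    if len(y) == 0:
        return
    if node.is_leaf:
        node.value = float(y.mean())
        return
    # Pseudo-label each point with whichever child subtree predicts it better,
    # then refit the decision function as a weighted, ell_1-regularized classifier.
    loss_left = np.array([(node.left.predict(x) - t) ** 2 for x, t in zip(X, y)])
    loss_right = np.array([(node.right.predict(x) - t) ** 2 for x, t in zip(X, y)])
    z = (loss_right < loss_left).astype(int)
    if len(np.unique(z)) < 2:
        return                                            # every point prefers the same child
    clf = LogisticRegression(penalty="l1", C=1.0 / alpha, solver="liblinear")
    clf.fit(X, z, sample_weight=np.abs(loss_left - loss_right) + 1e-12)
    node.w, node.b = clf.coef_.ravel(), float(clf.intercept_[0])


def fit_tao_tree(X, y, depth=4, iters=10, seed=0):
    """Alternating optimization over the nodes of a fixed-structure tree."""
    root = TAONode(0, depth, X.shape[1], np.random.default_rng(seed))
    for _ in range(iters):
        level = [root]
        while level:                                      # visit nodes depth by depth (BFS)
            sets = reduced_sets(root, X)
            for node in level:                            # nodes at one depth are independent
                idx = sets.get(id(node), np.arange(0))
                optimize_node(node, X[idx], y[idx])
            level = [c for n in level if not n.is_leaf for c in (n.left, n.right)]
    return root

Because nodes at the same depth have disjoint reduced sets, the inner loop over a level could run in parallel, which is what the parfor in Algorithm 1 indicates; the sketch simply loops sequentially.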
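
The Experiment Setup row maps onto a simple bagging loop. This sketch assumes the fit_tao_tree helper from the single-tree sketch; the 90% sample fraction and the 40 TAO iterations come from the quoted text, while the number of trees and the depth are placeholders (the paper notes they could be determined by cross-validation).

import numpy as np

def fit_tao_forest(X, y, n_trees=30, depth=8, iters=40, sample_frac=0.9, seed=0):
    """Train n_trees TAO trees, each on an independent 90% random sample of the data."""
    rng = np.random.default_rng(seed)
    trees = []
    for t in range(n_trees):
        idx = rng.choice(len(X), size=int(sample_frac * len(X)), replace=False)
        trees.append(fit_tao_tree(X[idx], y[idx], depth=depth, iters=iters, seed=seed + t))
    return trees

def predict_forest(trees, X):
    """Average the individual tree predictions to get the forest output."""
    return np.mean([[tree.predict(x) for x in X] for tree in trees], axis=0)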
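
For the baselines named in the Open Datasets row, here is a usage sketch with the libraries the paper cites (scikit-learn for RF, ET and AdaBoost; XGBoost for gradient boosting). The synthetic data, train/test split and hyperparameters are illustrative placeholders, not the paper's settings.

import numpy as np
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor, AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

# Placeholder regression data standing in for the paper's benchmark datasets.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = X[:, 0] * np.sin(X[:, 1]) + 0.1 * rng.normal(size=2000)
X_train, X_test, y_train, y_test = X[:1500], X[1500:], y[:1500], y[1500:]

baselines = {
    "RF": RandomForestRegressor(n_estimators=100),
    "ET": ExtraTreesRegressor(n_estimators=100),
    "AdaBoost": AdaBoostRegressor(n_estimators=100),
    "XGBoost": XGBRegressor(n_estimators=100, max_depth=6),
}
for name, model in baselines.items():
    model.fit(X_train, y_train)
    rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"{name}: test RMSE = {rmse:.3f}")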