Extrapolated Random Tree for Regression

Authors: Yuchao Cai, Yuheng Ma, Yiwei Dong, Hanfang Yang

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the experiments, we compare ERTR with state-of-the-art tree algorithms on real datasets to show the superior performance of our model.
Researcher Affiliation | Academia | 1. School of Statistics, Renmin University of China; 2. Center for Applied Statistics, School of Statistics, Renmin University of China. Correspondence to: Hanfang Yang <hyang@ruc.edu.cn>.
Pseudocode | Yes | Algorithm 1: Random Tree Partition; Algorithm 2: Extrapolated Random Tree for Regression. (A toy partition sketch follows the table.)
Open Source Code | Yes | All code is available on GitHub [Footnote 1: https://github.com/Karlmyh/ERTR].
Open Datasets | Yes | ABA: The Abalone dataset originally comes from biological research (Nash et al., 1994) and is now accessible on the UCI Machine Learning Repository (Dua & Graff, 2017). AIR: The Airfoil Self-Noise dataset on the UCI Machine Learning Repository... ALG: The Algerian Forest Fires dataset on the UCI Machine Learning Repository... (A loading sketch for the Abalone data follows the table.)
Dataset Splits | Yes | For each pair of (p, L), we set λ = 10^-4 as the regularization parameter for ridge regression and choose V ∈ {15, 20, 25} by cross-validation. ... We take 30% of the training data as the validation set. (A sketch of this hold-out protocol follows the table.)
Hardware Specification | Yes | All experiments are conducted on a machine with 72-core Intel Xeon 2.60GHz and 128GB main memory.
Software Dependencies | No | For standard decision trees, we use the implementation by Scikit-Learn (Pedregosa et al., 2011). We use the implementation in C++ [footnote 2]. ... We use the implementation in R [footnote 3]. ... We use the implementation in Python [footnote 4]. The paper mentions software such as Scikit-Learn, C++, R, and Python, but does not specify their version numbers or the version numbers of specific libraries/packages beyond citing their original papers. (A minimal scikit-learn baseline sketch follows the table.)
Experiment Setup | Yes | For ERTR, we use the parameter grids p ∈ {2, 3, 4, 5, 6, 7, 8}, C ∈ {0, 1}, and λ ∈ {0.001, 0.01, 0.1}. V is fixed to be max(n / 2^(p+2), 5). For each node, if the number of samples in the node is less than 5, then we stop the recursive partition process of the current node. For ERF, we set the number of trees to 200 and subsample {⌊0.5d⌋, ⌊0.75d⌋, d} features in each split procedure to look for the best cut. In addition, each base learner is trained on {⌊0.8n⌋, n, ⌊1.2n⌋} samples bootstrapped with replacement from D. For GBERTR, we set the number of trees to 100 and the learning rate to 0.01. (A sketch enumerating these grids follows the table.)
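
To make the Pseudocode row concrete, here is a minimal Python sketch of a random tree partition in the spirit of Algorithm 1: split a node on a uniformly chosen feature at a uniformly chosen threshold, and stop once a node holds fewer than 5 samples (the stopping rule quoted in the Experiment Setup row). The depth cap and the uniform threshold choice are assumptions; this is not the authors' implementation, which lives in the linked repository.

    import numpy as np

    MIN_SAMPLES = 5  # stopping rule quoted in the Experiment Setup row

    def random_partition(X, indices=None, rng=None, depth=0, max_depth=10):
        """Recursively split X into leaf cells; returns (index array, depth) pairs."""
        rng = np.random.default_rng() if rng is None else rng
        indices = np.arange(X.shape[0]) if indices is None else indices
        if len(indices) < MIN_SAMPLES or depth >= max_depth:
            return [(indices, depth)]
        j = int(rng.integers(X.shape[1]))              # random split dimension
        lo, hi = X[indices, j].min(), X[indices, j].max()
        if lo == hi:                                   # degenerate cell: stop
            return [(indices, depth)]
        t = rng.uniform(lo, hi)                        # random split threshold
        left = indices[X[indices, j] <= t]
        right = indices[X[indices, j] > t]
        if len(left) == 0 or len(right) == 0:
            return [(indices, depth)]
        return (random_partition(X, left, rng, depth + 1, max_depth)
                + random_partition(X, right, rng, depth + 1, max_depth))

    X = np.random.default_rng(0).normal(size=(200, 3))
    print(len(random_partition(X, rng=np.random.default_rng(1))), "leaf cells")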
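
The Open Datasets row points at the UCI Machine Learning Repository; a hedged way to pull the Abalone data (ABA) in Python is through its OpenML mirror via scikit-learn. The dataset name "abalone" and the one-hot treatment of the categorical Sex column are assumptions about the mirror, not something the paper specifies.

    import pandas as pd
    from sklearn.datasets import fetch_openml

    # Fetch the OpenML mirror of the UCI Abalone data (an assumed access path).
    abalone = fetch_openml(name="abalone", version=1, as_frame=True)
    X = pd.get_dummies(abalone.data)       # one-hot encode the categorical Sex column
    y = pd.to_numeric(abalone.target)      # ring counts used as the regression target
    print(X.shape, y.shape)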
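
The Dataset Splits row describes holding out 30% of the training data for validation and choosing V from {15, 20, 25}. The sketch below mirrors that protocol around a hypothetical make_ertr(V=...) constructor standing in for the authors' estimator, with the ridge regularizer fixed at λ = 10^-4 inside the model rather than tuned.

    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    def select_V(X, y, make_ertr, V_grid=(15, 20, 25), seed=0):
        """Pick V by error on a 30% hold-out of the training data."""
        X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=seed)
        scores = {}
        for V in V_grid:
            model = make_ertr(V=V)           # hypothetical constructor, e.g. lambda V: ERTR(V=V, lam=1e-4)
            model.fit(X_tr, y_tr)
            scores[V] = mean_squared_error(y_val, model.predict(X_val))
        return min(scores, key=scores.get)   # V with the smallest validation MSE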
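
The Software Dependencies row confirms only that the standard decision tree baseline comes from Scikit-Learn, without versions. The snippet below is a plain DecisionTreeRegressor fit on synthetic data to show what that baseline amounts to; all hyperparameters beyond the random seed are left at their defaults because the quoted text does not pin them down.

    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    # Synthetic stand-in data; the paper uses the UCI datasets listed above.
    X, y = make_regression(n_samples=500, n_features=8, noise=1.0, random_state=0)

    baseline = DecisionTreeRegressor(random_state=0)  # settings unspecified in the quoted text
    baseline.fit(X, y)
    print(baseline.score(X, y))                       # in-sample R^2, for illustration only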
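
Finally, the Experiment Setup row lists the hyperparameter grids; the helper below simply enumerates them as configuration dictionaries. The rule for V reconstructs a garbled expression in the extracted text (read here as max(n / 2^(p+2), 5)) and should be treated as an assumption, as should the exact flooring of the fractional feature and bootstrap counts; the ERF and GBERTR settings are recorded as plain dictionaries rather than tied to any concrete estimator class.

    from itertools import product

    def ertr_configs(n, d):
        """Enumerate the quoted grids; n = training sample size, d = number of features."""
        configs = []
        for p, C, lam in product([2, 3, 4, 5, 6, 7, 8], [0, 1], [0.001, 0.01, 0.1]):
            V = max(n // 2 ** (p + 2), 5)        # reconstructed rule, an assumption
            configs.append({"p": p, "C": C, "lam": lam, "V": V, "min_samples_leaf": 5})
        erf = {"n_trees": 200,
               "max_features_grid": [int(0.5 * d), int(0.75 * d), d],  # features tried per split
               "bootstrap_sizes": [int(0.8 * n), n, int(1.2 * n)]}     # samples drawn with replacement
        gbertr = {"n_trees": 100, "learning_rate": 0.01}
        return configs, erf, gbertr

    configs, erf, gbertr = ertr_configs(n=4000, d=8)
    print(len(configs), "ERTR configurations;", erf, gbertr)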