Gradient Boosting with Piece-Wise Linear Regression Trees

Authors: Yu Shi, Jian Li, Zhize Li

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results show that GBDT with PL Trees can provide very competitive testing accuracy with comparable or less training time.
Researcher Affiliation | Academia | Yu Shi, Jian Li, and Zhize Li, Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
Pseudocode | Yes | Algorithm 1: Training Process of PL Tree
Open Source Code | Yes | Our code, details of the experiment settings, and datasets are available at the GitHub page: https://github.com/GBDT-PL/GBDT-PL.git
Open Datasets | Yes | Our code, details of the experiment settings, and datasets are available at the GitHub page: https://github.com/GBDT-PL/GBDT-PL.git
Dataset Splits | Yes | For GBDT-PL, we separate 20% of the training data for validation, pick the best setting on the validation set, and then record the corresponding accuracy on the test set (see the sketch after this table).
Hardware Specification | No | The paper mentions "modern computer architectures with powerful Single Instruction Multiple Data (SIMD) parallelism" and "Training Time on CPU" but does not specify exact CPU or GPU models, or other hardware components used for experiments.
Software Dependencies | No | The paper mentions Intel MKL [Wang et al., 2014] but does not provide a specific version number. No other software dependencies with version numbers are listed.
Experiment Setup | Yes | Key hyperparameters tuned: (1) num_leaves ∈ {16, 64, 256, 1024}, which controls the size of each tree (for CatBoost in Symmetric Tree mode the tree is grown by level, so max_depth ∈ {4, 6, 8, 10} is used instead); (2) max_bin ∈ {63, 255}, the maximum number of bins in the histograms; (3) min_sum_hessians ∈ {1.0, 100.0}, the minimum sum of hessians of the data in each leaf; (4) learning_rate ∈ {0.01, 0.05, 0.1}, the weight of each tree; (5) l2_reg ∈ {0.01, 10.0}, L2 regularization on leaf predicted values. The number of regressors in GBDT-PL is fixed to 5 in all runs (see the enumeration sketch after this table).
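
To make the protocol in the "Dataset Splits" row concrete, here is a minimal sketch, assuming synthetic data and a scikit-learn gradient-boosting classifier as a stand-in for GBDT-PL (the authors' actual pipeline is the C++ code at the GitHub link above, so everything here is illustrative):

```python
# Hypothetical sketch of the validation protocol from the "Dataset Splits"
# row: hold out 20% of the training data, pick the best setting on the
# validation set, then report that setting's test-set accuracy.
# Assumptions: synthetic data and a scikit-learn stand-in for GBDT-PL;
# this is not the authors' released pipeline.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Separate 20% of the training data for validation.
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0)

best_acc, best_model = -1.0, None
for lr in (0.01, 0.05, 0.1):  # one axis of the grid in "Experiment Setup"
    model = GradientBoostingClassifier(learning_rate=lr, random_state=0).fit(X_fit, y_fit)
    acc = accuracy_score(y_val, model.predict(X_val))
    if acc > best_acc:
        best_acc, best_model = acc, model

# Record the corresponding accuracy on the test set.
print("test accuracy:", accuracy_score(y_test, best_model.predict(X_test)))
```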
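
The hyperparameter grid in the "Experiment Setup" row can be enumerated as follows; this is a hypothetical encoding of the search space, not the authors' tuning script, and the parameter names follow the paper's wording rather than any particular library's API:

```python
# Hypothetical encoding of the tuning grid from the "Experiment Setup" row.
from itertools import product

grid = {
    "num_leaves":       [16, 64, 256, 1024],  # CatBoost symmetric trees use max_depth {4, 6, 8, 10} instead
    "max_bin":          [63, 255],
    "min_sum_hessians": [1.0, 100.0],
    "learning_rate":    [0.01, 0.05, 0.1],
    "l2_reg":           [0.01, 10.0],
}
fixed = {"num_regressors": 5}  # fixed at 5 for GBDT-PL in all runs

# Cartesian product of all axes, with the fixed setting merged in.
settings = [dict(zip(grid, values), **fixed) for values in product(*grid.values())]
print(len(settings), "configurations")  # 4 * 2 * 2 * 3 * 2 = 96
```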