Gradient Boosting with Piece-Wise Linear Regression Trees

Authors: Yu Shi, Jian Li, Zhize Li

IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results show that GBDT with PL Trees can provide very competitive testing accuracy with comparable or less training time.
Researcher Affiliation | Academia | Yu Shi, Jian Li, and Zhize Li, Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
Pseudocode | Yes | Algorithm 1: Training Process of PL Tree
Open Source Code | Yes | Our code, details of the experiment settings, and datasets are available at the GitHub page: https://github.com/GBDT-PL/GBDT-PL.git
Open Datasets | Yes | Our code, details of the experiment settings, and datasets are available at the GitHub page: https://github.com/GBDT-PL/GBDT-PL.git
Dataset Splits | Yes | For GBDT-PL, we separate 20% of the training data for validation, pick the best setting on the validation set, and then record the corresponding accuracy on the test set (see the sketch after this table).
Hardware Specification | No | The paper mentions "modern computer architectures with powerful Single Instruction Multiple Data (SIMD) parallelism" and "Training Time on CPU" but does not specify exact CPU or GPU models, or other hardware components used for experiments.
Software Dependencies | No | The paper mentions Intel MKL [Wang et al., 2014] but does not provide a specific version number. No other software dependencies with version numbers are listed.
Experiment Setup | Yes | Key hyperparameters tuned: (1) num_leaves ∈ {16, 64, 256, 1024}, which controls the size of each tree (for CatBoost in Symmetric Tree mode the tree is grown by level, so max_depth ∈ {4, 6, 8, 10} is used instead); (2) max_bin ∈ {63, 255}, the maximum number of bins in the histograms; (3) min_sum_hessians ∈ {1.0, 100.0}, the minimum sum of hessians of the data in each leaf; (4) learning_rate ∈ {0.01, 0.05, 0.1}, the weight of each tree; (5) l2_reg ∈ {0.01, 10.0}, L2 regularization on leaf predicted values. The number of regressors in GBDT-PL is fixed to 5 in all runs (see the enumeration sketch after this table).
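
To make the protocol in the "Dataset Splits" row concrete, here is a minimal sketch, assuming synthetic data and a scikit-learn gradient-boosting classifier as a stand-in for GBDT-PL (the authors' actual pipeline is the C++ code at the GitHub link above, so everything here is illustrative):

```python
# Hypothetical sketch of the validation protocol from the "Dataset Splits"
# row: hold out 20% of the training data, pick the best setting on the
# validation set, then report that setting's test-set accuracy.
# Assumptions: synthetic data and a scikit-learn stand-in for GBDT-PL;
# this is not the authors' released pipeline.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Separate 20% of the training data for validation.
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=0)

best_acc, best_model = -1.0, None
for lr in (0.01, 0.05, 0.1):  # one axis of the grid in "Experiment Setup"
    model = GradientBoostingClassifier(learning_rate=lr, random_state=0).fit(X_fit, y_fit)
    acc = accuracy_score(y_val, model.predict(X_val))
    if acc > best_acc:
        best_acc, best_model = acc, model

# Record the corresponding accuracy on the test set.
print("test accuracy:", accuracy_score(y_test, best_model.predict(X_test)))
```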
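
The hyperparameter grid in the "Experiment Setup" row can be enumerated as follows; this is a hypothetical encoding of the search space, not the authors' tuning script, and the parameter names follow the paper's wording rather than any particular library's API:

```python
# Hypothetical encoding of the tuning grid from the "Experiment Setup" row.
from itertools import product

grid = {
    "num_leaves":       [16, 64, 256, 1024],  # CatBoost symmetric trees use max_depth {4, 6, 8, 10} instead
    "max_bin":          [63, 255],
    "min_sum_hessians": [1.0, 100.0],
    "learning_rate":    [0.01, 0.05, 0.1],
    "l2_reg":           [0.01, 10.0],
}
fixed = {"num_regressors": 5}  # fixed at 5 for GBDT-PL in all runs

# Cartesian product of all axes, with the fixed setting merged in.
settings = [dict(zip(grid, values), **fixed) for values in product(*grid.values())]
print(len(settings), "configurations")  # 4 * 2 * 2 * 3 * 2 = 96
```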