Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance

Authors: Zheyu Zhang, Tianping Zhang, Jian Li

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We assess the performance of UnbiasedGBM and unbiased gain in a large-scale empirical study comprising 60 datasets and show that: 1) UnbiasedGBM exhibits better performance than popular GBDT implementations such as LightGBM, XGBoost, and CatBoost on average on the 60 datasets and 2) unbiased gain achieves better average performance in feature selection than popular feature importance methods.
Researcher Affiliation | Academia | Zheyu Zhang, Tianping Zhang and Jian Li, Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University
Pseudocode | No | The paper describes methods verbally and through mathematical equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The codes are available at https://github.com/ZheyuAqaZhang/UnbiasedGBM.
Open Datasets | Yes | We collect 60 classification datasets in various application domains provided by Kaggle, UCI [Dua and Graff, 2017], and OpenML [Vanschoren et al., 2013] platforms.
Dataset Splits | Yes | Assume we divide the training set into a subtraining set D and two validation sets D_1 and D_2. [...] We experiment with different ratios of splitting the dataset and find out that we achieve the best performance when |D| = |D_1| = |D_2| (see more details in Appendix E).
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments.
Software Dependencies | No | The paper mentions the 'Optuna [Akiba et al., 2019] Python package' but does not specify its version number or any other software dependencies with specific versions.
Experiment Setup | Yes | For each method, we perform hyperparameter optimization using the popular Optuna [Akiba et al., 2019] Python package. See more details in Appendix D. [...] We experiment with different ratios of splitting the dataset and find out that we achieve the best performance when |D| = |D_1| = |D_2| (see more details in Appendix E). [...] UnbiasedGBM evaluates the generalization performance of each split and performs leaf-wise early-stopping to avoid overfitting splits. [...] our minimal gain to split is zero on a theoretic basis.
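
The Open Datasets row above notes that the paper's 60 classification datasets come from Kaggle, UCI, and OpenML. As a rough illustration of how an OpenML dataset can be pulled into Python, the sketch below uses scikit-learn's fetch_openml; the dataset name "adult" is an arbitrary example and not necessarily one of the paper's 60 datasets.

```python
# Illustrative only: fetch one OpenML dataset with scikit-learn.
# The dataset name "adult" is an arbitrary example, not taken from the paper.
from sklearn.datasets import fetch_openml

adult = fetch_openml("adult", version=2, as_frame=True)
X, y = adult.data, adult.target
print(X.shape)
print(y.value_counts())
```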
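The Dataset Splits row quotes the paper's three-way split of the training data into a subtraining set D and two validation sets D_1 and D_2, with the best results reported at |D| = |D_1| = |D_2|. The following is a minimal sketch of such an equal-size split; the helper name three_way_split and the use of scikit-learn's train_test_split are assumptions for illustration, not code from the paper's repository.

```python
from sklearn.model_selection import train_test_split

def three_way_split(X, y, seed=0):
    """Split a training set into a subtraining set D and two validation sets
    D_1 and D_2 of equal size (|D| = |D_1| = |D_2|).
    Hypothetical helper, not taken from the paper's repository."""
    # First carve off D (one third of the data) ...
    X_d, X_rest, y_d, y_rest = train_test_split(
        X, y, train_size=1 / 3, random_state=seed
    )
    # ... then split the remainder evenly into D_1 and D_2.
    X_d1, X_d2, y_d1, y_d2 = train_test_split(
        X_rest, y_rest, train_size=0.5, random_state=seed
    )
    return (X_d, y_d), (X_d1, y_d1), (X_d2, y_d2)
```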
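The Experiment Setup row states that every method is tuned with the Optuna Python package. The sketch below shows what such a tuning loop could look like for one baseline (LightGBM); the search space, the 5-fold AUC objective, the synthetic stand-in data, and the trial budget are illustrative assumptions, since the paper's actual configuration is only described in its Appendix D.

```python
# Hypothetical Optuna tuning loop for a LightGBM baseline.
# Search space and trial budget are illustrative, not the paper's Appendix D settings.
import optuna
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; the paper instead uses 60 public classification datasets.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

def objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
    }
    model = LGBMClassifier(**params, verbose=-1)
    # 5-fold cross-validated AUC is the quantity Optuna maximizes here.
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```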