Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance

Authors: Zheyu Zhang, Tianping Zhang, Jian Li

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We assess the performance of UnbiasedGBM and unbiased gain in a large-scale empirical study comprising 60 datasets and show that: 1) UnbiasedGBM exhibits better performance than popular GBDT implementations such as LightGBM, XGBoost, and CatBoost on average on the 60 datasets and 2) unbiased gain achieves better average performance in feature selection than popular feature importance methods.
Researcher Affiliation | Academia | Zheyu Zhang, Tianping Zhang and Jian Li, Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University
Pseudocode | No | The paper describes methods verbally and through mathematical equations but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The codes are available at https://github.com/ZheyuAqaZhang/UnbiasedGBM.
Open Datasets | Yes | We collect 60 classification datasets in various application domains provided by Kaggle, UCI [Dua and Graff, 2017], and OpenML [Vanschoren et al., 2013] platforms.
Dataset Splits | Yes | Assume we divide the training set into a subtraining set D and two validation sets D_1 and D_2. [...] We experiment with different ratios of splitting the dataset and find out that we achieve the best performance when |D| = |D_1| = |D_2| (see more details in Appendix E).
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments.
Software Dependencies | No | The paper mentions the 'Optuna [Akiba et al., 2019] Python package' but does not specify its version number or any other software dependencies with specific versions.
Experiment Setup | Yes | For each method, we perform hyperparameter optimization using the popular Optuna [Akiba et al., 2019] Python package. See more details in Appendix D. [...] We experiment with different ratios of splitting the dataset and find out that we achieve the best performance when |D| = |D_1| = |D_2| (see more details in Appendix E). [...] UnbiasedGBM evaluates the generalization performance of each split and performs leaf-wise early-stopping to avoid overfitting splits. [...] our minimal gain to split is zero on a theoretic basis.
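
The Open Datasets row above notes that the paper's 60 classification datasets come from Kaggle, UCI, and OpenML. As a rough illustration of how an OpenML dataset can be pulled into Python, the sketch below uses scikit-learn's fetch_openml; the dataset name "adult" is an arbitrary example and not necessarily one of the paper's 60 datasets.

```python
# Illustrative only: fetch one OpenML dataset with scikit-learn.
# The dataset name "adult" is an arbitrary example, not taken from the paper.
from sklearn.datasets import fetch_openml

adult = fetch_openml("adult", version=2, as_frame=True)
X, y = adult.data, adult.target
print(X.shape)
print(y.value_counts())
```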
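The Dataset Splits row quotes the paper's three-way split of the training data into a subtraining set D and two validation sets D_1 and D_2, with the best results reported at |D| = |D_1| = |D_2|. The following is a minimal sketch of such an equal-size split; the helper name three_way_split and the use of scikit-learn's train_test_split are assumptions for illustration, not code from the paper's repository.

```python
from sklearn.model_selection import train_test_split

def three_way_split(X, y, seed=0):
    """Split a training set into a subtraining set D and two validation sets
    D_1 and D_2 of equal size (|D| = |D_1| = |D_2|).
    Hypothetical helper, not taken from the paper's repository."""
    # First carve off D (one third of the data) ...
    X_d, X_rest, y_d, y_rest = train_test_split(
        X, y, train_size=1 / 3, random_state=seed
    )
    # ... then split the remainder evenly into D_1 and D_2.
    X_d1, X_d2, y_d1, y_d2 = train_test_split(
        X_rest, y_rest, train_size=0.5, random_state=seed
    )
    return (X_d, y_d), (X_d1, y_d1), (X_d2, y_d2)
```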
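The Experiment Setup row states that every method is tuned with the Optuna Python package. The sketch below shows what such a tuning loop could look like for one baseline (LightGBM); the search space, the 5-fold AUC objective, the synthetic stand-in data, and the trial budget are illustrative assumptions, since the paper's actual configuration is only described in its Appendix D.

```python
# Hypothetical Optuna tuning loop for a LightGBM baseline.
# Search space and trial budget are illustrative, not the paper's Appendix D settings.
import optuna
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; the paper instead uses 60 public classification datasets.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

def objective(trial):
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
    }
    model = LGBMClassifier(**params, verbose=-1)
    # 5-fold cross-validated AUC is the quantity Optuna maximizes here.
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params, study.best_value)
```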