Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance
Authors: Zheyu Zhang, Tianping Zhang, Jian Li
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We assess the performance of UnbiasedGBM and unbiased gain in a large-scale empirical study comprising 60 datasets and show that: 1) UnbiasedGBM exhibits better performance than popular GBDT implementations such as LightGBM, XGBoost, and CatBoost on average on the 60 datasets and 2) unbiased gain achieves better average performance in feature selection than popular feature importance methods. (A feature-selection sketch follows the table.) |
| Researcher Affiliation | Academia | Zheyu Zhang, Tianping Zhang and Jian Li, Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University |
| Pseudocode | No | The paper describes methods verbally and through mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codes are available at https://github.com/ZheyuAqaZhang/UnbiasedGBM. |
| Open Datasets | Yes | We collect 60 classification datasets in various application domains provided by Kaggle, UCI [Dua and Graff, 2017], and OpenML [Vanschoren et al., 2013] platforms. |
| Dataset Splits | Yes | Assume we divide the training set into a sub-training set D and two validation sets D_1 and D_2. [...] We experiment with different ratios of splitting the dataset and find out that we achieve the best performance when \|D\| = \|D_1\| = \|D_2\| (see more details in Appendix E). (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments. |
| Software Dependencies | No | The paper mentions the 'Optuna [Akiba et al., 2019] Python package' but does not specify its version, nor does it list versions for any other software dependencies. |
| Experiment Setup | Yes | For each method, we perform hyperparameter optimization using the popular Optuna [Akiba et al., 2019] Python package. See more details in Appendix D. [...] We experiment with different ratios of splitting the dataset and find out that we achieve the best performance when \|D\| = \|D_1\| = \|D_2\| (see more details in Appendix E). [...] UnbiasedGBM evaluates the generalization performance of each split and performs leaf-wise early-stopping to avoid overfitting splits. [...] our minimal gain to split is zero on a theoretic basis. (An Optuna sketch follows the table.) |
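
The feature-selection comparison quoted in the first row follows the standard protocol for evaluating importance methods: rank features by an importance score, keep the top-k, retrain, and measure held-out performance. Below is a minimal sketch of that protocol using LightGBM's built-in gain importance as the ranking score; this is a generic stand-in for illustration, not the paper's unbiased gain, and all data and parameter choices are placeholders.

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Placeholder data: 30 features, only 5 informative.
X, y = make_classification(n_samples=1000, n_features=30, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Rank features by a chosen importance score (here: LightGBM's gain
# importance, a generic stand-in for the paper's unbiased gain).
ranker = LGBMClassifier(importance_type="gain", random_state=0).fit(X_tr, y_tr)
top_k = np.argsort(ranker.feature_importances_)[::-1][:10]

# Retrain on the selected features and score on held-out data.
model = LGBMClassifier(random_state=0).fit(X_tr[:, top_k], y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te[:, top_k])[:, 1])
print(f"AUC with top-10 features: {auc:.3f}")
```

Comparing importance methods then amounts to repeating this loop with each candidate score and averaging the resulting metrics across datasets, which is the shape of the comparison the paper reports.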
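
The three-way split from the Dataset Splits row (a sub-training set D and two validation sets D_1 and D_2 of equal size) can be reproduced with two chained scikit-learn splits. A minimal sketch, assuming generic placeholder arrays `X` and `y`; the variable names are illustrative and not taken from the authors' code, and the sketch reproduces only the partition, not the roles the three subsets play inside UnbiasedGBM's tree building.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data (900 rows so the thirds are exact).
X, y = np.random.rand(900, 10), np.random.randint(0, 2, 900)

# First cut: 1/3 for the sub-training set D, 2/3 held out.
X_d, X_rest, y_d, y_rest = train_test_split(X, y, test_size=2 / 3, random_state=0)

# Second cut: split the remainder evenly into D1 and D2,
# giving the |D| = |D1| = |D2| ratio the paper reports as best.
X_d1, X_d2, y_d1, y_d2 = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

assert len(X_d) == len(X_d1) == len(X_d2)
```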
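
The hyperparameter search in the Experiment Setup row is described only as "Optuna, with details in Appendix D". For orientation, here is a minimal sketch of an Optuna study tuning a LightGBM baseline; the search space, trial budget, and cross-validation scheme below are placeholder assumptions, not the configuration from the paper's Appendix D.

```python
import optuna
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for one of the 60 benchmark datasets.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)


def objective(trial: optuna.Trial) -> float:
    # Placeholder search space; the paper's actual space is in Appendix D.
    params = {
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 50, 500),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }
    model = LGBMClassifier(**params, random_state=0)
    # Maximize mean cross-validated AUC, a common choice for binary classification.
    return cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=30)
print(study.best_params, study.best_value)
```

Running the same study per method and per dataset, then comparing the best scores, matches the tuning protocol the quoted setup describes.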