Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Statistical Inference for Gradient Boosting Regression
Authors: Haimo Fang, Kevin Tan, Giles Hooker
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments verify the asymptotic normality and demonstrate that our algorithms perform well, do not require early stopping, interpolate between regularized boosting and random forests, and confirm the validity of their built-in statistical inference procedures. ... The first, and most natural, thing to do is to examine the performance of our algorithms against a handful of competitors in terms of test MSE on nine datasets from the UCI Machine Learning Repository in Figure 3 and Figure 6 in the supplement. |
| Researcher Affiliation | Academia | Haimo Fang¹, Kevin Tan², Giles Hooker²; ¹School of Economics, Fudan University; ²Department of Statistics and Data Science, The Wharton School, University of Pennsylvania |
| Pseudocode | Yes | Algorithm 1 Boulevard Regularized Additive regression Trees Dropout (BRAT-D) ... Algorithm 2 BRAT Parallel (BRAT-P) |
| Open Source Code | Yes | Code available at https://github.com/Fangbaixiangmomo/BRATs.git |
| Open Datasets | Yes | The first, and most natural, thing to do is to examine the performance of our algorithms against a handful of competitors in terms of test MSE on nine datasets from the UCI Machine Learning Repository... Nash, W., Sellers, T., Talbot, S., Cawthorn, A., and Ford, W. (1994). Abalone. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C55C7W. |
| Dataset Splits | Yes | We establish the efficacy of our methods on both a simulation study and real-world datasets. Our numerical experiments showcase that our algorithms can be tuned to interpolate between regularized boosting and random forests, and are highly competitive in terms of MSE. These also demonstrate the correctness, coverage, and computational efficiency of the statistical procedures we construct via the central limit theorems our algorithms enjoy. ... Consider a split of the training dataset (Xn, yn) into (Xn,1, yn,1), (Xn,2, yn,2), where Xn,1 Rn/2,d, Xn,2 Rn/2, qd. ... Test set is of the same size. Error rates computed over 30 trials. (Figure 3, Left) |
| Hardware Specification | Yes | All experiments were run on an Apple MacBook Pro (2022, Apple M2, 8 GB RAM); individual runs took under 2 hours and the full suite under 6 hours. |
| Software Dependencies | No | The paper mentions "XGBoost Chen and Guestrin (2016), LightGBM Ke et al. (2017), and CatBoost Prokhorenkova et al. (2018)" as methods, but does not provide specific version numbers for these or any other software libraries or programming languages used in their implementation or experimental setup within the main text. |
| Experiment Setup | Yes | Figure 1: Demonstration of confidence, prediction, and reproduction intervals on f(x) = sin(2πx) + ½x². Conformal baseline. 200 trees, learning rate 0.6, depth 8, subsampling and dropout 0.6. ... For variable importance tests: We fit Algorithm 1 on data generated from f(x) and g(x) = 4x1 x2 2, with 100 trees, λ = 1, subsampling rate 1, dropout rate 0.95, and a max depth of 6. ... All hyperparameters were tuned with Optuna Akiba et al. (2019), reported in Appendix I. |
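
The Dataset Splits row quotes an evaluation protocol in which the test set has the same size as the training set and errors are computed over 30 trials. The sketch below illustrates only that protocol, using the Python standard library. It is not the authors' code: the regressor is a placeholder binned-mean model rather than BRAT, the data generator, noise level, and sample size are hypothetical, and the target function assumes the Figure 1 formula reads sin(2πx) + x²/2.

```python
import math
import random
import statistics


def f(x):
    # Assumed reading of the Figure 1 test function: sin(2*pi*x) + x^2 / 2.
    return math.sin(2 * math.pi * x) + 0.5 * x * x


def make_data(n, rng, noise_sd=0.1):
    # Hypothetical generator: x ~ Uniform(0, 1) with additive Gaussian noise.
    xs = [rng.random() for _ in range(n)]
    ys = [f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def fit_binned_mean(xs, ys, bins=10):
    # Placeholder regressor (NOT BRAT): mean of y within equal-width bins on [0, 1].
    sums = [0.0] * bins
    counts = [0] * bins
    for x, y in zip(xs, ys):
        b = min(int(x * bins), bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = statistics.fmean(ys)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * bins), bins - 1)]


def mse(model, xs, ys):
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


def run_trials(n_train=200, n_trials=30, seed=0):
    # Each trial draws a fresh training set and a test set of the same size.
    rng = random.Random(seed)
    errors = []
    for _ in range(n_trials):
        x_tr, y_tr = make_data(n_train, rng)
        x_te, y_te = make_data(n_train, rng)
        model = fit_binned_mean(x_tr, y_tr)
        errors.append(mse(model, x_te, y_te))
    mean = statistics.fmean(errors)
    std_err = statistics.stdev(errors) / math.sqrt(n_trials)
    return errors, mean, std_err


errors, mean_mse, se_mse = run_trials()
print(f"test MSE over {len(errors)} trials: {mean_mse:.4f} +/- {se_mse:.4f}")
```

Fixing the seed makes the reported mean and standard error repeatable. Swapping `fit_binned_mean` for another model leaves the trial loop unchanged.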
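
The Software Dependencies row is marked "No" because no version numbers are reported. As an illustration of how such versions could be recorded next to experiment outputs, here is a minimal sketch using the standard-library `importlib.metadata`. The package list is an assumption: it takes the libraries named in the table (XGBoost, LightGBM, CatBoost, Optuna) to be installed under their usual PyPI distribution names, and it says nothing about what the authors actually used.

```python
import platform
from importlib import metadata


def collect_versions(packages):
    # Map each distribution name to its installed version, or None if absent.
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed PyPI distribution names for the libraries named in the table.
PACKAGES = ["xgboost", "lightgbm", "catboost", "optuna"]

report = collect_versions(PACKAGES)
for name, version in report.items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```

Saving this report alongside each run's results would let a reader pin the same environment later.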