Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Boulevard: Regularized Stochastic Gradient Boosted Trees and Their Limiting Distribution

Authors: Yichen Zhou, Giles Hooker

JMLR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	A simulation study and real world examples provide support for both the predictive accuracy of the model and its limiting behavior. Keywords: gradient boosting, regression tree, regularization, limiting distribution. ... We have conducted an empirical study to demonstrate the performance of Boulevard.
Researcher Affiliation	Academia	Yichen Zhou EMAIL Department of Statistics and Data Science Cornell University Ithaca, NY 14853, USA. Giles Hooker EMAIL Department of Statistics University of California, Berkeley Berkeley, CA 94720, USA.
Pseudocode	Yes	Algorithm 1 (Boulevard). ... Algorithm 2 (Trees for Non-adaptive Boosting). ... Algorithm 3 (Tail Snapshot Boulevard).
Open Source Code	Yes	The empirical study code is provided at: https://github.com/siriuz42/boulevard.git
Open Datasets	Yes	Results on four real world data sets selected from UCI Machine Learning Repository (Dheeru and Karra Taniskidou, 2017; Tüfekci, 2014; Kaya et al., 2012) are shown in Figure 4.
Dataset Splits	Yes	All curves are averages after 5-fold cross validation. ... Figure 8 shows the result when we generate the 90% reproduction intervals for two real world datasets from UCI, namely CCPP and CASP. For each dataset, we take the first 10 examples as test examples, and split the rest of the dataset into 11 folds.
Hardware Specification	No	The paper does not explicitly describe the specific hardware used to run its experiments. It mentions 'simulation study' and 'empirical study' but no details on CPU, GPU, or other computing resources.
Software Dependencies	No	The paper does not provide specific version numbers for software dependencies or libraries used in the experiments. While it provides a link to code, the specific versions are not detailed within the paper itself.
Experiment Setup	Yes	Table 1: Parameters used in empirical study, listed as label: n, θ, ntree, k, λ. MSE-(1-4): 5000, 0.3, 1000, 20, 0.8; MSE-Boston: 506, 0.8, 1000, 5, 0.8; MSE-CCPP: 9568, 0.5, 1000, 50, 0.8; MSE-CASP: 20000, 0.5, 1000, 50, 0.8; MSE-Airfoil: 1503, 0.8, 1000, 40, 0.8; Limiting-(1-4): 1000, 0.8, 2000, 10, 0.5; Variance-(1-4): 5000, 0.8, 3000, 20, 0.5; RI-(1-2): 1000, 0.8, 2000, 10, 0.5; RI-(3-4): 5000, 0.8, 2000, 10, 0.5.
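The Dataset Splits row quotes two split procedures: 5-fold cross validation for the averaged curves, and, for the reproduction-interval experiments on CCPP and CASP, holding out the first 10 examples as test examples and dividing the rest into 11 folds. The sketch below shows one way to produce such index splits with the Python standard library. It is not taken from the authors' repository: the function names, the random shuffling before dealing indices into folds, and the fixed seed are all assumptions made for illustration.

```python
import random


def holdout_then_folds(n_examples, n_test=10, n_folds=11, seed=0):
    """Hypothetical helper: hold out the first n_test indices as test examples,
    then shuffle the remaining indices (an assumption; the paper does not say
    whether it shuffles) and deal them round-robin into n_folds folds."""
    test = list(range(n_test))
    rest = list(range(n_test, n_examples))
    random.Random(seed).shuffle(rest)
    folds = [rest[i::n_folds] for i in range(n_folds)]
    return test, folds


def k_fold_indices(n_examples, k=5, seed=0):
    """Hypothetical helper: yield (train, validation) index lists for
    shuffled k-fold cross validation."""
    idx = list(range(n_examples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        # Training set is the union of all folds except the i-th.
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, val


if __name__ == "__main__":
    # n = 9568 is the value Table 1 lists for MSE-CCPP.
    test, folds = holdout_then_folds(9568)
    print(len(test), [len(f) for f in folds])
```

With n = 9568, the 9558 non-test examples split into ten folds of 869 and one fold of 868. Dealing shuffled indices round-robin keeps fold sizes within one example of each other.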