Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The Art of BART: Minimax Optimality over Nonhomogeneous Smoothness in High Dimension
Authors: Seonghyun Jeong, Veronika Ročková
JMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical study shows that Bayesian forests often outperform other competitors such as random forests and deep neural networks, which are believed to work well for discontinuous or complicated smooth functions. |
| Researcher Affiliation | Academia | Seonghyun Jeong, Department of Statistics and Data Science / Department of Applied Statistics, Yonsei University, Seoul 03722, Republic of Korea; Veronika Ročková, Booth School of Business, University of Chicago, Chicago, IL 60637, USA |
| Pseudocode | No | The paper describes methods and mathematical procedures but does not contain any clearly labeled pseudocode or algorithm blocks. The procedures are conveyed through explanatory text and mathematical formulations. |
| Open Source Code | No | The text mentions the use of existing R packages like "R package BART," "gbm package," and "randomForest package," as well as "TensorFlow with the Keras interface." It also states, "We are grateful to ... Qurie Moon for sharing the code for BART with the exponentially decaying prior distribution." This indicates that a third party shared code, not that the authors are providing their own implementation code for the methodology described in this paper. |
| Open Datasets | No | Our synthetic datasets are generated from model (6) with a few different functions f0 : [0, 1]^p → R. |
| Dataset Splits | No | Figures 11 and 12 show the root mean squared prediction error (RMSPE) obtained by the methods described in Table 1. The RMSPEs are estimated by randomly drawn out-of-samples. For each scenario, we consider two sample sizes n ∈ {1000, 5000} and five dimension values p ∈ {2, 5, 10, 20, 50}, while fixing σ₀² = 0.5² for reasonable signal-to-noise ratios. The paper mentions generating synthetic datasets and drawing 'out-of-samples' for evaluation, but it does not specify explicit dataset splits such as percentages for training, validation, and testing, nor does it provide a detailed methodology for how these splits were performed (e.g., random seed, k-fold cross-validation). |
| Hardware Specification | No | The paper describes the numerical study and the models used but does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory, or cloud computing resources with specifications) on which the experiments were run. |
| Software Dependencies | No | We fit BART with 200 trees using the prior that splits a node at depth ℓ with probability α(ℓ+1)^(−β) for α ∈ (0, 1) and β ∈ [0, ∞], the original construction by Chipman et al. (2010), which is implemented in the R package BART. GB is trained by the gbm package with trees of five splits and the number of trees determined via cross-validation (CV). RF is fitted by the randomForest package with 200 trees and the maximal node size 5 or 50 for each tree. The NN models are trained by TensorFlow with the Keras interface. The paper mentions several software packages (R package BART, gbm, randomForest, TensorFlow, Keras) but does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | We fit BART with 200 trees using the prior that splits a node at depth ℓ with probability α(ℓ+1)^(−β) for α ∈ (0, 1) and β ∈ [0, ∞], the original construction by Chipman et al. (2010), which is implemented in the R package BART. However, as our theory resorts to the exponentially decaying prior for splits as mentioned in Section 3.1, we also consider BART with the prior that splits a node at depth ℓ with probability ν^(ℓ+1) for ν ∈ (0, 1/2). We choose α = 0.3, β = 2, and ν = 0.3 to make the two priors roughly similar for small ℓ. For GP prior regression, the squared exponential covariance kernel k(x, x′) = τ² exp(−‖x − x′‖²/l²) is employed with half-normal priors τ ∼ N⁺(0, 1) and l ∼ N⁺(0, 1). GB is trained by the gbm package with trees of five splits and the number of trees determined via cross-validation (CV). RF is fitted by the randomForest package with 200 trees and the maximal node size 5 or 50 for each tree. The NN models are trained by TensorFlow with the Keras interface. We consider two NN models with two and four hidden layers with (64, 32) and (256, 128, 64, 32) hidden units. All hidden units take the ReLU activation function with dropout of rate 0.3 for regularization. |
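The claim in the quoted setup that the two node-split priors are "roughly similar for small ℓ" with α = 0.3, β = 2, ν = 0.3 can be checked directly. A minimal Python sketch (not the authors' code, which was written in R; the function names here are illustrative):

```python
# Two BART node-split priors from the quoted experiment setup:
#   Chipman et al. (2010): P(split at depth l) = alpha * (l + 1)^(-beta)
#   Exponentially decaying: P(split at depth l) = nu^(l + 1)

def split_prob_chipman(depth, alpha=0.3, beta=2.0):
    """Polynomially decaying split probability of Chipman et al. (2010)."""
    return alpha * (depth + 1) ** (-beta)

def split_prob_exponential(depth, nu=0.3):
    """Exponentially decaying split prior from the paper's theory."""
    return nu ** (depth + 1)

# Compare the two priors at small depths.
for depth in range(4):
    print(depth, split_prob_chipman(depth), split_prob_exponential(depth))
```

Both priors assign probability 0.3 at the root (depth 0) and stay close for the next few depths, which is consistent with the quoted tuning rationale.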
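The squared exponential kernel k(x, x′) = τ² exp(−‖x − x′‖²/l²) quoted for the GP prior regression can be sketched in a few lines of Python. This uses fixed τ and l for illustration; the half-normal hyperpriors τ ∼ N⁺(0, 1) and l ∼ N⁺(0, 1) from the paper are not modeled here:

```python
import math

def squared_exponential_kernel(x, y, tau=1.0, l=1.0):
    """k(x, x') = tau^2 * exp(-||x - x'||^2 / l^2), per the quoted setup."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return tau ** 2 * math.exp(-sq_dist / l ** 2)

# At zero distance the kernel equals tau^2; it decays smoothly with distance.
print(squared_exponential_kernel([0.0, 0.0], [0.0, 0.0]))
print(squared_exponential_kernel([0.0, 0.0], [1.0, 0.0]))
```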