Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Scalable and Efficient Hypothesis Testing with Random Forests

Authors: Tim Coleman, Wei Peng, Lucas Mentch

JMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Simulations and applications to ecological data, where random forests have recently shown promise, are provided. ... In Section 4, we present simulation studies of the testing procedure for a variety of underlying regression functions, as well as a comparison with two different knockoff statistics. In Section 5, we apply our procedure to multiple ecological datasets where random forests have been successfully employed in recent applied work.
Researcher Affiliation	Academia	Tim Coleman EMAIL Wei Peng EMAIL Lucas Mentch EMAIL Department of Statistics University of Pittsburgh Pittsburgh, PA 15215, USA
Pseudocode	Yes	Algorithm 1: Permutation test pseudocode for variable importance
Open Source Code	No	The paper mentions using the "randomForest package in R (Liaw and Wiener, 2002)" and the "ranger package (Wright and Ziegler, 2015)", but these are third-party tools. There is no explicit statement or link indicating that the authors' own implementation code for the methodology described in the paper is made publicly available.
Open Datasets	Yes	Model 4 where the true data generating model is a random forest. We utilize a dataset from Coleman et al. (2017) ... Fish Toxicity We simulate X from the UCI fish toxicity data set provided by Cassotti et al. (2015) ... Forest Fires: Cortez and Morais (2007) sought to predict log(1+area) burned by several fires in northern Portugal using covariate information on location, time of year, and local weather characteristics.
Dataset Splits	Yes	For each of our simulations, we train random forests using the randomForest package in R (Liaw and Wiener, 2002) using the default mtry parameters. ... In both settings, we draw n = 2000 points from the joint distribution of (X, Y), subsample sizes of k_n = n^0.6 ≈ 95, and build B = 125 trees in each forest. Predictions were made at N_t = 100 test points... For our procedure, we build 125 trees, holdout 90 observations at random for testing... Here we select 15% of the available observations (≈ 3800 points) uniformly at random to serve as the test set where the hypotheses will be evaluated.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running its experiments. It only discusses software, datasets, and experimental setup parameters.
Software Dependencies	No	We train random forests using the randomForest package in R (Liaw and Wiener, 2002) using the default mtry parameters. ... The random forests were trained with the ranger package using the default mtry = 4... The paper mentions specific software packages (the randomForest package in R and the ranger package) but does not provide version numbers for these packages or for R itself.
Experiment Setup	Yes	For each of our simulations, we train random forests using the randomForest package in R (Liaw and Wiener, 2002) using the default mtry parameters. ... subsample sizes of k_n = n^0.6 ≈ 95, and build B = 125 trees in each forest. Predictions were made at N_t = 100 test points... For Models 1 and 2, we focus on a marginal signal to noise ratio, which is controlled by the parameters β and σ. We fix β = 10 across all simulations let σ = 10/j where j takes 9 equally spaced values between 0.005 and 2.25... for Model 3, we let k_n = n^0.6 ≈ 46, B = 125, N_t = 100, and vary the β coefficient according to 8 equally spaced values between 0.01 and 2.5 and also for 7 equally spaced values between 5 and 20. In Model 4, we let n = 2000, k_n = n^0.6, B = 125, N_t = 100, and let σ = e^j for 10 values of j equally spaced between 1 and 5. ... The random forests were trained with the ranger package using the default mtry = 4, subsamples of size k_n = n^0.6, and consisting of B = 250 trees in each. ... using mtry = 12 and k_n = n^0.6 ≈ 43, B = 250 trees for the importance test and B = 500 trees for the overall test
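The simulation settings quoted in the Experiment Setup row can be checked with a little arithmetic. The Python sketch below is illustrative only and is not the authors' code (the paper's experiments use R). The helper names `subsample_size` and `equally_spaced` are hypothetical, truncating n^0.6 to an integer is an assumption chosen because it matches the quoted value of 95 for n = 2000, and the grid is assumed to include both endpoints.

```python
import math


def subsample_size(n: int, exponent: float = 0.6) -> int:
    """Subsample size k_n = n**exponent, truncated to an integer.

    Truncation is an assumption: 2000**0.6 is about 95.6, and the
    quoted setup reports 95.
    """
    return math.floor(n ** exponent)


def equally_spaced(lo: float, hi: float, count: int) -> list[float]:
    """Return `count` equally spaced values from lo to hi (endpoints assumed inclusive)."""
    step = (hi - lo) / (count - 1)
    return [lo + i * step for i in range(count)]


# Settings quoted for the first simulation models.
n = 2000
k_n = subsample_size(n)

# Models 1 and 2: beta is fixed at 10 and sigma = 10 / j,
# with j taking 9 equally spaced values between 0.005 and 2.25.
beta = 10
j_grid = equally_spaced(0.005, 2.25, 9)
sigma_grid = [10 / j for j in j_grid]

# Model 4: sigma = e**j for 10 values of j equally spaced between 1 and 5.
model4_sigma_grid = [math.exp(j) for j in equally_spaced(1, 5, 10)]

print("k_n:", k_n)
print("sigma grid (Models 1-2):", [round(s, 3) for s in sigma_grid])
print("sigma grid (Model 4):", [round(s, 3) for s in model4_sigma_grid])
```

Because σ = 10/j decreases as j grows, the grid for Models 1 and 2 runs from a very noisy setting (σ near 2000) down to a low-noise one (σ near 4.4), which is how the quoted setup varies the marginal signal-to-noise ratio with β held fixed.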