Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Decorrelated Variable Importance

Authors: Isabella Verdinelli, Larry Wasserman

JMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Section 4 contains some simulation studies. ... The results from 100 simulations are summarized in Figures 2 and 3 and in Table 2. The standard error of the coverage is 0.03. Figure 2 shows how often the confidence interval contains the target parameter ψ0 as a function of the correlation which varies from 0 to 1.
Researcher Affiliation	Academia	Isabella Verdinelli EMAIL Department of Statistics Carnegie Mellon University 5000 Forbes Ave. Pittsburgh, PA 15213, USA. Larry Wasserman EMAIL Department of Statistics Carnegie Mellon University 5000 Forbes Ave. Pittsburgh, PA 15213, USA.
Pseudocode	No	No explicit pseudocode or algorithm blocks are provided. The methodology is described through mathematical equations and textual explanations.
Open Source Code	No	No concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) is provided for the methodology described in this paper.
Open Datasets	No	In this section, we compare the behavior of the different parameters in some synthetic examples. ... Example 1. We start with a very simple scenario where Y = 2X + ϵ, ϵ ∼ N(0, 1), Z1 = δX + ξ, ξ ∼ N(0, 1), and (Z2, ..., Z5) ∼ N(0, I). ... Examples 2-5. Now we consider four multivariate examples. In each case, n = 10,000, h = 5 and ϵ ∼ N(0, 1).
Dataset Splits	No	No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning is provided. The paper describes generating synthetic data for its examples.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	For the additive models we use the R package mgcv. For random forests we use the R package grf.
Experiment Setup	Yes	For the additive models we use the R package mgcv. For random forests we use the R package grf. We always use the default settings making no attempt to tune the methods to achieve good coverage. ... In each case, n = 10,000, h = 5 and ϵ ∼ N(0, 1).
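
Because the paper's data are synthetic, the Example 1 design quoted in the Open Datasets row can be regenerated rather than downloaded. The following is a minimal Python sketch of that data-generating process, not the authors' code (the paper reports using R). The function names, the seed, the default δ = 0.5, and the choice X ∼ N(0, 1) are all assumptions made for illustration; the quoted excerpt does not state the distribution of X.

```python
import math
import random


def simulate_example_1(n=10_000, delta=0.5, seed=0):
    """Draw n samples from the Example 1 design quoted above.

    Y = 2X + eps, Z1 = delta * X + xi, (Z2, ..., Z5) ~ N(0, I),
    with eps, xi ~ N(0, 1).
    ASSUMPTION: X ~ N(0, 1); the quoted text does not specify it.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                # assumed distribution of X
        y = 2.0 * x + rng.gauss(0.0, 1.0)      # Y = 2X + eps
        z1 = delta * x + rng.gauss(0.0, 1.0)   # Z1 = delta * X + xi
        z_rest = [rng.gauss(0.0, 1.0) for _ in range(4)]  # Z2..Z5, independent
        rows.append((x, y, [z1] + z_rest))
    return rows


def correlation(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    rows = simulate_example_1()
    xs = [r[0] for r in rows]
    z1s = [r[2][0] for r in rows]
    print("sample corr(X, Z1):", correlation(xs, z1s))
```

Under the unit-variance assumption on X, the population correlation between X and Z1 is δ / sqrt(δ² + 1), so varying δ controls how strongly the covariate of interest is correlated with Z1.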