Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Decorrelated Variable Importance

Authors: Isabella Verdinelli, Larry Wasserman

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 contains some simulation studies. ... The results from 100 simulations are summarized in Figures 2 and 3 and in Table 2. The standard error of the coverage is 0.03. Figure 2 shows how often the confidence interval contains the target parameter ψ0 as a function of the correlation which varies from 0 to 1.
Researcher Affiliation | Academia | Isabella Verdinelli EMAIL Department of Statistics Carnegie Mellon University 5000 Forbes Ave. Pittsburgh, PA 15213, USA. Larry Wasserman EMAIL Department of Statistics Carnegie Mellon University 5000 Forbes Ave. Pittsburgh, PA 15213, USA.
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided. The methodology is described through mathematical equations and textual explanations.
Open Source Code | No | No concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) is provided for the methodology described in this paper.
Open Datasets | No | In this section, we compare the behavior of the different parameters in some synthetic examples. ... Example 1. We start with a very simple scenario where Y = 2X + ϵ, ϵ ∼ N(0, 1), Z1 = δX + ξ, ξ ∼ N(0, 1), and (Z2, ..., Z5) ∼ N(0, I). ... Examples 2-5. Now we consider four multivariate examples. In each case, n = 10,000, h = 5 and ϵ ∼ N(0, 1).
Dataset Splits | No | No specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning is provided. The paper describes generating synthetic data for its examples.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | For the additive models we use the R package mgcv. For random forests we use the R package grf.
Experiment Setup | Yes | For the additive models we use the R package mgcv. For random forests we use the R package grf. We always use the default settings making no attempt to tune the methods to achieve good coverage. ... In each case, n = 10,000, h = 5 and ϵ ∼ N(0, 1).
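
Because the paper releases no code or datasets, the synthetic data for Example 1 (quoted in the Open Datasets row) has to be regenerated by the reader. The following is a minimal sketch of that data-generating process, not the authors' implementation. The paper's own experiments use R; Python's standard library is used here only for illustration. The quoted text does not state the distribution of X or the sample size for Example 1, so X ∼ N(0, 1) and the argument n are assumptions, and the names simulate_example1 and sample_corr are hypothetical.

```python
import math
import random


def simulate_example1(n, delta, seed=0):
    """Draw n samples of (Y, X, Z) following the quoted Example 1.

    ASSUMPTION: X ~ N(0, 1); the quoted text does not give X's distribution.
    ASSUMPTION: n is chosen by the caller; the quote gives no n for Example 1.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                 # assumed distribution of X
        y = 2.0 * x + rng.gauss(0.0, 1.0)       # Y = 2X + eps, eps ~ N(0, 1)
        z1 = delta * x + rng.gauss(0.0, 1.0)    # Z1 = delta * X + xi, xi ~ N(0, 1)
        z_rest = [rng.gauss(0.0, 1.0) for _ in range(4)]  # (Z2, ..., Z5) ~ N(0, I)
        rows.append((y, x, [z1] + z_rest))
    return rows


def sample_corr(a, b):
    """Pearson sample correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    data = simulate_example1(n=10_000, delta=1.0, seed=1)
    xs = [row[1] for row in data]
    z1s = [row[2][0] for row in data]
    print("sample corr(X, Z1):", sample_corr(xs, z1s))
```

Under the assumed X ∼ N(0, 1), the population correlation between X and Z1 is δ / sqrt(1 + δ²), so varying δ is one way to sweep the correlation between the variable of interest and its covariate.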