Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Copula-based Sensitivity Analysis for Multi-Treatment Causal Inference with Unobserved Confounding

Authors: Jiajing Zheng, Alexander D'Amour, Alexander Franks

JMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we demonstrate our sensitivity analysis workflow in several numerical simulations. The goal of these simulations is twofold: first, to demonstrate some of the operating characteristics of the approach in settings that are more realistic than the linear Gaussian settings we characterized analytically; and second, to show how exploratory tools like calibration, robustness analysis, and MCCs can be used to draw conclusions and choose interesting candidate models. We consider two broad simulation settings. In the first, we construct simulations with non-linear responses to treatment to show how the ignorance regions returned by our method can vary across scenarios. In the second, we construct a simulation that mimics the structure of a Genome-Wide Association Study (GWAS). Here, we examine the behavior of our method when a popular approximate latent variable method, the Variational Auto-Encoder (VAE), is used to estimate the effects of latent confounders, and demonstrate how MCCs can be useful tools for using prior information to choose potentially useful causal models from the set that is compatible with the observed data. In both subsections, we simulate data from the following generating process: U := ε_u, ε_u ~ N(0, I) (43); T := h_T(BU + ε_t), ε_t ~ N(0, σ_t² I) (44); Y := h_{Y|T}(g(T) + γᵀU + ε_y), ε_y ~ N(0, σ²) (45). The functions h_{Y|T} and h_T are chosen to be either the identity for Gaussian data or an indicator function for binary data.
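The generating process in Eqs. (43)-(45) can be sketched directly. This is a minimal simulation under assumed settings: the dimensions, the loading matrices, and the choice g(T) = Σ_j T_j are illustrative placeholders, not the paper's configuration; h_T and h_{Y|T} are the identity for the Gaussian case and an indicator for the binary case, as the excerpt states.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 1000, 2, 4            # samples, confounders, treatments (hypothetical sizes)
B = rng.normal(size=(k, m))     # treatment-confounder loadings (illustrative)
gamma = rng.normal(size=m)      # outcome-confounder loadings (illustrative)
sigma_t, sigma_y = 0.5, 0.5

U = rng.normal(size=(n, m))                          # U := eps_u,  eps_u ~ N(0, I)        (43)
T = U @ B.T + sigma_t * rng.normal(size=(n, k))      # T := h_T(B U + eps_t), h_T = id     (44)
g = lambda t: t.sum(axis=1)                          # placeholder response g(T)
Y = g(T) + U @ gamma + sigma_y * rng.normal(size=n)  # Y := h_{Y|T}(g(T) + gamma'U + eps_y) (45)

Y_bin = (Y > 0).astype(int)                          # binary case: h_{Y|T} is an indicator
```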
Researcher Affiliation Collaboration Jiajing Zheng (EMAIL), Department of Statistics and Applied Probability, University of California, Santa Barbara, CA 93101, U.S.A.; Alexander D'Amour (EMAIL), Google Research, Boston, Massachusetts, U.S.A.; Alexander Franks (EMAIL), Department of Statistics and Applied Probability, University of California, Santa Barbara, CA 93101, U.S.A.
Pseudocode Yes Appendix A.1 (General Contrast Estimation Algorithm), Algorithm 1: Marginal Contrast Estimation for Arbitrary Copulas. 1: function ComputeMean(t, ψ); 2: for k = 1, 2, ..., M do ... Appendix A.2 (Contrast Estimation Algorithm with Gaussian Copulas), Algorithm 2: Marginal Contrast Estimation with Gaussian Copulas. 1: function ComputeMean(t, γ); 2: for i = 1, 2, ..., n do ...
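The truncated pseudocode above only shows the loop header, but its Monte Carlo structure can be sketched: estimate the marginal mean outcome at a treatment value t by averaging a model-implied outcome mean over M confounder draws, then take a contrast between two treatments. Everything below (`compute_mean`, `draw_u`, `outcome_mean`, the linear outcome model) is a hypothetical stand-in, not the paper's implementation; in particular, a real implementation would draw U from the copula-implied conditional distribution rather than a standard normal.

```python
import numpy as np

def compute_mean(t, draw_u, outcome_mean, M=5000, seed=0):
    """Monte Carlo estimate of E[Y(t)]: average over M confounder draws."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(M):               # "for k = 1, 2, ..., M do"
        u = draw_u(rng)              # draw U (placeholder for the copula-implied draw)
        total += outcome_mean(t, u)  # model-implied mean outcome at (t, u)
    return total / M

# Illustrative linear outcome model and standard-normal confounder draws.
gamma = np.array([0.8, -0.3])
outcome = lambda t, u: float(np.sum(t) + gamma @ u)
draw = lambda rng: rng.normal(size=2)

t1, t2 = np.array([1.0, 0.0]), np.array([0.0, 0.0])
# Shared seed gives common random numbers, so the confounder terms cancel in the contrast.
contrast = compute_mean(t1, draw, outcome) - compute_mean(t2, draw, outcome)
```

Using the same seed for both calls is a standard variance-reduction trick (common random numbers); here it makes the contrast recover Σ_j (t1 − t2)_j exactly.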
Open Source Code Yes Code to replicate all analyses is available (Zheng, 2021b) and an R package implementing our methodology is also available and in active development (Zheng, 2021a).
Open Datasets Yes In this section, we apply our sensitivity analysis to mice obesity data generated by Wang et al. (2006) and Ghazalpour et al. (2006), and compiled into a single data set by Lin et al. (2015). ... In addition to the GWAS simulation and gene expression data set analyzed in this paper, we also include a reanalysis of the TMDB 5000 Movie Data Set (Kaggle, 2017) in Appendix F.
Dataset Splits No The paper does not explicitly provide training/test/validation dataset splits. It describes data generation for simulations and subsetting for the movie dataset (e.g., 'we subset the data to the k = 327 actors who participated in at least twenty movies. This reduces the total number of movies to 2439.') but not standard splits for model evaluation.
Hardware Specification No The paper does not provide any specific hardware details used for running its experiments.
Software Dependencies No For Gaussian outcomes, the widths of the ignorance regions are larger for the treatments most correlated with confounders, as characterized in Corollary 2.1 (see Figure 2). Since B is a vector, the width of the ignorance region of PATE_{t1,t2} can be examined by looking at the dot product between B and the treatment contrasts: the larger the dot product, the wider the ignorance region. As expected, the ignorance region of the treatment effect is widest when t1 = e1 (RV = 0%) and narrowest when t1 = e4, since B·e1 has the largest magnitude while B·e4 has the smallest. Despite the fact that t1 = e4 has the smallest ignorance region, it is not robust to confounding because the naive effect is already close to zero (RV = 9%). For the second and third treatment contrasts, estimates are robust to confounders, as their entire ignorance regions exclude 0. These results require the Gaussian copula assumption (Assumption 6), but in Appendix D we show via simulation that alternative choices for the copula yield results that lie within the worst-case Gaussian bounds for R²_{Y~U|T} = 1. In Appendix Figure 9, we include the causal effects implied by some Archimedean copulas as well as an example with a non-monotone copula (e.g., a quadratic relationship between U and Y). Thus, while the Gaussian copula will not hold exactly in practice, it is likely that the Gaussian bounds cover the true causal effect when the true copula is non-Gaussian. For the simulation with binary outcomes, we compute ignorance regions for the risk ratio. Although we do not have a theoretical result about the ignorance regions of the risk ratio, the general trends in the size of the ignorance region and the robustness of effects are comparable to the Gaussian case. Most notably, the treatments with the largest ignorance regions are still those that are most correlated with the confounder.
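The width heuristic described above (width proportional to |B·(t1 − t2)|, per Corollary 2.1) is easy to illustrate. The loading vector B below is hypothetical, chosen only so that the projection onto e1 is largest and onto e4 smallest, mirroring the ordering the excerpt reports; it is not the paper's simulation setting.

```python
import numpy as np

B = np.array([2.0, 1.0, 0.8, 0.1])  # illustrative loadings, not the paper's values
contrasts = np.eye(4)               # treatment contrasts t1 = e1, ..., e4 (against t2 = 0)

# Width of each ignorance region is proportional to |B . (t1 - t2)|.
widths = np.abs(contrasts @ B)

order = np.argsort(-widths)         # widest ignorance region first
# e1 gives the widest region (largest |B . e1|), e4 the narrowest,
# matching the ordering described in the text.
```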
On the other hand, because the outcome is non-linear in U, the naive estimate is not at the center of the ignorance region (Figure 2b). In fact, the ignorance region is also non-monotone in R²_{Y~U|T} because the variance of the intervention distribution also depends on γ. In this case, one of the endpoints of the ignorance region corresponds to R²_{Y~U|T} = 1 but the other does not. We compute the endpoints of the ignorance region numerically (see Appendix C.3 for more details). ... To represent the possible relationship between treatments and confounders, we fit a linear factor model, which is commonly used to characterize unmeasured confounding in gene expression studies (Gagnon-Bartsch and Speed, 2012), using the factanal method. From the scree plot of the singular values of the gene expression matrix, we find that two singular values exceed the rest, which suggests that an m = 2 confounder model is a reasonable choice (Appendix Figure 13). We then fit a Bayesian linear regression model of mouse weight on gene expression levels using the default prior distributions from the rstanarm package (Goodrich et al., 2020).
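The scree-plot step used to pick m = 2 can be sketched: inspect the singular values of the (centered) expression matrix and look for a gap after the top few. The paper uses R's factanal; the snippet below is a Python/NumPy analogue on a synthetic matrix with rank-2 signal plus noise, so two singular values should dominate. The data, dimensions, and the ratio-based gap rule are all illustrative assumptions, not the paper's procedure.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 200, 50, 2
signal = rng.normal(size=(n, m)) @ rng.normal(size=(m, p))  # rank-2 signal
X = signal + 0.1 * rng.normal(size=(n, p))                  # plus small noise
X = X - X.mean(axis=0)                                      # center columns

s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
gap = s[:-1] / s[1:]                    # ratio of consecutive singular values
m_hat = int(np.argmax(gap)) + 1         # largest drop marks the elbow
# m_hat recovers the number of dominant factors (2 here by construction)
```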
Experiment Setup No The paper mentions using specific models and packages like BART and rstanarm, and notes using 'default prior distributions' for rstanarm. It describes some aspects of the VAE fitting such as cross-validation to identify the latent dimension. However, it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed training configurations (e.g., optimizer settings, model initialization) that would allow for precise replication of the experimental setup.