reproducibilityindex.ai

On Heterogeneous Treatment Effects in Heterogeneous Causal Graphs

Authors: Richard A Watson, Hengrui Cai, Xinming An, Samuel Mclean, Rui Song

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We test the ability of our method on estimating HCGs and HCEs in a variety of situations. The computing infrastructure used is a virtual machine at Google Colab with the Pro tier for the majority of the computations, and a cluster server with 30 processor cores with 15GB of random access memory for computing bootstrap estimates in parallel. We apply the proposed method to estimate the heterogeneous causal graphs and effects. The averaged estimated matrix $\hat{B}$ over 100 replicates under different sample sizes, together with its true causal graph, is summarized in Figures D.1 to D.6 in the Appendix for S1 to S6 to evaluate the proposed method. The corresponding accuracy metrics, namely the false discovery rate (FDR), the true positive rate (TPR), and the structural Hamming distance (SHD), are provided with their standard errors in Tables 1 and 2.
Researcher Affiliation	Academia	(1) Department of Statistics, North Carolina State University, USA; (2) Department of Statistics, University of California Irvine, USA; (3) Department of Anesthesiology, University of North Carolina Chapel Hill, USA. Correspondence to: Rui Song <songray@gmail.com>.
Pseudocode	Yes	The full algorithm is included in Appendix B.1; however, an abbreviated version can be found in Algorithm 1, with the time complexity provided in Appendix B.2. The extended functional ISL is given in Algorithm 2.
Open Source Code	Yes	Code implementing the proposed algorithm is open-source and publicly available at: https://github.com/richard-watson/ISL.
Open Datasets	Yes	To demonstrate the practical usefulness of our method, we apply it to investigate the causal relationship of psychiatric disorders for trauma survivors using data collected from the AURORA study. The response of interest is post-traumatic stress disorder (PTSD) measured three months after trauma. With the launch of the Advancing Understanding of RecOvery afteR traumA (AURORA) study [18], where thousands of trauma survivors were recruited after traumatic experiences and followed to collect a broad range of bio-behavioral data, discovering heterogeneous causal patterns thus becomes a timely issue.
Dataset Splits	No	The paper mentions sample sizes for simulation studies (e.g., "The sample size, n, is chosen from {500, 1000}"), and bootstrap resampling (e.g., "bootstrap simulation using 100 runs of 1000 bootstrap resamples"), but it does not provide explicit train/validation/test dataset splits or their proportions for reproducibility.
Hardware Specification	Yes	The computing infrastructure used is a virtual machine at Google Colab with the Pro tier for the majority of the computations, and a cluster server with 30 processor cores with 15GB of random access memory for computing bootstrap estimates in parallel.
Software Dependencies	No	The paper mentions using DAG-GNN and NOTEARS as comparison methods, and describes techniques like LASSO and augmented Lagrangian for optimization. However, it does not provide specific version numbers for any software dependencies, such as programming languages, libraries, or frameworks (e.g., Python version, PyTorch/TensorFlow version, scikit-learn version).
Experiment Setup	Yes	Scenario generation. We generate 8 scenarios. Scenario 1 (S1) is the simple case in which only X, A, and Y have non-zero weights; Scenario 2 (S2) has parallel mediators ($B_M = 0_{s \times s}$); and Scenario 3 (S3) has sequentially ordered mediators ($B_M \neq 0_{s \times s}$). The true causal graphs for S1 and S2 are randomly generated given structural constraints, while the causal graph for S3 is generated using the Erdős-Rényi (ER) model with a degree of 2. The weights in graphs are randomly chosen from {-1, 0, 1}. Each of these scenarios sets p = 2 and s = 6 for a total of 12 nodes in the graph (including treatment, interaction, and outcome). In addition, we require that there is at least one non-zero interaction term to ensure that causal graphs are heterogeneous as discussed in Remark 3.1. However, we also study two alternative cases: (1) a variant of S3 in which there is no interaction term (S3nx) and (2) a variant of S3 in which X is a moderator as defined by Kraemer et al. [14] (S3mod), that is, X is independent of A and there is an interaction term. Scenario 4 (S4), 5 (S5), and 6 (S6) are higher-dimensional versions of S3, with s = 38 in S4, p = 18 in S5, and s = 10 plus p = 22 in S6, where the underlying graphs are generated by the ER model with a degree of 4. The remaining settings are the same as in S3. The sample size, n, is chosen from {500, 1000}. For S4-S6, the sample size is fixed at n = 1000. We also set the noise to be standard Gaussian for X, A, M, and Y. Given the noise and the edge weights, Model 1 can be directly used to generate the data by setting X as its noise and using X to generate A with the edge weights and associated noise. X and A can then be used with the edge weights and associated noise to generate M, which is in turn used to generate Y in a similar manner. After this process is complete, a baseline value of 1.0 is added to the response. Finally, the column-wise mean value for the dataset is subtracted from the dataset.
The seeds used to generate the 100 datasets used for each simulation scenario are 1 through 100.
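
The data-generating steps quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal illustration, not the authors' code (which is at the repository listed above): the function names `draw_weights` and `simulate_dataset`, the `graph_seed` parameter, the column order, and the choice to store the X*A interaction as explicit columns are all assumptions. For simplicity the mediators have no edges among themselves (the parallel-mediator case), and the weights are drawn freely from {-1, 0, 1} rather than under the paper's structural constraints.

```python
import numpy as np

def draw_weights(p=2, s=6, graph_seed=0):
    """Draw hypothetical edge weights from {-1, 0, 1} (assumed layout)."""
    rng = np.random.default_rng(graph_seed)
    w_xa = rng.choice([-1, 0, 1], size=p)               # X -> A
    w_m = rng.choice([-1, 0, 1], size=(2 * p + 1, s))   # (X, A, X*A) -> M
    w_y = rng.choice([-1, 0, 1], size=2 * p + 1 + s)    # (X, A, X*A, M) -> Y
    # Force at least one non-zero interaction weight (assumed mechanism),
    # mirroring the requirement that the graph be heterogeneous.
    inter = slice(p + 1, 2 * p + 1)
    if not w_m[inter].any() and not w_y[inter].any():
        w_y[p + 1] = 1
    return w_xa, w_m, w_y

def simulate_dataset(seed, n=500, p=2, s=6, graph_seed=0):
    """Generate one mean-centered dataset with p + 1 + p + s + 1 columns."""
    w_xa, w_m, w_y = draw_weights(p, s, graph_seed)
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))                      # X is its own noise
    A = X @ w_xa + rng.standard_normal(n)                # X generates A
    XA = X * A[:, None]                                  # interaction terms
    parents_m = np.column_stack([X, A, XA])
    M = parents_m @ w_m + rng.standard_normal((n, s))    # X, A generate M
    parents_y = np.column_stack([parents_m, M])
    Y = parents_y @ w_y + rng.standard_normal(n) + 1.0   # baseline of 1.0
    data = np.column_stack([X, A, XA, M, Y])
    return data - data.mean(axis=0)                      # column-wise centering

# One dataset per seed 1..100; the graph weights stay fixed across seeds.
datasets = [simulate_dataset(seed) for seed in range(1, 101)]
```

With p = 2 and s = 6 each dataset has 12 columns, matching the node count quoted above. Keeping the graph seed separate from the data seed is a design choice of this sketch so that all 100 replicates share one underlying graph.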