Permutation-based Causal Inference Algorithms with Interventions
Authors: Yuhao Wang, Liam Solus, Karren Yang, Caroline Uhler
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we present these two algorithms and their consistency guarantees, and we analyze their performance on simulated data, protein signaling data, and single-cell gene expression data. |
| Researcher Affiliation | Academia | Yuhao Wang, Laboratory for Information and Decision Systems and Institute for Data, Systems and Society, Massachusetts Institute of Technology, Cambridge, MA 02139 (yuhaow@mit.edu); Liam Solus, Department of Mathematics, KTH Royal Institute of Technology, Stockholm, Sweden (solus@kth.se); Karren Dai Yang, Institute for Data, Systems and Society and Broad Institute of MIT and Harvard, Massachusetts Institute of Technology, Cambridge, MA 02139 (karren@mit.edu); Caroline Uhler, Laboratory for Information and Decision Systems and Institute for Data, Systems and Society, Massachusetts Institute of Technology, Cambridge, MA 02139 (cuhler@mit.edu) |
| Pseudocode | Yes | Algorithm 1: Input: Observations X̂, an initial permutation π0, a threshold δn > Σ_{k=1}^K λ_{n,k}, and a set of interventional targets I = {I1, . . . , IK}. Output: A permutation π and its minimal I-MAP Gπ. 1. Set Gπ := argmax_{G consistent with π} Score(G); 2. Using a depth-first search approach with root π, search for a permutation πs with Score(Gπs) > Score(Gπ) that is connected to π through a sequence of permutations π0 = π, π1, . . . , πs−1, πs, where each permutation πk is produced from πk−1 by a transposition that corresponds to a covered edge in Gπk−1 such that Score(Gπk) > Score(Gπk−1) − δn. If no such Gπs exists, return π and Gπ; else set π := πs and repeat. (An illustrative code sketch of this search appears below the table.) |
| Open Source Code | Yes | The code utilized for the following experiments can be found at https://github.com/yuhaow/sp-intervention. |
| Open Datasets | Yes | The first dataset is the protein signaling dataset of Sachs et al. [21], and the second is the single-cell gene expression data generated using perturb-seq in [4]. |
| Dataset Splits | No | The paper reports total sample sizes for the observational and interventional data (e.g., "1755 observational measurements and 4091 interventional measurements" for the Sachs data, and "992 observational samples and 13,435 interventional samples" for the perturb-seq data) and describes a held-out evaluation strategy ("each one trained with one of the interventional datasets being left out"), but it does not specify train/validation/test splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications). |
| Software Dependencies | No | The paper mentions the "R-package pcalg" for GIES but does not give version numbers for this or for any other software component used in its own implementation. |
| Experiment Setup | Yes | For p = 10, the size of each intervention set was 5 for K = 1 and 4 for K = 2. For p = 20, the size was increased to 10 and 8, respectively, to keep the proportion of intervened nodes constant. In each study, we compared GIES with Algorithm 2 for n samples for each intervention with n = 10^3, 10^4, 10^5. Our tuning parameter is the cut-off value for the CI tests, just as in the simulated data studies in Section 5.1. Figure 4 reports our results for thirteen different cut-off values in [10^-4, 0.7]. (A hedged reconstruction of this simulation grid appears below the table.) |
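
To make the covered-edge search quoted in the Pseudocode row concrete, below is a minimal Python sketch of the depth-first permutation search, assuming the caller supplies a `minimal_imap(perm)` routine (building Gπ from CI tests), a `score(dag)` function for the paper's interventional scoring criterion, and the tolerance `delta_n`. The depth cap `max_depth` is a practical assumption not present in Algorithm 1, and this sketch is not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of the Algorithm 1 search over permutations via
# covered-edge transpositions; `minimal_imap`, `score`, and `delta_n`
# are assumed to be provided by the caller.

def covered_edges(dag):
    """Covered edges i -> j of a DAG given as {node: set of parents}."""
    edges = []
    for j, pa_j in dag.items():
        for i in pa_j:
            # i -> j is covered iff pa(i) == pa(j) \ {i}
            if dag[i] == pa_j - {i}:
                edges.append((i, j))
    return edges

def transpose(perm, i, j):
    """Swap the positions of nodes i and j in the permutation."""
    perm = list(perm)
    a, b = perm.index(i), perm.index(j)
    perm[a], perm[b] = perm[b], perm[a]
    return perm

def permutation_search(perm, minimal_imap, score, delta_n, max_depth=4):
    """Depth-first search over covered-edge transpositions, tolerating
    score drops of at most delta_n along a path, and restarting whenever
    a permutation strictly beats the current optimum.
    (max_depth is a practical cap, not part of Algorithm 1.)"""
    best_perm, best_dag = list(perm), minimal_imap(perm)
    improved = True
    while improved:
        improved = False
        stack = [(best_perm, best_dag, 0)]
        visited = {tuple(best_perm)}
        while stack and not improved:
            cur_perm, cur_dag, depth = stack.pop()
            if depth >= max_depth:
                continue
            for i, j in covered_edges(cur_dag):
                nxt_perm = transpose(cur_perm, i, j)
                if tuple(nxt_perm) in visited:
                    continue
                visited.add(tuple(nxt_perm))
                nxt_dag = minimal_imap(nxt_perm)
                if score(nxt_dag) <= score(cur_dag) - delta_n:
                    continue  # move loses too much score; prune this branch
                if score(nxt_dag) > score(best_dag):
                    best_perm, best_dag = nxt_perm, nxt_dag
                    improved = True  # restart the DFS from the new optimum
                    break
                stack.append((nxt_perm, nxt_dag, depth + 1))
    return best_perm, best_dag
```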
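
The Experiment Setup row also translates naturally into a small parameter grid. The sketch below reconstructs that grid under stated assumptions: the variable names are illustrative, and since the paper only gives the range [10^-4, 0.7] and the count of thirteen cut-off values, log-spacing is used purely as an example.

```python
import numpy as np

# Simulation grid from the Experiment Setup row: p nodes, K intervention
# targets, a fixed intervention-set size, and n samples per intervention.
# Variable names are illustrative; they do not come from the released code.
settings = []
for p, size_by_K in [(10, {1: 5, 2: 4}), (20, {1: 10, 2: 8})]:
    for K, intervention_size in size_by_K.items():
        for n in (10**3, 10**4, 10**5):
            settings.append({
                "p": p,                          # number of nodes
                "K": K,                          # number of interventions
                "intervention_set_size": intervention_size,
                "samples_per_intervention": n,
            })

# Thirteen CI-test cut-off values in [1e-4, 0.7]; the log-spacing is an
# assumption, since the paper only states the range and the count.
ci_cutoffs = np.logspace(np.log10(1e-4), np.log10(0.7), num=13)
```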