Identifiability Guarantees for Causal Disentanglement from Soft Interventions

Authors: Jiaqi Zhang, Kristjan Greenewald, Chandler Squires, Akash Srivastava, Karthikeyan Shanmugam, Caroline Uhler

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now demonstrate our method on a biological dataset. We use the large-scale Perturb-seq study from [44]. After pre-processing, the data contains 8,907 unperturbed cells (observational dataset D) and 99,590 perturbed cells. The perturbed cells underwent CRISPR activation [16] targeting one or two out of 105 genes (interventional datasets D1, ..., DK, K = 217). CRISPR activation experiments modulate the expression of their target genes, which we model as a shift intervention. Each interventional dataset comprises 50 to 2,000 cells. Each cell is represented as a 5,000-dimensional vector (observed variable X) measuring the expressions of 5,000 highly variable genes. To test our model, we set the latent dimension p = 105, corresponding to the total number of targeted genes. (A sketch of this shift-intervention setup follows the table.)
Researcher Affiliation | Collaboration | Jiaqi Zhang (LIDS, MIT; Broad Institute of MIT and Harvard); Kristjan Greenewald (MIT-IBM Watson AI Lab; IBM Research); Chandler Squires (LIDS, MIT; Broad Institute of MIT and Harvard); Akash Srivastava (MIT-IBM Watson AI Lab; IBM Research); Karthikeyan Shanmugam (IBM Research); Caroline Uhler (LIDS, MIT; Broad Institute of MIT and Harvard)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code for our method is at https://github.com/uhlerlab/discrepancy_vae.
Open Datasets | Yes | We use the large-scale Perturb-seq study from [44].
Dataset Splits | No | The paper mentions reserving samples for testing, but does not explicitly describe a validation split for hyperparameter tuning or early stopping.
Hardware Specification | No | The paper mentions training on a “single GPU” but does not specify the model or any other hardware details.
Software Dependencies | No | The paper mentions “PyTorch” but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | We summarize our hyperparameters in Table 2. ... Loss function: kernel width (MMD) = 200, number of kernels (MMD) = 10, λ = 0.1, β_max = 1, α_max = 1. Training: t_max = 100, learning rate = 0.001, batch size = 32. ... We train for 100 epochs in total, which takes less than 45 minutes on a single GPU. (A hedged MMD/training-configuration sketch follows the table.)
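
The Research Type row quotes the paper's biological setup: an observational dataset plus interventional datasets in which CRISPR activation of a target gene is modeled as a shift intervention on the corresponding latent variable. The sketch below illustrates that data layout on a toy linear-Gaussian latent model; the structural model, the mixing matrix G, the shift size, and the function name sample_cells are illustrative assumptions, not the paper's code.

    # Hedged sketch: CRISPR activation of one target gene modeled as a constant
    # shift added to that latent variable's mechanism (a "shift intervention").
    # Toy linear-Gaussian latent model for illustration only.
    import numpy as np

    rng = np.random.default_rng(0)
    p, d = 105, 5000   # latent dimension (targeted genes), observed dimension (measured genes)

    A = np.triu(rng.normal(scale=0.1, size=(p, p)), k=1)  # strictly upper triangular => DAG over latents
    G = rng.normal(scale=0.1, size=(p, d))                 # latent-to-observed map (stand-in for the mixing f)

    def sample_cells(n, shift_targets=(), shift_size=3.0):
        """Sample n cells; each latent index in shift_targets receives a constant
        shift added to its mechanism, i.e. a shift intervention."""
        Z = np.zeros((n, p))
        for j in range(p):                              # ancestral sampling in topological order
            Z[:, j] = Z @ A[:, j] + rng.normal(size=n)  # parents' contribution plus noise
            if j in shift_targets:
                Z[:, j] += shift_size
        return Z @ G                                    # observed 5,000-dimensional expression vector X

    D_obs = sample_cells(8907)                                         # observational dataset D
    D_int = [sample_cells(500, shift_targets=(k,)) for k in range(3)]  # 3 of the K = 217 interventional datasets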
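
The Experiment Setup row lists an MMD term with kernel width 200 and 10 kernels, plus the training settings from Table 2. Below is a minimal sketch of a multi-kernel Gaussian MMD estimate and a config dict holding those values; the geometric spacing of the 10 bandwidths around the base width and the name multi_kernel_mmd are assumptions for illustration, not the paper's exact implementation.

    # Hedged sketch of a multi-kernel Gaussian MMD term (base width 200, 10 kernels).
    import torch

    def multi_kernel_mmd(x, y, base_width=200.0, n_kernels=10):
        """MMD^2 estimate between samples x and y using a mixture of Gaussian
        kernels exp(-||a-b||^2 / w) with geometrically spaced widths w."""
        # How the 10 widths are spread around base_width is our assumption.
        widths = [base_width * (2.0 ** (i - n_kernels // 2)) for i in range(n_kernels)]
        xy = torch.cat([x, y], dim=0)
        d2 = torch.cdist(xy, xy).pow(2)                    # pairwise squared distances
        K = sum(torch.exp(-d2 / w) for w in widths) / n_kernels
        n = x.shape[0]
        k_xx, k_yy, k_xy = K[:n, :n], K[n:, n:], K[:n, n:]
        # Biased estimate (diagonal terms included), kept simple for illustration.
        return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

    # Training settings as reported in the paper's Table 2 (optimizer not specified in this row).
    config = dict(lr=1e-3, batch_size=32, epochs=100, lam=0.1, beta_max=1.0, alpha_max=1.0)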