reproducibilityindex.ai

Linear Causal Disentanglement via Interventions

Authors: Chandler Squires, Anna Seigal, Salil S Bhate, Caroline Uhler

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Section 4, we apply the method to synthetic and semi-synthetic data and show that it recovers the generative model, and we compute a linear causal disentanglement on a single-cell RNA sequencing dataset.
Researcher Affiliation	Academia	(1) Broad Institute of MIT and Harvard; (2) Laboratory for Information and Decision Systems, MIT; (3) School of Engineering and Applied Sciences, Harvard University.
Pseudocode	Yes	Algorithm 1 ID-ANCESTORS
Open Source Code	Yes	All code for data generation and for our adapted versions of Algorithms 1, 2, and 3 (that is, Algorithms 6, 5 and 7) can be found at the link in Appendix M. Our code can be found at https://github.com/csquires/linear-causal-disentanglement-via-interventions.
Open Datasets	Yes	We evaluate our method on a dataset from Ursu et al. (2022). This single-cell RNA sequencing (scRNA-seq) dataset consists of 90,000 cells from a lung cancer cell line, with 83 different nonsynonymous mutations of the KRAS oncogene overexpressed. The scRNA-seq dataset of Ursu et al. (2022) is available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE161824. The TCGA dataset of Liu et al. (2018) is available at https://gdc-hub.s3.us-east-1.amazonaws.com/download/TCGA-LUAD.survival.tsv and https://gdc-hub.s3.us-east-1.amazonaws.com/download/TCGA-LUAD.htseq_fpkm.tsv.gz.
Dataset Splits	No	The paper describes generating synthetic data and using semi-synthetic/biological data but does not explicitly specify training, validation, or test dataset splits (e.g., percentages or counts) for reproducibility.
Hardware Specification	No	The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments.
Software Dependencies	No	The paper mentions software packages like 'gurobipy', 'lifelines', and 'statsmodels', but does not provide specific version numbers for these software dependencies.
Experiment Setup	Yes	There is a single hyperparameter γ ∈ [0, 1], the percentage of spectral energy associated to the largest singular value, above which we consider a matrix to have rank one. We use γ = 0.99. We generate 500 random models following Assumption 1 for d = 5 latent and p = 10 observed variables, as follows. We sample the graph G from an Erdős–Rényi random graph model with density 0.75. We sample the nonzero entries of A_0 independently from Unif(±[0.25, 1]), and the nonzero entries of Ω_0 independently from Unif([2, 4]). We sample uniformly among permutations to generate the intervention targets i_k. In context k, we have A_k = A_0 − e_{i_k} e_{i_k}^⊤ A_0; i.e., all entries in row i_k are 0. We change (Ω_0)_{i_k,i_k} into a new value (Ω_k)_{i_k,i_k}, sampled from Unif([6, 8]) to ensure a non-negligible change. Finally, the entries of H are sampled independently from Unif([−2, 2]).
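
The Experiment Setup entry can be illustrated with a minimal Python sketch. It is not the authors' code, which lives in the repository linked under Open Source Code. Several details are assumptions made here for illustration: "spectral energy" is taken to mean the sum of squared singular values, the graph is encoded as a strictly upper-triangular matrix, edge weights receive a random sign, H is given shape (d, p), and the function names `spectral_energy_ratio`, `is_rank_one`, and `sample_model` are hypothetical.

```python
import numpy as np


def spectral_energy_ratio(M):
    """Share of spectral energy carried by the largest singular value.

    Assumption: "spectral energy" is the sum of squared singular values.
    """
    s = np.linalg.svd(np.asarray(M, dtype=float), compute_uv=False)
    total = float(np.sum(s ** 2))
    return float(s[0] ** 2) / total if total > 0 else 0.0


def is_rank_one(M, gamma=0.99):
    """Treat M as rank one when the ratio exceeds the threshold gamma."""
    return spectral_energy_ratio(M) > gamma


def sample_model(d=5, p=10, density=0.75, seed=None):
    """Sample one synthetic model following the described setup (a sketch)."""
    rng = np.random.default_rng(seed)

    # Erdős–Rényi DAG: keep each strictly upper-triangular edge with
    # probability `density` (the triangular orientation is an assumption).
    edge_mask = np.triu(rng.random((d, d)) < density, k=1)

    # Edge weights with magnitude in [0.25, 1]; the random sign is an assumption.
    magnitudes = rng.uniform(0.25, 1.0, size=(d, d))
    signs = rng.choice([-1.0, 1.0], size=(d, d))
    A0 = np.where(edge_mask, magnitudes * signs, 0.0)

    # Diagonal of Omega_0, stored as a vector, with entries in [2, 4].
    omega0 = rng.uniform(2.0, 4.0, size=d)

    # One intervention target per context, drawn as a uniform permutation.
    targets = rng.permutation(d)

    contexts = []
    for i_k in targets:
        A_k = A0.copy()
        A_k[i_k, :] = 0.0                      # zero out row i_k
        omega_k = omega0.copy()
        omega_k[i_k] = rng.uniform(6.0, 8.0)   # shifted diagonal entry
        contexts.append((int(i_k), A_k, omega_k))

    # Entries of H in [-2, 2]; the (d, p) shape is an assumption.
    H = rng.uniform(-2.0, 2.0, size=(d, p))
    return A0, omega0, contexts, H


if __name__ == "__main__":
    A0, omega0, contexts, H = sample_model(seed=0)
    print("intervention targets:", [i for i, _, _ in contexts])
    print("rank-one check on an outer product:",
          is_rank_one(np.outer([1.0, 2.0], [3.0, 4.0])))
```

Calling `sample_model` 500 times with different seeds would mirror the number of random models described in the setup. Because the new diagonal value comes from [6, 8] and the original from [2, 4], the change in each intervened entry is at least 2.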