Linear Causal Disentanglement via Interventions

Authors: Chandler Squires, Anna Seigal, Salil S Bhate, Caroline Uhler

ICML 2023

Reproducibility Variable Result LLM Response
Research Type Experimental In Section 4, we apply the method to synthetic and semi-synthetic data and show that it recovers the generative model, and we compute a linear causal disentanglement on a single-cell RNA sequencing dataset.
Researcher Affiliation Academia (1) Broad Institute of MIT and Harvard; (2) Laboratory for Information and Decision Systems, MIT; (3) School of Engineering and Applied Sciences, Harvard University.
Pseudocode Yes Algorithm 1 ID-ANCESTORS
Open Source Code Yes All code for data generation and for our adapted versions of Algorithms 1, 2, and 3 (that is, Algorithms 6, 5 and 7) can be found at the link in Appendix M. Our code can be found at https://github.com/csquires/linear-causal-disentanglement-via-interventions.
Open Datasets Yes We evaluate our method on a dataset from Ursu et al. (2022). This single-cell RNA sequencing (scRNA-seq) dataset consists of 90,000 cells from a lung cancer cell line, with 83 different nonsynonymous mutations of the KRAS oncogene overexpressed. The scRNA-seq dataset of Ursu et al. (2022) is available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE161824. The TCGA dataset of Liu et al. (2018) is available at https://gdc-hub.s3.us-east-1.amazonaws.com/download/TCGA-LUAD.survival.tsv and https://gdc-hub.s3.us-east-1.amazonaws.com/download/TCGA-LUAD.htseq_fpkm.tsv.gz.
Dataset Splits No The paper describes generating synthetic data and using semi-synthetic/biological data but does not explicitly specify training, validation, or test dataset splits (e.g., percentages or counts) for reproducibility.
Hardware Specification No The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments.
Software Dependencies No The paper mentions software packages like 'gurobipy', 'lifelines', and 'statsmodels', but does not provide specific version numbers for these software dependencies.
Experiment Setup Yes There is a single hyperparameter γ ∈ [0, 1], the percentage of spectral energy associated to the largest singular value, above which we consider a matrix to have rank one. We use γ = 0.99. We generate 500 random models following Assumption 1 for d = 5 latent and p = 10 observed variables, as follows. We sample the graph G from an Erdős-Rényi random graph model with density 0.75. We sample the nonzero entries of A_0 independently from Unif(±[0.25, 1]), and the nonzero entries of Ω_0 independently from Unif([2, 4]). We sample uniformly among permutations to generate the intervention targets i_k. In context k, we have A_k = A_0 − e_{i_k} (A_0^⊤ e_{i_k})^⊤; i.e., all entries in row i_k are 0. We change (Ω_0)_{i_k,i_k} into a new value (Ω_k)_{i_k,i_k}, sampled from Unif([6, 8]) to ensure a non-negligible change. Finally, the entries of H are sampled independently from Unif([−2, 2]).
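
The setup described in the Experiment Setup row can be sketched in NumPy as follows. This is a minimal illustration, not the authors' released code: the function names `is_rank_one` and `sample_model` are hypothetical, and several details are assumptions of the sketch. Spectral energy is taken to be the sum of squared singular values, the graph is given a strictly upper-triangular support, the nonzero entries of A_0 get a random sign with magnitude in [0.25, 1], H is sampled symmetrically around zero with shape d × p, and there is one context per latent variable.

```python
import numpy as np


def is_rank_one(M, gamma=0.99):
    """Return True if the largest singular value carries more than `gamma`
    of the spectral energy (assumed: sum of squared singular values)."""
    s = np.linalg.svd(np.asarray(M, dtype=float), compute_uv=False)
    total = np.sum(s ** 2)
    if total == 0.0:
        return False  # the zero matrix has rank zero, not one
    return bool(s[0] ** 2 / total > gamma)


def sample_model(d=5, p=10, density=0.75, seed=None):
    """Sample one synthetic model: A0, Omega0, targets, per-context (A_k, Omega_k), H."""
    rng = np.random.default_rng(seed)

    # Erdos-Renyi DAG: each pair i < j is an edge with probability `density`.
    # Assumption: strictly upper-triangular support (a fixed topological order).
    support = np.triu(rng.random((d, d)) < density, k=1)

    # Edge weights: magnitude in [0.25, 1], random sign (the sign is an assumption).
    magnitude = rng.uniform(0.25, 1.0, size=(d, d))
    sign = rng.choice([-1.0, 1.0], size=(d, d))
    A0 = np.where(support, magnitude * sign, 0.0)

    # Diagonal Omega0 with entries drawn from [2, 4].
    Omega0 = np.diag(rng.uniform(2.0, 4.0, size=d))

    # Intervention targets: a uniformly random permutation of the latent nodes.
    targets = rng.permutation(d)

    contexts = []
    for i_k in targets:
        e = np.zeros(d)
        e[i_k] = 1.0
        # A_k = A0 - e (A0^T e)^T zeroes out row i_k of A0.
        A_k = A0 - np.outer(e, A0.T @ e)
        # Replace the targeted diagonal entry with a draw from [6, 8].
        Omega_k = Omega0.copy()
        Omega_k[i_k, i_k] = rng.uniform(6.0, 8.0)
        contexts.append((A_k, Omega_k))

    # Mixing matrix H with entries in [-2, 2]; the (d, p) shape is an assumption.
    H = rng.uniform(-2.0, 2.0, size=(d, p))
    return A0, Omega0, targets, contexts, H


# Usage: draw one model and try the rank-one test on two small matrices.
A0, Omega0, targets, contexts, H = sample_model(seed=0)
print(is_rank_one(np.outer([1.0, 2.0], [3.0, 4.0])))  # True: an outer product has rank one
print(is_rank_one(np.eye(2)))  # False: the energy is split evenly
```

Repeating `sample_model` with 500 different seeds would mirror the number of random models reported above. The threshold γ = 0.99 makes the rank test tolerant of small numerical or sampling noise in the trailing singular values.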