Linear Causal Disentanglement via Interventions
Authors: Chandler Squires, Anna Seigal, Salil S Bhate, Caroline Uhler
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4, we apply the method to synthetic and semi-synthetic data and show that it recovers the generative model, and we compute a linear causal disentanglement on a single-cell RNA sequencing dataset. |
| Researcher Affiliation | Academia | ¹Broad Institute of MIT and Harvard; ²Laboratory for Information and Decision Systems, MIT; ³School of Engineering and Applied Sciences, Harvard University. |
| Pseudocode | Yes | Algorithm 1 ID-ANCESTORS |
| Open Source Code | Yes | All code for data generation and for our adapted versions of Algorithms 1, 2, and 3 (that is, Algorithms 6, 5 and 7) can be found at the link in Appendix M. Our code can be found at https://github.com/csquires/linear-causal-disentanglement-via-interventions. |
| Open Datasets | Yes | We evaluate our method on a dataset from Ursu et al. (2022). This single-cell RNA sequencing (scRNA-seq) dataset consists of 90,000 cells from a lung cancer cell line, with 83 different nonsynonymous mutations of the KRAS oncogene overexpressed. The scRNA-seq dataset of Ursu et al. (2022) is available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE161824. The TCGA dataset of Liu et al. (2018) is available at https://gdc-hub.s3.us-east-1.amazonaws.com/download/TCGA-LUAD.survival.tsv and https://gdc-hub.s3.us-east-1.amazonaws.com/download/TCGA-LUAD.htseq_fpkm.tsv.gz. |
| Dataset Splits | No | The paper describes generating synthetic data and using semi-synthetic/biological data but does not explicitly specify training, validation, or test dataset splits (e.g., percentages or counts) for reproducibility. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper mentions software packages like 'gurobipy', 'lifelines', and 'statsmodels', but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | There is a single hyperparameter $\gamma \in [0, 1]$, the fraction of spectral energy associated with the largest singular value, above which we consider a matrix to have rank one. We use $\gamma = 0.99$. We generate 500 random models following Assumption 1 for $d = 5$ latent and $p = 10$ observed variables, as follows. We sample the graph $\mathcal{G}$ from an Erdős-Rényi random graph model with density 0.75. We sample the nonzero entries of $A_0$ independently from $\mathrm{Unif}(\pm[0.25, 1])$, and the nonzero entries of $\Omega_0$ independently from $\mathrm{Unif}([2, 4])$. We sample uniformly among permutations to generate the intervention targets $i_k$. In context $k$, we have $A_k = A_0 - e_{i_k} e_{i_k}^\top A_0$; i.e., all entries in row $i_k$ are 0. We change $(\Omega_0)_{i_k,i_k}$ into a new value $(\Omega_k)_{i_k,i_k}$, sampled from $\mathrm{Unif}([6, 8])$, to ensure a non-negligible change. Finally, the entries of $H$ are sampled independently from $\mathrm{Unif}([-2, 2])$. |
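The rank-one criterion in the Experiment Setup row can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes that "spectral energy" means the sum of squared singular values and that "above" means a strict inequality. The function name `is_rank_one` is hypothetical.

```python
import numpy as np


def is_rank_one(M: np.ndarray, gamma: float = 0.99) -> bool:
    """Hypothetical helper: treat M as rank one if the largest singular value
    carries more than a fraction `gamma` of the spectral energy.

    Assumption: spectral energy = sum of squared singular values.
    """
    s = np.linalg.svd(M, compute_uv=False)  # singular values, descending
    energy = np.sum(s ** 2)
    if energy == 0.0:
        return False  # the zero matrix has rank zero, not rank one
    return bool(s[0] ** 2 / energy > gamma)
```

For example, `np.diag([10.0, 0.5])` passes the test (about 99.75% of the energy sits in the top singular value), but `np.diag([10.0, 2.0])` fails it (about 96%). The threshold lets a nearly rank-one matrix estimated from noisy data count as rank one.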
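The synthetic-model recipe in the Experiment Setup row can be sketched as below. This is a hypothetical sketch and not the repository's implementation. It makes several assumptions:

- The latent nodes are already in topological order, so $A_0$ is strictly lower triangular and $(A_0)_{j,i}$ is the weight of edge $i \to j$.
- $\Omega_k$ is diagonal and stored as a vector.
- The sign in $\mathrm{Unif}(\pm[0.25, 1])$ is positive or negative with equal probability.
- There is one interventional context per latent node.

The helper `observed_precision` also assumes the latent model $Z = A Z + \varepsilon$ with $\mathrm{Cov}(\varepsilon) = \Omega$ and observations $X = H^{+} Z$. Under these assumptions, the (pseudo-)precision matrix of $X$ is $H^\top (I - A)^\top \Omega^{-1} (I - A) H$. Both function names are illustrative.

```python
import numpy as np


def sample_model(d=5, p=10, density=0.75, rng=None):
    """Hypothetical sampler following the described setup (assumptions in lead-in)."""
    rng = np.random.default_rng() if rng is None else rng
    # Erdős-Rényi DAG: with nodes assumed in topological order, each edge
    # i -> j (i < j) is present independently with probability `density`.
    mask = np.tril(rng.random((d, d)) < density, k=-1)
    # Edge weights from Unif(±[0.25, 1]): magnitude in [0.25, 1], random sign.
    magnitudes = rng.uniform(0.25, 1.0, size=(d, d))
    signs = rng.choice([-1.0, 1.0], size=(d, d))
    A0 = np.where(mask, magnitudes * signs, 0.0)
    omega0 = rng.uniform(2.0, 4.0, size=d)  # diagonal of Omega_0
    targets = rng.permutation(d)            # intervention targets i_k
    A_ctx, omega_ctx = [], []
    for i_k in targets:
        e = np.zeros(d)
        e[i_k] = 1.0
        A_k = A0 - np.outer(e, e) @ A0      # A_k = A_0 - e e^T A_0 zeros row i_k
        omega_k = omega0.copy()
        omega_k[i_k] = rng.uniform(6.0, 8.0)  # non-negligible variance change
        A_ctx.append(A_k)
        omega_ctx.append(omega_k)
    H = rng.uniform(-2.0, 2.0, size=(d, p))  # d x p matrix with Unif([-2, 2]) entries
    return A0, omega0, targets, A_ctx, omega_ctx, H


def observed_precision(A, omega, H):
    """Assumed parameterization: H^T (I - A)^T Omega^{-1} (I - A) H."""
    d = A.shape[0]
    B = (np.eye(d) - A) / np.sqrt(omega)[:, None]  # Omega^{-1/2} (I - A)
    return H.T @ B.T @ B @ H
```

For example, `rng = np.random.default_rng(0)` followed by `[sample_model(rng=rng) for _ in range(500)]` draws 500 models of the kind described above.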