Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Linear Causal Disentanglement via Interventions
Authors: Chandler Squires, Anna Seigal, Salil S Bhate, Caroline Uhler
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4, we apply the method to synthetic and semi-synthetic data and show that it recovers the generative model, and we compute a linear causal disentanglement on a single-cell RNA sequencing dataset. |
| Researcher Affiliation | Academia | (1) Broad Institute of MIT and Harvard; (2) Laboratory for Information and Decision Systems, MIT; (3) School of Engineering and Applied Sciences, Harvard University. |
| Pseudocode | Yes | Algorithm 1 ID-ANCESTORS |
| Open Source Code | Yes | All code for data generation and for our adapted versions of Algorithms 1, 2, and 3 (that is, Algorithms 6, 5 and 7) can be found at the link in Appendix M. Our code can be found at https://github.com/csquires/linear-causal-disentanglement-via-interventions. |
| Open Datasets | Yes | We evaluate our method on a dataset from Ursu et al. (2022). This single-cell RNA sequencing (scRNA-seq) dataset consists of 90,000 cells from a lung cancer cell line, with 83 different nonsynonymous mutations of the KRAS oncogene overexpressed. The scRNA-seq dataset of Ursu et al. (2022) is available at https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE161824. The TCGA dataset of Liu et al. (2018) is available at https://gdc-hub.s3.us-east-1.amazonaws.com/download/TCGA-LUAD.survival.tsv and https://gdc-hub.s3.us-east-1.amazonaws.com/download/TCGA-LUAD.htseq_fpkm.tsv.gz. |
| Dataset Splits | No | The paper describes generating synthetic data and using semi-synthetic/biological data but does not explicitly specify training, validation, or test dataset splits (e.g., percentages or counts) for reproducibility. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run the experiments. |
| Software Dependencies | No | The paper mentions software packages like 'gurobipy', 'lifelines', and 'statsmodels', but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | There is a single hyperparameter γ ∈ [0, 1], the percentage of spectral energy associated to the largest singular value, above which we consider a matrix to have rank one. We use γ = 0.99. We generate 500 random models following Assumption 1 for d = 5 latent and p = 10 observed variables, as follows. We sample the graph G from an Erdős-Rényi random graph model with density 0.75. We sample the nonzero entries of A_0 independently from Unif(±[0.25, 1]), and the nonzero entries of Ω_0 independently from Unif([2, 4]). We sample uniformly among permutations to generate the intervention targets i_k. In context k, we have A_k = A_0 − e_{i_k} e_{i_k}^⊤ A_0; i.e., all entries in row i_k are 0. We change (Ω_0)_{i_k,i_k} into a new value (Ω_k)_{i_k,i_k}, sampled from Unif([6, 8]) to ensure a non-negligible change. Finally, the entries of H are sampled independently from Unif([−2, 2]). |
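
The synthetic setup quoted in the Experiment Setup row can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code (which lives in the repository linked in the Open Source Code row). The function names `sample_model` and `is_rank_one` are hypothetical. The sketch assumes that the latent nodes are topologically ordered so that A_0 is strictly upper triangular, that nonzero entries of A_0 get a random sign, that H has shape (d, p), and that "spectral energy" means squared singular values.

```python
import numpy as np


def sample_model(d=5, p=10, density=0.75, rng=None):
    """Sample one synthetic model; names and shapes are illustrative assumptions."""
    rng = np.random.default_rng(rng)

    # Erdős-Rényi DAG: keep each possible edge with probability `density`.
    # Assumption: nodes are topologically ordered, so the support of A0
    # is strictly upper triangular.
    support = np.triu(rng.random((d, d)) < density, k=1)

    # Nonzero entries of A0: magnitude in [0.25, 1].
    # Assumption: each entry gets a uniformly random sign.
    magnitudes = rng.uniform(0.25, 1.0, size=(d, d))
    signs = rng.choice([-1.0, 1.0], size=(d, d))
    A0 = support * magnitudes * signs

    # Diagonal entries of Omega0 drawn from Unif([2, 4]).
    omega0 = rng.uniform(2.0, 4.0, size=d)

    # One intervention target per context, drawn as a uniform permutation.
    targets = rng.permutation(d)

    contexts = []
    for i_k in targets:
        A_k = A0.copy()
        A_k[i_k, :] = 0.0  # all entries in row i_k are 0
        omega_k = omega0.copy()
        omega_k[i_k] = rng.uniform(6.0, 8.0)  # non-negligible change
        contexts.append((A_k, omega_k))

    # Entries of H drawn from Unif([-2, 2]).
    # Assumption: H has shape (d, p).
    H = rng.uniform(-2.0, 2.0, size=(d, p))
    return A0, omega0, targets, contexts, H


def is_rank_one(M, gamma=0.99):
    """Treat M as rank one if its top singular value carries >= gamma of the energy.

    Assumption: energy is measured with squared singular values.
    """
    s = np.linalg.svd(M, compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    return bool(total > 0 and energy[0] / total >= gamma)
```

Calling `sample_model(rng=seed)` with 500 different seeds would give a batch of the size described above; how the authors seed or batch their runs is not stated in the quoted text.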