Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Identifying biological perturbation targets through causal differential networks
Authors: Menghua Wu, Umesh Padia, Sean H. Murphy, Regina Barzilay, Tommi Jaakkola
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CDN on real transcriptomic data and synthetic settings. CDN outperforms the state-of-the-art in perturbation modeling (deep learning and statistical approaches), evaluated on the five largest Perturb-seq datasets at the time of publication (Replogle et al., 2022; Nadig et al., 2024) without using any external knowledge. Furthermore, CDN generalizes with minimal performance drop to unseen cell lines, which have different supports (genes), causal mechanisms (gene regulatory networks), and data distributions. On synthetic settings, CDN outperforms causal discovery approaches for estimating unknown intervention targets. |
| Researcher Affiliation | Academia | Department of Computer Science, Massachusetts Institute of Technology, Cambridge, MA, USA. Correspondence to: Menghua Wu <EMAIL>. |
| Pseudocode | No | The paper describes the model architecture in Section 3.1 and Appendix A, including equations and descriptions of layers. However, it does not present any structured pseudocode or algorithm blocks with numbered steps. |
| Open Source Code | Yes | Code is available at https://github.com/rmwu/cdn |
| Open Datasets | Yes | We validate CDN on five Perturb-seq (Dixit et al., 2016) datasets (genetic perturbations) from Replogle et al. (2022) and Nadig et al. (2024), as well as two Sci-Plex (Srivatsan et al., 2020) datasets (chemical perturbations) from Mc Faline-Figueroa et al. (2024). Each dataset is a real-valued matrix of gene expression levels: the number of examples M is the number of cells, the number of variables N is the number of genes, and each entry is a log-normalized count of how many copies of gene j were measured from cell i. Table 4 (extended biological dataset statistics, raw) reports type, source, accession, cell line, # perturbations, # genes, # NTCs, and # cells per dataset: genetic, Replogle et al. (2022), Figshare 20029387, K562 gw (9,866 perts; 8,248 genes; 75,328 NTCs; 1,989,578 cells); genetic, Nadig et al. (2024), GSE220095, Hep G2 (2,393 perts; 9,624 genes; 4,976 NTCs; 145,473 cells); chemical, Mc Faline-Figueroa et al. (2024), GSM7056151, A172 (23 perts; 8,393 genes; 8,660 NTCs; 58,347 cells). |
| Dataset Splits | Yes | We consider two splits: seen and unseen cell lines. In the former, models may be trained on approximately half of the perturbations from each cell line and are evaluated on the unseen perturbations. In the latter, we hold out one cell line at a time, and models may be trained on data from the remaining cell lines. To ensure that our train and test splits are sufficiently distinct, we cluster perturbations based on their log-fold change and assign each cluster to the same split (Figure 5). Table 5 (extended biological dataset statistics, processed), K562 gw row: perturbations: 1,089 train, 678 test, 587 trivial, 91 non-trivial; genes: 7,378 unique, median # DE 81; 492,096 cells. |
| Hardware Specification | Yes | During training, we used 15 CPU workers (primarily for local graph estimates) and 1 A6000 GPU. All models run on a single A6000 GPU, no constraint on memory (up to 500G). |
| Software Dependencies | No | The paper mentions using specific algorithms like 'AdamW optimizer (Loshchilov & Hutter, 2019)' and 'FCI algorithm (Spirtes et al., 1995)', and libraries like 'scikit-learn (Pedregosa et al., 2011)' and 'scanpy package (Wolf et al., 2018)'. While these tools are identified, no specific version numbers (e.g., PyTorch 1.9, Python 3.8) for the authors' core implementation are provided. The statement 'We used the latest releases of all baselines.' refers to third-party tools, not the authors' own software dependencies with specific version numbers. |
| Experiment Setup | Yes | We swept over the number of differential network layers (Figure 4) on synthetic data, and we used 3 layers for h_cat and 2 layers for h_diff. Following SEA, we adopted hidden dimension d = 64, the AdamW optimizer (Loshchilov & Hutter, 2019), learning rate 1e-4, batch size 16, and weight decay 1e-5. On the real data, where N = 1000, we changed to a batch size of 1, decreased the learning rate to 5e-6, and finetuned the models with half precision (FP16). |
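The cluster-level split described under "Dataset Splits" can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy cluster labels stand in for clusters of log-fold-change profiles, and the 200-perturbation / 20-cluster sizes are assumptions for demonstration.

```python
import random

# Toy stand-in: one cluster id per perturbation. In the paper, clusters come
# from grouping perturbations by their log-fold-change profiles.
random.seed(0)
labels = [random.randrange(20) for _ in range(200)]

# Assign each WHOLE cluster to train or test, so that similar perturbations
# never straddle the split.
cluster_ids = list(set(labels))
random.shuffle(cluster_ids)
train_clusters = set(cluster_ids[: len(cluster_ids) // 2])

train_idx = [i for i, c in enumerate(labels) if c in train_clusters]
test_idx = [i for i, c in enumerate(labels) if c not in train_clusters]
```

The key property is that membership is decided per cluster, not per perturbation, which keeps the train and test distributions distinct.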
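The reported hyperparameters can be collected into a plain configuration sketch. The dictionary names and structure are illustrative assumptions; the values are taken directly from the quoted setup (synthetic defaults, with the stated changes for real data).

```python
# Synthetic-data defaults, as quoted in the Experiment Setup cell.
SYNTHETIC_CONFIG = {
    "hidden_dim": 64,        # d = 64
    "optimizer": "AdamW",    # Loshchilov & Hutter, 2019
    "lr": 1e-4,
    "batch_size": 16,
    "weight_decay": 1e-5,
    "layers_h_cat": 3,
    "layers_h_diff": 2,
}

# Real-data overrides: batch size 1, lr 5e-6, half-precision finetuning.
REAL_CONFIG = {
    **SYNTHETIC_CONFIG,
    "batch_size": 1,
    "lr": 5e-6,
    "precision": "fp16",
}
```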