Diffusion Models for Causal Discovery via Topological Ordering
Authors: Pedro Sanchez, Xiao Liu, Alison Q O'Neil, Sotirios A. Tsaftaris
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that our method scales exceptionally well to datasets with up to 500 nodes and up to 10^5 samples while still performing on par over small datasets with state-of-the-art causal discovery methods. |
| Researcher Affiliation | Collaboration | ¹The University of Edinburgh, ²Canon Medical Research Europe, ³The Alan Turing Institute |
| Pseudocode | Yes | Algorithm 1: Topological Ordering with DiffAN |
| Open Source Code | Yes | Implementation is available at https://github.com/vios-s/DiffAN. |
| Open Datasets | Yes | We consider two real datasets: (i) Sachs: A protein signaling network based on expression levels of proteins and phospholipids (Sachs et al., 2005). We consider only the observational data (n = 853 samples) since our method targets discovery of causal mechanisms when only observational data is available. The ground truth causal graph given by Sachs et al. (2005) has 11 nodes and 17 edges. (ii) SynTReN: We also evaluate the models on a pseudo-real dataset sampled from the SynTReN generator (Van den Bulcke et al., 2006). |
| Dataset Splits | No | The paper mentions training data and a subsample but does not explicitly provide percentages or counts for train/validation/test splits, nor does it refer to a standard split by citation for the synthetic data. For real data, it states 'n = 853 samples' for Sachs but no split information. |
| Hardware Specification | No | The paper mentions "64GB of RAM" in the context of comparing its method's scalability to SCORE, indicating a limitation of other methods on a specific machine. However, it does not specify the hardware (CPU, GPU, specific RAM configuration) used for its own experiments. |
| Software Dependencies | No | The paper describes the neural network architecture (MLP with Linear layers, Leaky ReLU, Layer Norm, Dropout) and mentions using "functorch" for auto-differentiation, but it does not specify version numbers for any software, libraries (like PyTorch or TensorFlow), or programming languages. |
| Experiment Setup | Yes | D.1 HYPERPARAMETERS OF DPM TRAINING: We use T = 100 time steps; β_t is linearly scheduled between β_min = 0.0001 and β_max = 0.02. The model is trained according to Equation 3, which follows Ho et al. (2020). During sampling, t is sampled from a Uniform distribution. D.2 NEURAL ARCHITECTURE: The neural network follows a simple MLP with 5 Linear layers, LeakyReLU activation function, Layer Normalization and Dropout in the first layer. The full architecture is detailed in Table 1. |
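
For concreteness, here is a minimal sketch of the DPM training setup described in D.1 above (T = 100 steps, linear β schedule from 0.0001 to 0.02, and the noise-prediction objective of Ho et al. (2020) with t drawn uniformly). This is not the released DiffAN code; the `model(x_t, t)` interface and the (batch, nodes) tensor shapes are assumptions for illustration.

```python
import torch

# Hyperparameters from D.1: T = 100 steps, linear beta schedule in [1e-4, 0.02].
T = 100
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t for closed-form noising

def dpm_loss(model, x0):
    """Noise-prediction objective of Ho et al. (2020): predict the injected noise."""
    t = torch.randint(0, T, (x0.shape[0],))                 # t ~ Uniform{0, ..., T-1}
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t].unsqueeze(-1)                     # broadcast over the node dimension
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise  # closed-form forward noising
    return ((model(x_t, t) - noise) ** 2).mean()
```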
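
Similarly, a sketch of the D.2 architecture (5 Linear layers with LeakyReLU, Layer Normalization, and Dropout in the first layer only). The hidden width, dropout rate, and the way the time step is injected are illustrative guesses, since Table 1 of the paper is not reproduced here.

```python
import torch
import torch.nn as nn

class DenoisingMLP(nn.Module):
    """Sketch of the D.2 MLP: 5 Linear layers, LeakyReLU, LayerNorm, Dropout in layer 1.
    Hidden width and time-step conditioning are assumptions, not taken from Table 1."""

    def __init__(self, n_nodes: int, hidden: int = 1024, dropout: float = 0.2):
        super().__init__()
        self.first = nn.Sequential(                      # Linear layer 1 (with Dropout)
            nn.Linear(n_nodes + 1, hidden),              # +1 for the concatenated time step
            nn.LayerNorm(hidden),
            nn.LeakyReLU(),
            nn.Dropout(dropout),
        )
        self.body = nn.Sequential(                       # Linear layers 2-4
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.LeakyReLU(),
            nn.Linear(hidden, hidden), nn.LayerNorm(hidden), nn.LeakyReLU(),
        )
        self.out = nn.Linear(hidden, n_nodes)            # Linear layer 5: noise per node

    def forward(self, x_t, t):
        t_feat = (t.float() / 100.0).unsqueeze(-1)       # normalize time step by T = 100
        h = self.first(torch.cat([x_t, t_feat], dim=-1))
        return self.out(self.body(h))
```

Under these assumptions, a single training step on a batch `x_batch` of shape (batch, 11) for the Sachs data would be `loss = dpm_loss(DenoisingMLP(n_nodes=11), x_batch)` followed by a standard optimizer step.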