Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency
Authors: Alan Nawzad Amin, Andrew Gordon Wilson
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that DAT-Graph is able to easily scale to learn on real and synthetic data with 10^3 variables and 10^4 observations. We show DAT-Graph learns large sparse causal graphs more accurately than state-of-the-art gradient-based model selection procedures with and without interventions. We also use DAT-Graph to reduce the search space of model selection procedures to build even more accurate hybrid models. We show that these hybrid models accurately predict the effects of interventions on large-scale RNA sequencing data. |
| Researcher Affiliation | Academia | New York University, New York, USA. |
| Pseudocode | No | The paper describes the steps of the Differentiable Adjacency Test (DAT) and DAT-Graph but does not include any formal 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/AlanNawzadAmin/DAT-graph/. |
| Open Datasets | Yes | Here we learn from a single-cell RNA sequencing experiment of cancer that is resistant to immunotherapy (Frangieh et al., 2021). |
| Dataset Splits | Yes | We split each dataset into a training set and a test set containing interventions that are not in the training set. (A hypothetical sketch of this split appears after the table.) |
| Hardware Specification | Yes | We perform all experiments on a single CPU and a single RTX 8000 GPU. |
| Software Dependencies | No | The paper mentions using specific algorithms like the Adam optimizer and referring to external codebases, but it does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We have four threshold hyperparameters: one for deciding the edges of the moral graph η3, one for deciding edges in the skeleton η1, one for deciding v-structures η2, and one for deciding edges in the moral graph connected to intervention variables η4. We choose η3 = 8 × 10^-3, η1 = 10^-4, η2 = 0.2, η4 = 10^-3. We used batch sizes of 256 in all cases. To train the neural networks to predict the moral graph, we used the Adam optimizer with parameters β1, β2 = 0.9, 0.999 and learning rate 10^-4 and trained for 30,000 minibatches. ... For {θ1, θ2} we used the Adam optimizer with parameters β1, β2 = 0.9, 0.999 and learning rate 3 × 10^-4, while for ψ we used β1, β2 = 0.9, 0.9 and a learning rate of 3 × 10^-4. We train models for all tests in parallel on a GPU. To predict the moral graph, we used 3-layer neural networks with 200 hidden units. We used a ReLU activation and included dropout and batchnorm between layers. We used a dropout probability of 0.1 between the first and second layer and a probability of 0.5 between the second and third. To predict the skeleton, we used 3-layer neural networks with 100 hidden units. (A code sketch of this setup appears after the table.) |
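The Experiment Setup row quotes the architecture and optimizer configuration precisely enough to reconstruct. Below is a minimal, hypothetical sketch in PyTorch (the paper does not name its framework, and the exact ordering of batchnorm, activation, and dropout between layers is an assumption); `make_moral_graph_net` and its input/output sizes are illustrative names, not the authors' API.

```python
import torch
import torch.nn as nn

def make_moral_graph_net(n_inputs: int, n_outputs: int) -> nn.Module:
    """Hypothetical reconstruction of the moral-graph predictor described
    above: a 3-layer MLP with 200 hidden units, ReLU activations, batchnorm
    between layers, and dropout of 0.1 after the first layer and 0.5 after
    the second. The batchnorm/ReLU/dropout ordering is an assumption."""
    return nn.Sequential(
        nn.Linear(n_inputs, 200),
        nn.BatchNorm1d(200),
        nn.ReLU(),
        nn.Dropout(p=0.1),
        nn.Linear(200, 200),
        nn.BatchNorm1d(200),
        nn.ReLU(),
        nn.Dropout(p=0.5),
        nn.Linear(200, n_outputs),
    )

# Adam as quoted: beta1, beta2 = 0.9, 0.999 and learning rate 1e-4;
# training runs for 30,000 minibatches at the stated batch size of 256.
net = make_moral_graph_net(n_inputs=100, n_outputs=100)  # sizes are illustrative
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4, betas=(0.9, 0.999))
x = torch.randn(256, 100)  # one minibatch at the stated batch size
logits = net(x)
```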
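Similarly, the Dataset Splits row describes holding entire interventions out of training so the test set contains unseen interventions. A hypothetical NumPy illustration of that protocol (gene names, sizes, and variable names are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
# Invented intervention targets and per-observation target labels.
targets = np.array(["geneA", "geneB", "geneC", "geneD"])
intervention_of_obs = rng.choice(targets, size=1000)

# Hold out one intervention target entirely, so the test set contains
# interventions never seen during training.
held_out = rng.choice(targets, size=1, replace=False)
test_mask = np.isin(intervention_of_obs, held_out)
train_idx = np.flatnonzero(~test_mask)
test_idx = np.flatnonzero(test_mask)
```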