Amortized Inference for Causal Structure Learning
Authors: Lars Lorch, Scott Sussex, Jonas Rothfuss, Andreas Krause, Bernhard Schölkopf
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On synthetic data and semisynthetic gene expression data, our models exhibit robust generalization capabilities when subject to substantial distribution shifts and significantly outperform existing algorithms, especially in the challenging genomics domain. Our code and models are publicly available at: https://github.com/larslorch/avici. |
| Researcher Affiliation | Academia | Lars Lorch, ETH Zurich, Zurich, Switzerland, llorch@ethz.ch; Scott Sussex, ETH Zurich, Zurich, Switzerland, ssussex@ethz.ch; Jonas Rothfuss, ETH Zurich, Zurich, Switzerland, rojonas@ethz.ch; Andreas Krause, ETH Zurich, Zurich, Switzerland, krausea@ethz.ch; Bernhard Schölkopf, MPI for Intelligent Systems, Tübingen, Germany, bs@tuebingen.mpg.de |
| Pseudocode | Yes | Algorithm 1: Training the inference model fϕ (a sketch of such a training loop follows the table) |
| Open Source Code | Yes | Our code and models are publicly available at: https://github.com/larslorch/avici. |
| Open Datasets | Yes | In Appendix E, we additionally report results on a real-world proteomics dataset (Sachs et al., 2005). In addition to SCMs, we consider the challenging domain of gene regulatory networks (GRNs) using the simulator of Dibaeinia and Sinha (2020). In the GRN domain, we use subgraphs of the known S. cerevisiae and E. coli GRNs and their effect signs whenever known. To extract these subgraphs, we use the procedure by Marbach et al. (2009). |
| Dataset Splits | No | The paper mentions training data and unseen test data, but does not specify a separate validation split with percentages or counts. |
| Hardware Specification | Yes | All experiments ran on a private cluster with Intel Xeon E5-2630 v4 CPUs and NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions software like JAX and Haiku, but does not provide specific version numbers for these or other libraries used. |
| Experiment Setup | Yes | We train all models for 300k steps with a batch size of 20. We use the Adam optimizer (Kingma and Ba, 2015) with an initial learning rate of 10⁻⁴ that is linearly warmed up for 10k steps and then decayed proportionally to the inverse square root of the step count. We set τ to 1.0 during training and decrease it to 0.1 during evaluation and when plotting calibration curves. The dual variable λ is initialized to 1.0 and updated every 10 steps with a step size η = 0.01. We use a hidden dimension of k = 128 for the embeddings and the feed-forward networks, and 8 attention heads in each layer. Dropout (Srivastava et al., 2014) is applied before each residual connection with a rate of 0.1 for LINEAR and RFF and 0.2 for GRN. |
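The "Experiment Setup" row above reports a peak learning rate of 10⁻⁴ with a 10k-step linear warmup followed by inverse-square-root decay. Below is a minimal sketch of such a schedule paired with Adam via optax; the function name `warmup_inverse_sqrt` is ours and the paper does not provide this code, so treat it as an illustration of the reported schedule rather than the authors' implementation.

```python
import jax.numpy as jnp
import optax

def warmup_inverse_sqrt(step, base_lr=1e-4, warmup_steps=10_000):
    """Linear warmup to base_lr over warmup_steps, then base_lr * sqrt(warmup_steps / step)."""
    step = jnp.maximum(step, 1)
    warm = base_lr * step / warmup_steps                 # linear warmup phase
    decay = base_lr * jnp.sqrt(warmup_steps / step)      # inverse-square-root decay phase
    return jnp.where(step < warmup_steps, warm, decay)

# optax.adam accepts a schedule function mapping the step count to a learning rate
optimizer = optax.adam(learning_rate=warmup_inverse_sqrt)
```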
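The "Pseudocode" row points to Algorithm 1, which trains the inference model fϕ on synthetically generated tasks under an acyclicity constraint. The loop below is a loose, self-contained sketch of that kind of constrained training step, not the authors' implementation: `sample_task` and `toy_model` are hypothetical stand-ins for the paper's data simulator and transformer-based model, and the NOTEARS-style trace-exponential acyclicity measure is an assumption on our part. Only the hyperparameters quoted above (Adam, λ initialized to 1.0, dual updates every 10 steps with η = 0.01) come from the paper.

```python
import jax
import jax.numpy as jnp
import optax

d, n = 5, 100  # toy setting: 5 variables, 100 observations per sampled dataset

def sample_task(key):
    """Hypothetical simulator stand-in: random DAG + linear-Gaussian data."""
    k1, k2, k3 = jax.random.split(key, 3)
    graph = jnp.triu(jax.random.bernoulli(k1, 0.3, (d, d)), k=1).astype(jnp.float32)
    weights = graph * jax.random.normal(k2, (d, d))
    noise = jax.random.normal(k3, (n, d))
    data = noise @ jnp.linalg.inv(jnp.eye(d) - weights)   # x = W^T x + eps, solved for x
    return graph, data

def toy_model(params, data):
    """Hypothetical model stand-in: maps dataset statistics to (d, d) edge logits."""
    feats = jnp.concatenate([data.mean(0), data.std(0)])  # (2d,)
    return (params["w"] @ feats).reshape(d, d) + params["b"]

def acyclicity(probs):
    # NOTEARS-style measure (assumed here): tr(exp(P)) - d vanishes iff P encodes no cycles
    return jnp.trace(jax.scipy.linalg.expm(probs)) - d

def loss_fn(params, lam, graph, data):
    logits = toy_model(params, data)
    # Bernoulli cross-entropy between predicted edges and the sampled ground-truth graph
    ce = optax.sigmoid_binary_cross_entropy(logits, graph).mean()
    h = acyclicity(jax.nn.sigmoid(logits))                # soft acyclicity penalty
    return ce + lam * h, h

params = {"w": jnp.zeros((d * d, 2 * d)), "b": jnp.zeros((d, d))}
opt = optax.adam(1e-4)
opt_state = opt.init(params)
lam, eta = 1.0, 0.01  # dual variable and step size as reported in the paper

key = jax.random.PRNGKey(0)
for step in range(100):  # the paper trains for 300k steps; 100 keeps the sketch cheap
    key, sub = jax.random.split(key)
    graph, data = sample_task(sub)
    (loss, h), grads = jax.value_and_grad(loss_fn, has_aux=True)(params, lam, graph, data)
    updates, opt_state = opt.update(grads, opt_state)
    params = optax.apply_updates(params, updates)
    if step % 10 == 0:
        lam = lam + eta * h  # dual ascent on the constraint (one plausible reading of the update)
```

The batch size of 20 from the table would correspond to averaging `loss_fn` over 20 sampled tasks per step (e.g., via `jax.vmap`); the sketch uses a single task to stay short.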