Causal Discovery with Reinforcement Learning
Authors: Shengyu Zhu, Ignavier Ng, Zhitang Chen
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on both synthetic and real datasets, and show that the proposed approach not only has an improved search ability but also allows a flexible score function under the acyclicity constraint. |
| Researcher Affiliation | Collaboration | Shengyu Zhu, Ignavier Ng, Zhitang Chen; Huawei Noah's Ark Lab; University of Toronto. {zhushengyu,chenzhitang2}@huawei.com, ignavierng@cs.toronto.edu |
| Pseudocode | Yes | Algorithm 1 The proposed RL approach to score-based causal discovery |
| Open Source Code | No | Our implementation is based on an existing TensorFlow implementation of neural combinatorial optimizer that is available at https://github.com/MichelDeudon/neural-combinatorial-optimization-rl-tensorflow. We add an entropy regularization term, and modify the reward and decoder as described in Sections 4 and 5.1, respectively. |
| Open Datasets | Yes | We consider a real dataset to discover a protein signaling network based on expression levels of proteins and phospholipids (Sachs et al., 2005). |
| Dataset Splits | No | The paper reports sample sizes (m = 5,000 for the synthetic data; m = 853 for the real dataset) but does not specify explicit train/validation/test splits with percentages, counts, or references to predefined splits. |
| Hardware Specification | No | The paper does not specify any hardware used for running the experiments (e.g., specific CPU/GPU models, memory, or cloud instance types). |
| Software Dependencies | No | Our implementation is based on an existing Tensorflow (Abadi et al., 2016) implementation of neural combinatorial optimizer. Default hyper-parameters of these implementations are used unless otherwise stated. |
| Experiment Setup | Yes | We pick B = 64 as the batch size at each iteration and dh = 16 as the hidden dimension with the single-layer decoder. Our approach is combined with the BIC scores under the Gaussianity assumption given in Eqs. (2) and (3), and the resulting methods are denoted RL-BIC and RL-BIC2, respectively (see the sketches after this table). We use a threshold of 0.3, the same as NOTEARS and DAG-GNN with this data model, to prune the estimated edges. Other parameter choices in this work are S0 = 5, t0 = 1,000, λ1 = 0, α1 = 1, λ2 = 10^(−d/3), α2 = 10, and Λ2 = 0.01. |
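
To make the scoring in the setup row concrete, below is a minimal sketch of linear-Gaussian BIC scoring in the spirit of the paper's Eqs. (2) and (3), covering both the node-wise-variance variant (RL-BIC) and the equal-variance variant (RL-BIC2). The function name `bic_scores`, the adjacency orientation convention, and the exact constant terms are illustrative assumptions, not the paper's exact expressions.

```python
import numpy as np

def bic_scores(X, A):
    """Hedged sketch of linear-Gaussian BIC scoring for a candidate graph.

    X: (m, d) samples; A: (d, d) binary adjacency with A[i, j] = 1 read as
    an edge i -> j (the orientation convention is an assumption here).
    Returns (bic, bic2): BIC with node-wise noise variances and with one
    shared noise variance, in the spirit of the paper's Eqs. (2) and (3).
    """
    m, d = X.shape
    Xc = X - X.mean(axis=0)            # center columns so no intercept is needed
    rss = np.empty(d)
    for j in range(d):
        parents = np.flatnonzero(A[:, j])
        if parents.size == 0:
            resid = Xc[:, j]           # no parents: residual around the mean
        else:
            P = Xc[:, parents]
            coef, *_ = np.linalg.lstsq(P, Xc[:, j], rcond=None)
            resid = Xc[:, j] - P @ coef
        rss[j] = resid @ resid         # residual sum of squares for node j
    k = A.sum()                        # number of edges = number of parameters
    bic = m * np.log(rss / m).sum() + k * np.log(m)             # node-wise variances
    bic2 = m * d * np.log(rss.sum() / (m * d)) + k * np.log(m)  # equal variances
    return bic, bic2
```

After a graph is selected, the quoted setup prunes estimated edges with a threshold of 0.3; with the sketch above, that would correspond to dropping edges whose fitted `coef` entries fall below 0.3 in absolute value.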
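
The reward in Algorithm 1 penalizes the score with an indicator for non-DAG graphs (weight λ1) and a quantitative acyclicity measure (weight λ2); the t0, α1, α2, and Λ2 values quoted above govern how these weights are updated during training. A minimal sketch of the penalized reward, assuming the NOTEARS acyclicity measure h(A) = trace(e^(A∘A)) − d and deriving the non-DAG indicator from h(A) itself (both are assumptions about details the paper may handle differently):

```python
import numpy as np
from scipy.linalg import expm

def penalized_reward(score, A, lam1=0.0, lam2=1e-4):
    """Hedged sketch of the penalized reward: the negative of
    score + lam1 * I(G is not a DAG) + lam2 * h(A), where
    h(A) = trace(exp(A ∘ A)) - d is the NOTEARS acyclicity measure,
    which equals zero exactly when A is acyclic. The default weights
    are placeholders, not the paper's schedule."""
    d = A.shape[0]
    h = np.trace(expm(A * A)) - d      # elementwise square; 0 iff A is a DAG
    not_dag = float(h > 1e-8)          # indicator derived from h(A) -- an assumption
    return -(score + lam1 * not_dag + lam2 * h)
```

Maximizing this reward over graphs drives the RL agent toward low-scoring DAGs, since any cycle inflates h(A) and triggers the indicator penalty.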