Rhino: Deep Causal Temporal Relationship Learning with History-dependent Noise
Authors: Wenbo Gong, Joel Jennings, Cheng Zhang, Nick Pawlowski
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results from extensive synthetic experiments and two real-world benchmarks demonstrate better discovery performance compared to relevant baselines, with ablation studies revealing its robustness under model misspecification. |
| Researcher Affiliation | Industry | Wenbo Gong, Joel Jennings, Cheng Zhang & Nick Pawlowski, Microsoft Research, Cambridge, UK, {wenbogong, joeljennings, cheng.zhang, nick.pawlowski}@microsoft.com |
| Pseudocode | No | No pseudocode or algorithm blocks are explicitly presented in the paper. |
| Open Source Code | Yes | We release the code of Rhino for reproducing the following experiments: https://github.com/microsoft/causica/tree/v0.0.0 |
| Open Datasets | Yes | For DREAM3 and Netsim, the datasets can be found in the public GitHub repo https://github.com/sakhanna/SRU_for_GCI/tree/master/data. |
| Dataset Splits | Yes | For tuning the hyper-parameters of Rhino, its variants and DYNOTEARS, we split each of the 5 datasets into 80%/20% training/validation. For the Netsim experiment, we extract subjects 2-6 in Sim-3.mat to form the training data and use subjects 7-8 as the validation dataset. (A split sketch appears after the table.) |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU/CPU models, memory details) used for running the experiments. |
| Software Dependencies | No | The paper mentions software packages like lingam, Tigramite, and causalnex but does not provide specific version numbers for them. |
| Experiment Setup | Yes | By default, we allow Rhino and its variants to model instantaneous effects; set the model lag to the ground truth of 2 except for the ablation study; the qϕ(G) is initialized to favour sparse graphs (edge probability < 0.5); and a quadratic spline flow is used for the history-dependent noise. For the model formulation, we use 2-layer fully connected MLPs with hidden sizes of 64 (5 and 10 nodes), 80 (10 nodes) and 160 (40 nodes) for all neural networks in Rhino-based methods. We also apply layer normalization and residual connections to each layer of the MLPs. For the gradient estimator, we use the Gumbel-softmax method with a hard forward pass and a soft backward pass with a temperature of 0.25. All spline flows use 8 bins. The embedding sizes for the transformation (i.e. Eq. (7) and the conditional spline flow) are equal to the node number. For the sparseness penalty λs in Eq. (10), we use 9 for Rhino and Rhino+s, and 5 for Rhino+g. We set ρ = 1 and α = 0 for all Rhino-based methods. For optimization, we use Adam (Kingma & Ba, 2014) with a learning rate of 0.01. (Sketches of the MLP block and the Gumbel-softmax estimator appear after the table.) |
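
The dataset-splits row describes two protocols: an 80%/20% time split for the synthetic datasets and a by-subject split for Netsim. The sketch below illustrates both under stated assumptions: the `ts` and `Ntimepoints` keys and the subjects-stacked-along-time layout of `Sim-3.mat` are assumptions about the public NetSim files, not confirmed from the paper's code.

```python
import numpy as np
from scipy.io import loadmat

# Hypothetical helper: 80%/20% split along the time axis, as described
# for tuning hyper-parameters on the 5 synthetic datasets.
def train_val_split(series: np.ndarray, train_frac: float = 0.8):
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]

# Netsim: the 'ts' and 'Ntimepoints' keys are assumptions about the
# layout of Sim-3.mat; subjects are assumed stacked along the time axis.
mat = loadmat("Sim-3.mat")
ts = mat["ts"]                              # (n_subjects * T, n_nodes)
T = int(mat["Ntimepoints"].item())          # time points per subject
subjects = ts.reshape(-1, T, ts.shape[-1])  # (n_subjects, T, n_nodes)

train_data = subjects[1:6]  # subjects 2-6 (1-based indexing) for training
val_data = subjects[6:8]    # subjects 7-8 for validation
```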
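
The experiment-setup row specifies 2-layer fully connected MLPs with layer normalization and residual connections on each layer. Below is a minimal PyTorch sketch of such a block; the module name, projection layers, and the exact placement of the norm and residual are assumptions, not the causica implementation.

```python
import torch
import torch.nn as nn

class ResidualMLP(nn.Module):
    """2-layer fully connected MLP with layer normalization and a residual
    connection around each layer; norm/residual placement is an assumption."""

    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.in_proj = nn.Linear(dim, hidden)
        self.blocks = nn.ModuleList(
            [
                nn.Sequential(
                    nn.LayerNorm(hidden), nn.Linear(hidden, hidden), nn.ReLU()
                )
                for _ in range(2)
            ]
        )
        self.out_proj = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.in_proj(x)
        for block in self.blocks:
            h = h + block(h)  # residual connection around each MLP layer
        return self.out_proj(h)

# Usage: hidden size 64 corresponds to the 5- and 10-node settings.
net = ResidualMLP(dim=5, hidden=64)
out = net(torch.randn(32, 5))
```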
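
The "hard forward pass and soft backward pass" gradient estimator described in the setup is the straight-through Gumbel-softmax, which PyTorch provides directly via `torch.nn.functional.gumbel_softmax(..., hard=True)`. The sketch below shows it at the stated temperature of 0.25; the logit shapes and the stand-in loss are illustrative assumptions, not Rhino's actual ELBO.

```python
import torch
import torch.nn.functional as F

# Edge-probability logits for a hypothetical 5-node graph: two logits per
# potential edge (absent/present), so gumbel_softmax samples a relaxed
# Bernoulli for each edge along the last dimension.
logits = torch.randn(5, 5, 2, requires_grad=True)

# hard=True yields a discrete one-hot sample in the forward pass while
# gradients flow through the softened sample (straight-through estimator).
sample = F.gumbel_softmax(logits, tau=0.25, hard=True)
adjacency = sample[..., 1]  # 1 where an edge is sampled as present

loss = adjacency.sum()      # stand-in for the variational objective
loss.backward()             # gradients reach the logits despite discreteness
```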