Rhino: Deep Causal Temporal Relationship Learning with History-dependent Noise

Authors: Wenbo Gong, Joel Jennings, Cheng Zhang, Nick Pawlowski

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results from extensive synthetic experiments and two real-world benchmarks demonstrate better discovery performance compared to relevant baselines, with ablation studies revealing its robustness under model misspecification.
Researcher Affiliation | Industry | Wenbo Gong, Joel Jennings, Cheng Zhang & Nick Pawlowski, Microsoft Research, Cambridge, UK, {wenbogong, joeljennings, cheng.zhang, nick.pawlowski}@microsoft.com
Pseudocode | No | No pseudocode or algorithm blocks are explicitly presented in the paper.
Open Source Code | Yes | We release the code of Rhino for reproducing the following experiments: https://github.com/microsoft/causica/tree/v0.0.0
Open Datasets | Yes | For DREAM3 and Netsim, the dataset can be found in the public GitHub repo https://github.com/sakhanna/SRU_for_GCI/tree/master/data.
Dataset Splits | Yes | For tuning the hyper-parameters of Rhino, its variants and DYNOTEARS, we split each of the 5 datasets into 80%/20% training/validation. For the Netsim experiment, we extract subjects 2-6 in Sim-3.mat to form the training data and use subjects 7-8 as the validation dataset. (A loading/split sketch follows the table.)
Hardware Specification | No | The paper does not specify the exact hardware (e.g., GPU/CPU models, memory details) used for running the experiments.
Software Dependencies | No | The paper mentions software packages like lingam, Tigramite, and causalnex but does not provide specific version numbers for them.
Experiment Setup | Yes | By default, we allow Rhino and its variants to model instantaneous effects; set the model lag to the ground-truth lag of 2 except for the ablation study; qϕ(G) is initialized to favour sparse graphs (edge probability < 0.5); and a quadratic spline flow is used for the history-dependent noise. For the model formulation, we use 2-layer fully connected MLPs with hidden sizes of 64 (5 and 10 nodes), 80 (10 nodes) and 160 (40 nodes) for all neural networks in Rhino-based methods. We also apply layer normalization and residual connections to each layer of the MLPs. For the gradient estimator, we use the Gumbel-softmax method with a hard forward pass and a soft backward pass at a temperature of 0.25. All spline flows use 8 bins. The embedding sizes for the transformation (i.e. Eq. (7)) and the conditional spline flow are equal to the node number. For the sparseness penalty λs in Eq. (10), we use 9 for Rhino and Rhino+s, and 5 for Rhino+g. We set ρ = 1 and α = 0 for all Rhino-based methods. For optimization, we use Adam (Kingma & Ba, 2014) with learning rate 0.01.
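
The Dataset Splits row gives enough detail for a rough illustration of how the data could be partitioned. The sketch below is not taken from the paper or the causica repository: it assumes the standard Netsim .mat layout (a ts array of shape (Nsubjects * Ntimepoints, Nnodes) with subjects stacked contiguously, plus Nsubjects/Ntimepoints/Nnodes fields) and splits the synthetic series 80%/20% along the time axis; the field names, the chronological split, and the function names are all assumptions.

```python
import numpy as np
from scipy.io import loadmat


def split_train_val(series: np.ndarray, train_frac: float = 0.8):
    """80%/20% training/validation split for one synthetic dataset.

    The paper only states the ratio; splitting along the time axis
    (rather than shuffling) is an assumption made for this sketch.
    """
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]


def load_netsim_split(path: str = "Sim-3.mat"):
    """Netsim split described above: subjects 2-6 for training and
    subjects 7-8 for validation (subjects are 1-indexed in the paper).
    """
    mat = loadmat(path)
    n_subjects = int(mat["Nsubjects"].item())
    n_time = int(mat["Ntimepoints"].item())
    n_nodes = int(mat["Nnodes"].item())
    # Assumed layout: subjects stacked contiguously along the time axis.
    ts = mat["ts"].reshape(n_subjects, n_time, n_nodes)
    train = ts[1:6]   # subjects 2-6 as training data
    val = ts[6:8]     # subjects 7-8 as validation data
    return train, val
```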
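
For the Experiment Setup row, two of the named ingredients lend themselves to a short sketch: the 2-layer MLPs with layer normalization and residual connections, and the straight-through Gumbel-softmax estimator (hard forward pass, soft backward pass, temperature 0.25) used to sample graphs. The PyTorch code below is a minimal illustration, not the causica implementation; the class and function names, the ReLU activation, and the two-logit parameterization of each edge are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResidualMLP(nn.Module):
    """2-layer fully connected MLP with layer normalization and a residual
    connection around each hidden layer (hidden size 64/80/160 depending on
    the number of nodes, per the setup row above)."""

    def __init__(self, in_dim: int, hidden: int = 64, out_dim: int = 1):
        super().__init__()
        self.inp = nn.Linear(in_dim, hidden)
        self.layers = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(2)])
        self.norms = nn.ModuleList([nn.LayerNorm(hidden) for _ in range(2)])
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.inp(x)
        for layer, norm in zip(self.layers, self.norms):
            h = h + F.relu(layer(norm(h)))  # residual connection per layer
        return self.out(h)


def sample_graph(edge_logits: torch.Tensor, temperature: float = 0.25) -> torch.Tensor:
    """Straight-through Gumbel-softmax sample of adjacency entries: hard
    {0, 1} values in the forward pass, relaxed (soft) gradients in the
    backward pass, via hard=True in torch.nn.functional.gumbel_softmax."""
    # Two logits per edge: [edge absent, edge present].
    logits = torch.stack([torch.zeros_like(edge_logits), edge_logits], dim=-1)
    sample = F.gumbel_softmax(logits, tau=temperature, hard=True)
    return sample[..., 1]  # 1.0 where an edge is sampled, 0.0 otherwise


# q_phi(G) initialized to favour sparse graphs: negative logits give an
# initial edge probability below 0.5.
edge_logits = nn.Parameter(torch.full((5, 5), -1.0))
adjacency = sample_graph(edge_logits)

# Other settings quoted in the setup row (lambda_s = 5 for Rhino+g).
hparams = dict(lag=2, spline_bins=8, gumbel_temperature=0.25,
               sparsity_lambda=9, rho=1.0, alpha=0.0,
               optimizer="Adam", learning_rate=0.01)
```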