Causal normalizing flows: from theory to practice

Authors: Adrián Javaloy, Pablo Sánchez-Martín, Isabel Valera

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, in our experiments, we validate our design and training choices through a comprehensive ablation study; compare causal NFs to other approaches for approximating causal models; and empirically demonstrate that causal NFs can be used to address real-world problems where mixed discrete-continuous data and partial knowledge of the causal graph are the norm.
Researcher Affiliation | Academia | 1 Department of Computer Science, Saarland University, Saarbrücken, Germany; 2 Max Planck Institute for Intelligent Systems, Tübingen, Germany; 3 Max Planck Institute for Software Systems, Saarbrücken, Germany
Pseudocode | Yes | Algorithm 1: Algorithm to sample from the interventional distribution P(x | do(x_i = α)). (A sketch of this sampling procedure appears after the table.)
Open Source Code | Yes | The code for this work can be found at https://github.com/psanch21/causal-flows.
Open Datasets | Yes | We generated, using each synthetic SCM, a dataset with 20 000 training samples, 2 500 validation samples, and 2 500 test samples. We ran every model for 1000 epochs, and the results shown in the manuscript correspond to the test-set evaluation at the last epoch. For the optimization, we use Adam [17] with an initial learning rate of 0.001, and reduce the learning rate with a decay factor of 0.95 when it reaches a plateau longer than 60 epochs. For hyperparameter tuning, we always perform a grid search with a similar budget, and select the best hyperparameter combination according to the validation loss, always reporting results from the test dataset in the manuscript. Every experiment is repeated 5 times, and we show averages and standard deviations. The exogenous variables always follow a standard normal distribution N(0, 1), except for LARGEBD, where a uniform distribution U(0, 1) is used instead. Subsequently, we define the 12 SCMs employed, encompassing both linear and non-linear equations, and we additionally provide their causal graphs in Fig. 9. The German Credit dataset [8] is a dataset from the UCI repository.
Dataset Splits | Yes | For every experiment, we generated, using each synthetic SCM, a dataset with 20 000 training samples, 2 500 validation samples, and 2 500 test samples. (A data-generation sketch appears after the table.)
Hardware Specification | Yes | Every individual experiment shown in this paper ran on a single CPU with 8 GB of RAM. To run all experiments, we used a local computing cluster with an automatic job-assignment system, so we cannot ensure the specific CPU used for each particular experiment. However, we know that every experiment used one of the following CPUs, picked randomly given the demand when scheduled: AMD EPYC 7702 64-Core Processor, AMD EPYC 7662 64-Core Processor, Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz, or Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz.
Software Dependencies | No | The paper mentions software components like "Masked Autoregressive Flows (MAFs) [24]", "ELU [4] activation functions", and "Neural Spline Flow (NSF) [9]", but does not specify their version numbers.
Experiment Setup | Yes | We use Adam [17] with an initial learning rate of 0.001, and reduce the learning rate with a decay factor of 0.95 when it reaches a plateau longer than 60 epochs. For hyperparameter tuning, we always perform a grid search with a similar budget, and select the best hyperparameter combination according to the validation loss, always reporting results from the test dataset in the manuscript. Every experiment is repeated 5 times, and we show averages and standard deviations. Specifically, we considered the following combinations ([a, b] represents two layers with a and b hidden units): [16, 16, 16, 16], [32, 32, 32], [16, 16, 16], [32, 32], [32], [64]. While we fixed the flow to have a single MAF [24] layer with ELU [4] activation functions, we determined through cross-validation the optimal number of layers and hidden units of the MLP network within the MAF. (A sketch of this optimizer and grid-search setup appears after the table.)
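
The interventional-sampling routine referenced in the Pseudocode row (Algorithm 1) follows the usual recipe for autoregressive causal flows: draw exogenous noise, traverse the variables in causal order, and clamp the intervened variable instead of computing it from the flow. The sketch below illustrates that logic on a toy linear SCM; sample_interventional, toy_step, and the SCM itself are illustrative stand-ins, not the authors' implementation.

import torch

def sample_interventional(flow_step, base_dist, i, alpha, n_samples, d):
    """Sketch of sampling from P(x | do(x_i = alpha)) with a causal flow."""
    u = base_dist.sample((n_samples, d))   # exogenous noise u ~ P_u
    x = torch.zeros(n_samples, d)
    for j in range(d):                     # traverse variables in causal order
        if j == i:
            x[:, j] = alpha                # intervention: clamp x_i to alpha
        else:
            x[:, j] = flow_step(j, x, u)   # x_j = T_j(u_j; x_{<j} built so far)
    return x

# Toy linear SCM standing in for a learned flow:
# x1 = u1, x2 = 2*x1 + u2, x3 = x1 - x2 + u3.
def toy_step(j, x, u):
    if j == 0:
        return u[:, 0]
    if j == 1:
        return 2.0 * x[:, 0] + u[:, 1]
    return x[:, 0] - x[:, 1] + u[:, 2]

base = torch.distributions.Normal(0.0, 1.0)
x_int = sample_interventional(toy_step, base, i=1, alpha=0.5, n_samples=10_000, d=3)
print(x_int.mean(dim=0))  # approx. (0.0, 0.5, -0.5), since E[x3 | do(x2 = 0.5)] = -0.5

In a learned causal NF, toy_step would be replaced by the trained flow's forward transformation; the clamping step is what distinguishes interventional from observational sampling.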
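
The synthetic-data pipeline quoted in the Open Datasets and Dataset Splits rows reduces to: sample exogenous noise, push it through one of the 12 SCMs, and split. A minimal sketch, assuming PyTorch; make_dataset and scm_forward are hypothetical names, and only the split sizes and noise distributions come from the quoted text.

import torch

def make_dataset(scm_forward, d, uniform_noise=False, seed=0):
    # 20 000 / 2 500 / 2 500 train/validation/test samples, as reported.
    n_train, n_val, n_test = 20_000, 2_500, 2_500
    g = torch.Generator().manual_seed(seed)
    n = n_train + n_val + n_test
    if uniform_noise:                      # LARGEBD: exogenous u ~ U(0, 1)
        u = torch.rand(n, d, generator=g)
    else:                                  # all other SCMs: u ~ N(0, 1)
        u = torch.randn(n, d, generator=g)
    x = scm_forward(u)                     # observational samples x = f(u)
    return x[:n_train], x[n_train:n_train + n_val], x[n_train + n_val:]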
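
Likewise, the optimization setup in the Experiment Setup row maps onto standard PyTorch components. This is a sketch under the assumption that the plateau-based decay corresponds to ReduceLROnPlateau; fit_and_validate is a hypothetical stand-in for the elided training loop.

import torch

def make_optimizer(model: torch.nn.Module):
    # Adam with the reported initial learning rate of 0.001, decaying the
    # learning rate by 0.95 after a validation-loss plateau of 60 epochs.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(
        opt, mode="min", factor=0.95, patience=60)
    return opt, sched  # call sched.step(val_loss) once per epoch

# Reported hidden-layer grid for the MLP inside the single MAF layer
# ([a, b] denotes two hidden layers with a and b units, respectively).
HIDDEN_GRID = [[16, 16, 16, 16], [32, 32, 32], [16, 16, 16],
               [32, 32], [32], [64]]

def grid_search(fit_and_validate):
    # fit_and_validate(hidden) is a hypothetical helper that trains one
    # model per configuration and returns its validation loss; the best
    # configuration (lowest validation loss) is selected.
    scores = {tuple(h): fit_and_validate(h) for h in HIDDEN_GRID}
    return list(min(scores, key=scores.get))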