Gradient-Based Neural DAG Learning

Authors: Sébastien Lachapelle, Philippe Brouillard, Tristan Deleu, Simon Lacoste-Julien

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On both synthetic and real-world data sets, this new method outperforms current continuous methods on most tasks, while being competitive with existing greedy search methods on important metrics for causal inference. Our second contribution is to provide a missing empirical comparison to existing methods that support nonlinear relationships but tackle the optimization problem in its discrete form using greedy search procedures, namely CAM (Bühlmann et al., 2014) and GSF (Huang et al., 2018). We show that GraN-DAG is competitive on the wide range of tasks we considered, while using pre- and post-processing steps similar to CAM. From Section 4 (Experiments): In this section, we compare GraN-DAG to various baselines in the continuous paradigm, namely DAG-GNN (Yu et al., 2019) and NOTEARS (Zheng et al., 2018), and also in the combinatorial paradigm, namely CAM (Bühlmann et al., 2014), GSF (Huang et al., 2018), GES (Chickering, 2003) and PC (Spirtes et al., 2000). These methods are discussed in Section 5.
Researcher Affiliation | Academia | Sébastien Lachapelle, Philippe Brouillard, Tristan Deleu & Simon Lacoste-Julien, Mila & DIRO, Université de Montréal; Canada CIFAR AI Chair. Correspondence to: sebastien.lachapelle@umontreal.ca
Pseudocode | No | The paper describes its methods using prose and mathematical formulations but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We provide an implementation of GraN-DAG here. (The word "here" is a hyperlink to https://github.com/sbprl/grandag)
Open Datasets | Yes | On both synthetic and real-world data sets... We have tested all methods considered so far on a well-known data set which measures the expression level of different proteins and phospholipids in human cells (Sachs et al., 2005). We also consider pseudo-real data sets sampled from the SynTReN generator (Van den Bulcke, 2006). SynTReN: Ten datasets have been generated using the SynTReN generator (http://bioinformatics.intec.ugent.be/kmarchal/SynTReN/index.html) using the software default parameters except for the probability for complex 2-regulator interactions that was set to 1 and the random seeds used were 0 to 9.
Dataset Splits | Yes | First, as we optimize a subproblem, we evaluate its objective on a held-out data set and declare convergence once it has stopped improving. This approach is known as early stopping (Prechelt, 1997). In Table 5, we present an ablation study which shows the effect of adding PNS and pruning to GraN-DAG on different performance metrics and on the negative log-likelihood (NLL) of the training and validation set. 80% of the data was used for training and 20% was held out (GraN-DAG uses the same data for early stopping and hyperparameter selection). (A minimal split and early-stopping sketch appears after this table.)
Hardware Specification | No | The paper mentions that experiments were "enabled by computational resources provided by Calcul Québec, Compute Canada and Element AI" and that runtime can be "roughly halved when executed on GPU," but it does not specify any particular models of GPUs, CPUs, memory, or other detailed hardware components.
Software Dependencies | No | The paper mentions several software packages, such as "scikit-learn", the "mboost" package, the "pcalg" R package, and the "CAM" R package. However, it does not provide specific version numbers for any of these dependencies, which would be necessary for reproducibility.
Experiment Setup | Yes | All GraN-DAG runs up to this point were performed using the following set of hyperparameters. We used RMSprop as optimizer with a learning rate of 10^-2 for the first subproblem and 10^-4 for all subsequent subproblems. Each NN has two hidden layers with 10 units (except for the real and pseudo-real data experiments of Table 3, which use only 1 hidden layer). Leaky-ReLU is used as activation function. The NNs are initialized using the initialization scheme proposed in Glorot & Bengio (2010), also known as Xavier initialization. We used minibatches of 64 samples. Table 24 also specifies hyperparameter search spaces for various algorithms, including learning rates, hidden units, and hidden layers. (A training-setup sketch based on these stated values appears after this table.)
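
To make the reported split and early-stopping protocol concrete, here is a minimal sketch of an 80/20 train/held-out split with early stopping on a held-out negative log-likelihood. The Gaussian toy model, the patience of 5, and the step budget are illustrative assumptions and are not details taken from the paper.

    # Minimal sketch: 80/20 split and early stopping on held-out NLL.
    # The Gaussian toy model below is a placeholder, not GraN-DAG itself.
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(loc=2.0, scale=1.0, size=500)

    # 80% training, 20% held out, as described in the paper.
    perm = rng.permutation(len(data))
    n_train = int(0.8 * len(data))
    train, valid = data[perm[:n_train]], data[perm[n_train:]]

    def gaussian_nll(x, mu, sigma=1.0):
        # Negative log-likelihood of a unit-variance Gaussian (constant dropped).
        return np.mean(0.5 * ((x - mu) / sigma) ** 2 + np.log(sigma))

    # Early stopping: keep optimizing while the held-out objective improves.
    mu, lr, best, patience, bad = 0.0, 0.1, np.inf, 5, 0
    for step in range(1000):
        mu -= lr * np.mean(mu - train)       # gradient step on the training NLL
        val = gaussian_nll(valid, mu)
        if val < best - 1e-6:
            best, bad = val, 0
        else:
            bad += 1
            if bad >= patience:              # held-out NLL stopped improving
                break
    print(f"stopped at step {step}, held-out NLL {best:.4f}")
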
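The stated hyperparameters can also be illustrated with a short PyTorch sketch. The per-variable network with a Gaussian output head, the input dimension of 9, the random minibatch, and the single training step are assumptions made for illustration; this is not the authors' GraN-DAG implementation.

    # Sketch of the stated hyperparameters (assumption: one per-variable
    # conditional network with a Gaussian output head; not the authors' code).
    import torch
    import torch.nn as nn

    class ConditionalNet(nn.Module):
        """MLP with two hidden layers of 10 units and Leaky-ReLU, as described."""
        def __init__(self, num_parents: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(num_parents, 10), nn.LeakyReLU(),
                nn.Linear(10, 10), nn.LeakyReLU(),
                nn.Linear(10, 2),  # outputs (mean, log-variance) of a Gaussian
            )
            # Xavier (Glorot & Bengio, 2010) initialization of the weights.
            for m in self.net:
                if isinstance(m, nn.Linear):
                    nn.init.xavier_uniform_(m.weight)
                    nn.init.zeros_(m.bias)

        def forward(self, x):
            mean, log_var = self.net(x).chunk(2, dim=-1)
            return mean, log_var

    model = ConditionalNet(num_parents=9)
    # RMSprop with learning rate 1e-2 for the first subproblem
    # (1e-4 would be used for the subsequent subproblems).
    opt = torch.optim.RMSprop(model.parameters(), lr=1e-2)

    # One minibatch of 64 samples (random data, just to exercise the code).
    x = torch.randn(64, 9)
    target = torch.randn(64, 1)
    mean, log_var = model(x)
    nll = 0.5 * (log_var + (target - mean) ** 2 / log_var.exp()).mean()
    opt.zero_grad()
    nll.backward()
    opt.step()
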