Learning Large DAGs by Combining Continuous Optimization and Feedback Arc Set Heuristics

Authors: Pierre Gillot, Pekka Parviainen (pp. 6713-6720)

AAAI 2022

Reproducibility assessment (variable, result, and LLM response for each item):
Research Type: Experimental. Our experiments show that our methods can quickly find reasonable solutions on datasets with thousands of variables and beyond, even with modest running time. We now present our experimental pipeline. We choose to compare the proposed algorithms (ProxiMAS and OptiMAS) against an iterative method (GD; Park and Klabjan 2017) and the current state-of-the-art methods for sparse linear SEM structure recovery (NOTEARS; Zheng et al. 2018, and GOLEM; Ng, Ghassami, and Zhang 2020). In order to evaluate the performance of the different methods, we compute the false negative rate (FNR), the false positive rate (FPR), and the normalized structural Hamming distance (SHD) between the predicted and ground-truth adjacency matrices.
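The metrics named in this excerpt can be reproduced in a few lines of NumPy. The sketch below is not the authors' evaluation code; the 0.3 binarization threshold and the normalization of SHD by the number of possible directed edges are assumptions made for illustration.

```python
# Not the authors' evaluation code: a NumPy sketch of FNR, FPR and a
# normalized SHD between a predicted and a ground-truth adjacency matrix.
# The 0.3 binarization threshold and the d*(d-1) normalizer are assumptions.
import numpy as np

def structure_metrics(W_pred, W_true, threshold=0.3):
    """Binarize weighted adjacency matrices and compare their edge sets."""
    B_pred = (np.abs(W_pred) > threshold).astype(int)
    B_true = (np.abs(W_true) > 0).astype(int)
    d = B_true.shape[0]

    true_edges = B_true == 1
    non_edges = (B_true == 0) & ~np.eye(d, dtype=bool)

    fnr = np.sum(true_edges & (B_pred == 0)) / max(true_edges.sum(), 1)
    fpr = np.sum(non_edges & (B_pred == 1)) / max(non_edges.sum(), 1)

    # SHD counts edge additions, deletions and reversals; a reversed edge
    # would otherwise be counted twice in |B_pred - B_true|, so subtract it once.
    diff = np.abs(B_pred - B_true)
    reversals = ((B_pred.T == 1) & (B_true == 1) & (B_pred == 0)).sum()
    shd = diff.sum() - reversals
    return fnr, fpr, shd / (d * (d - 1))
```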
Researcher Affiliation: Academia. Pierre Gillot, Pekka Parviainen, University of Bergen, HIB, Thormøhlens gate 55, Postboks 7803, 5020 Bergen; Pierre.Gillot@uib.no, Pekka.Parviainen@uib.no
Pseudocode: Yes. Algorithm 1: Proposed method; Algorithm 2: Vectorized greedy MAS. The outline of the proposed method is shown in Algorithm 1. We use a vectorized version of the approximation algorithm of Eades, Lin, and Smyth (1993) to find the acyclic weighted adjacency matrix W_k (Algorithm 2).
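For context, the following is a plain, non-vectorized sketch of the Eades-Lin-Smyth greedy heuristic that Algorithm 2 vectorizes: it builds a vertex ordering and keeps only the edges that point forward in that ordering, which yields an acyclic weighted adjacency matrix. Function and variable names are illustrative, not the paper's.

```python
# Plain (non-vectorized) sketch of the greedy maximum-acyclic-subgraph
# heuristic in the spirit of Eades, Lin, and Smyth (1993); the paper's
# Algorithm 2 is a vectorized variant of this idea.
import numpy as np

def greedy_mas_ordering(W):
    """Greedy vertex ordering approximating a maximum acyclic subgraph."""
    A = (np.abs(W) > 0).astype(float)
    np.fill_diagonal(A, 0)
    remaining = set(range(A.shape[0]))
    left, right = [], []
    while remaining:
        changed = True
        while changed:
            changed = False
            for v in list(remaining):
                if A[v, list(remaining)].sum() == 0:      # sink: no outgoing edge
                    right.append(v); remaining.remove(v); changed = True
            for v in list(remaining):
                if A[list(remaining), v].sum() == 0:      # source: no incoming edge
                    left.append(v); remaining.remove(v); changed = True
        if remaining:
            idx = list(remaining)
            sub = A[np.ix_(idx, idx)]
            score = sub.sum(axis=1) - sub.sum(axis=0)     # out-degree minus in-degree
            v = idx[int(np.argmax(score))]
            left.append(v); remaining.remove(v)
    return left + right[::-1]

def project_to_dag(W):
    """Keep only the edges pointing forward in the greedy ordering."""
    order = greedy_mas_ordering(W)
    pos = np.empty(len(order), dtype=int)
    pos[order] = np.arange(len(order))
    keep = pos[:, None] < pos[None, :]                    # edge i -> j kept iff i precedes j
    return W * keep
```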
Open Source Code: No. The paper states that the methods are implemented in PyTorch and discusses re-implementations of other methods, but it provides neither a concrete link to a public code repository nor an explicit statement that the code for the described methodology will be released.
Open Datasets: No. The paper describes a data-generation process rather than using a publicly available dataset. It states: 'We adopt a similar setup as in (Zheng et al. 2018; Ng, Ghassami, and Zhang 2020): we first generate random DAGs based on Erdős-Rényi (ER) and scale-free (SF) models.' and 'Unless stated otherwise, n samples are generated both for the training data and for the validation data, with n ∈ {1000, 10000}.' No access information for a pre-existing public dataset is provided.
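A hedged sketch of this kind of data generation is given below: sample a random undirected ER or scale-free graph, orient its edges along a random permutation to obtain a DAG, attach random edge weights, and draw samples from the resulting linear SEM. The edge density, weight range, and noise scale are placeholder values, not settings reported in the paper.

```python
# Illustrative data generation: random DAGs (ER or scale-free) plus samples
# from the corresponding linear SEM. All numeric parameters are placeholders.
import numpy as np
import networkx as nx

def random_dag(d, graph_type="ER", avg_degree=2, rng=None):
    rng = np.random.default_rng(rng)
    if graph_type == "ER":
        p = avg_degree / (d - 1)
        G = nx.gnp_random_graph(d, p, seed=int(rng.integers(1 << 31)))
    else:  # scale-free graph via preferential attachment
        G = nx.barabasi_albert_graph(d, max(1, avg_degree // 2),
                                     seed=int(rng.integers(1 << 31)))
    B = nx.to_numpy_array(G)
    # orient every undirected edge along a random permutation of the vertices
    perm = rng.permutation(d)
    B = np.tril(B[np.ix_(perm, perm)], k=-1).T   # strictly upper-triangular => acyclic
    return B

def sample_linear_sem(B, n, weight_range=(0.5, 2.0), noise_scale=1.0, rng=None):
    rng = np.random.default_rng(rng)
    d = B.shape[0]
    W = B * rng.uniform(*weight_range, size=(d, d)) * rng.choice([-1, 1], size=(d, d))
    X = np.zeros((n, d))
    for j in range(d):                            # columns are already in topological order
        parents = np.nonzero(W[:, j])[0]
        X[:, j] = X[:, parents] @ W[parents, j] + noise_scale * rng.standard_normal(n)
    return X, W
```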
Dataset Splits: Yes. Unless stated otherwise, n samples are generated both for the training data and for the validation data, with n ∈ {1000, 10000}. The Gaussian negative log-likelihood is also computed on the validation data (unseen during training).
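The held-out score mentioned here can be illustrated with a short function computing the Gaussian negative log-likelihood of the linear-SEM residuals on validation data; the equal-variance form below is an assumption for illustration, not necessarily the exact likelihood used in the paper.

```python
# Gaussian negative log-likelihood of the residuals X - XW on held-out data,
# under a shared (equal-variance) Gaussian noise model; an illustrative form.
import numpy as np

def gaussian_nll(X_val, W):
    n, d = X_val.shape
    resid = X_val - X_val @ W
    sigma2 = (resid ** 2).mean()                  # pooled noise variance estimate
    return 0.5 * n * d * np.log(2 * np.pi * sigma2) + (resid ** 2).sum() / (2 * sigma2)
```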
Hardware Specification: Yes. The experiments were run on a cluster with Intel Xeon Gold 6138 2.0 GHz / 6230 2.1 GHz CPUs. The number of cores and amount of memory used in each experiment are shown in Tables 2, 3, and 4. For instance, Table 2 indicates ...
Software Dependencies: Yes. The two proposed methods (ProxiMAS and OptiMAS) are implemented in PyTorch 1.8. The original implementation of NOTEARS relies on an L-BFGS-B solver implemented in SciPy and, as mentioned in (Ng, Ghassami, and Zhang 2020), it does not scale to large instances with thousands of variables, so for fairness we re-implemented it in PyTorch as well. The existing implementation of the GD algorithm is written in R and uses the highly optimized glmnet package, so we did not alter that implementation. The methods that rely on automatic differentiation (NOTEARS, GOLEM, and OptiMAS) all use the Adam optimizer (Kingma and Ba 2015) with the default learning rate 0.001.
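As a minimal illustration of this shared setup, the snippet below optimizes a weighted adjacency matrix with torch.optim.Adam at the default learning rate 0.001; the least-squares-plus-L1 loss is a generic stand-in, not the paper's full objective (in particular, it omits the acyclicity term).

```python
# Illustration only: Adam with the default learning rate 0.001 applied to a
# generic least-squares + L1 objective over a weighted adjacency matrix.
import torch

d, lam1 = 100, 0.1                          # illustrative dimension and sparsity weight
X = torch.randn(1000, d)                    # placeholder data
W = torch.zeros(d, d, requires_grad=True)   # weighted adjacency matrix to learn
optimizer = torch.optim.Adam([W], lr=1e-3)  # Adam (Kingma and Ba 2015), default lr

for _ in range(100):
    optimizer.zero_grad()
    loss = 0.5 * ((X - X @ W) ** 2).mean() + lam1 * W.abs().sum()
    loss.backward()
    optimizer.step()
```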
Experiment Setup: Yes. All the tested methods have a hyperparameter λ1 to promote sparsity; an additional hyperparameter λ2 exists for all tested methods except GD to enforce dagness (see Table 1). Additionally, in all experiments ProxiMAS and OptiMAS are configured to start enforcing acyclicity after 50 minutes of solving time, whereas NOTEARS and GOLEM enforce acyclicity the entire time, as in their original papers. As suggested in (Ng, Ghassami, and Zhang 2020), we warm-start GOLEM-NV with a solution returned by GOLEM-EV when working on NV-generated data: the first half of the allowed running time runs GOLEM-EV, whereas the second half runs GOLEM-NV. Finally, the methods that rely on automatic differentiation (NOTEARS, GOLEM, and OptiMAS) all use the Adam optimizer (Kingma and Ba 2015) with the default learning rate 0.001, as in (Ng, Ghassami, and Zhang 2020).
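The protocol quoted above can be summarized in a small configuration object. The field names and default values below are hypothetical placeholders; only the 50-minute delay before enforcing acyclicity for ProxiMAS/OptiMAS, the half-and-half GOLEM-EV/GOLEM-NV schedule, and the Adam learning rate come from the quote.

```python
# Hypothetical configuration object summarizing the quoted protocol; field
# names and the lambda1/lambda2/time-budget defaults are placeholders.
from dataclasses import dataclass

@dataclass
class ExperimentConfig:
    method: str                  # "ProxiMAS", "OptiMAS", "NOTEARS", "GOLEM" or "GD"
    lambda1: float = 0.1         # sparsity penalty (placeholder value)
    lambda2: float = 1.0         # dagness penalty, absent for GD (placeholder value)
    total_minutes: int = 60      # overall time budget (placeholder value)
    adam_lr: float = 1e-3        # default Adam learning rate

    def acyclicity_start_minute(self) -> int:
        # ProxiMAS and OptiMAS start enforcing acyclicity after 50 minutes;
        # NOTEARS and GOLEM enforce it from the start.
        return 50 if self.method in ("ProxiMAS", "OptiMAS") else 0

    def golem_schedule(self):
        # GOLEM-NV is warm-started from GOLEM-EV: first half EV, second half NV.
        half = self.total_minutes // 2
        return [("GOLEM-EV", half), ("GOLEM-NV", self.total_minutes - half)]
```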