DAGMA: Learning DAGs via M-matrices and a Log-Determinant Acyclicity Characterization

Authors: Kevin Bello, Bryon Aragam, Pradeep Ravikumar

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Finally, we provide extensive experiments for linear and nonlinear SEMs and show that our approach can reach large speedups and smaller structural Hamming distances against state-of-the-art methods. |
| Researcher Affiliation | Academia | Booth School of Business, University of Chicago, Chicago, IL 60637; Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213 |
| Pseudocode | Yes | Algorithm 1: DAGMA (see the acyclicity-function sketch after this table) |
| Open Source Code | Yes | Code implementing the proposed method is open-source and publicly available at https://github.com/kevinsbello/dagma. |
| Open Datasets | No | For each d, 30 matrices were randomly sampled from a standard Gaussian distribution. Given a data matrix X = [x_1, ..., x_d] ∈ ℝ^{n×d}, we define a score function Q(f; X) to measure the quality of a candidate SEM as follows: Q(f; X) = Σ_{j=1}^{d} loss(x_j, f_j(X)) (see the score sketch after this table). For linear models: In Appendix C.1, we report results for linear SEMs with Gaussian, Gumbel, and exponential noises, and use the least squares loss. This indicates the data is simulated rather than drawn from a fixed, publicly available dataset, and no access information is provided for the generated data. |
| Dataset Splits | No | No explicit training, validation, or test splits (e.g., percentages, sample counts, or split methodology) are mentioned; the paper implies the generated data is used directly for optimization. |
| Hardware Specification | Yes | All experiments were performed on a cluster running Ubuntu 18.04.5 LTS with Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, and an NVIDIA Tesla V100 GPU. |
| Software Dependencies | Yes | For our proposed method DAGMA, we implemented it in Python 3.8 and PyTorch 1.10.0. We use Adam [24] for optimization. |
| Experiment Setup | Yes | For all linear and nonlinear SEM experiments, we set the number of iterations T = 10000, initial central path coefficient µ(0) = 1, decay factor α = 0.5, ℓ1 parameter β1 = 0.01, and log-det parameter s = 1.0. We use the Adam optimizer with learning rate 0.001 (see the optimization sketch after this table). |
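
The Pseudocode row points to Algorithm 1 (DAGMA), which is built around the log-determinant acyclicity characterization named in the paper's title, h(W) = -log det(sI - W∘W) + d·log s. Below is a minimal PyTorch sketch of that function, not the authors' released code; it assumes W is a d×d weight matrix that stays in the domain where sI - W∘W is an M-matrix with positive determinant.

```python
import math
import torch

def h_logdet(W: torch.Tensor, s: float = 1.0) -> torch.Tensor:
    """Log-determinant acyclicity value h(W) = -log det(sI - W∘W) + d*log(s)."""
    d = W.shape[0]
    M = s * torch.eye(d, dtype=W.dtype) - W * W  # sI - W ∘ W (elementwise square)
    # slogdet assumes det(M) > 0, i.e. W remains inside the M-matrix domain
    _, logabsdet = torch.linalg.slogdet(M)
    return -logabsdet + d * math.log(s)
```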
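The Open Datasets row quotes the score Q(f; X) = Σ_{j=1}^{d} loss(x_j, f_j(X)). The sketch below specializes it to the linear case, f_j(X) = X w_j with a least-squares loss; the 1/(2n) scaling is an illustrative assumption and may differ from the paper's exact normalization.

```python
import torch

def score_linear(W: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
    """Least-squares score Q(W; X) = (1/2n) * ||X - XW||_F^2 for a linear SEM."""
    n = X.shape[0]
    residual = X - X @ W  # column j is x_j - f_j(X) = x_j - X w_j
    return 0.5 / n * residual.pow(2).sum()
```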
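The Experiment Setup row lists the reported hyperparameters (T = 10000 iterations, µ(0) = 1, decay factor α = 0.5, ℓ1 parameter β1 = 0.01, log-det parameter s = 1.0, Adam with learning rate 0.001). The sketch below wires them into an illustrative central-path loop that minimizes µ·(score + β1·||W||_1) + h(W), reusing the h_logdet and score_linear sketches above. The number of outer iterations (n_outer = 4) and the simple µ-decay schedule are assumptions for illustration, not the authors' exact procedure.

```python
import torch

def fit_dagma_linear(X: torch.Tensor, n_outer: int = 4, T: int = 10_000,
                     mu0: float = 1.0, alpha: float = 0.5, beta1: float = 0.01,
                     s: float = 1.0, lr: float = 1e-3) -> torch.Tensor:
    """Illustrative central-path loop: minimize mu*(score + beta1*||W||_1) + h(W)."""
    d = X.shape[1]
    W = torch.zeros(d, d, requires_grad=True)
    mu = mu0
    for _ in range(n_outer):                 # outer central-path iterations (assumed count)
        opt = torch.optim.Adam([W], lr=lr)
        for _ in range(T):                   # T Adam steps per subproblem
            opt.zero_grad()
            # reuses score_linear and h_logdet from the sketches above
            obj = mu * (score_linear(W, X) + beta1 * W.abs().sum()) + h_logdet(W, s)
            obj.backward()
            opt.step()
        mu *= alpha                          # decay the central-path coefficient
    return W.detach()
```

A call such as `W_hat = fit_dagma_linear(torch.as_tensor(data, dtype=torch.float32))`, followed by thresholding small entries of `W_hat`, would yield a candidate DAG adjacency matrix under these assumptions.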