Directed Acyclic Graph Neural Networks

Authors: Veronika Thost, Jie Chen

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform comprehensive experiments, including ablation studies, on representative DAG datasets (i.e., source code, neural architectures, and probabilistic graphical models) and demonstrate the superiority of DAGNN over simpler DAG architectures as well as general graph architectures.
Researcher Affiliation | Collaboration | Veronika Thost & Jie Chen, MIT-IBM Watson AI Lab, IBM Research; Veronika.Thost@ibm.com, chenjie@us.ibm.com
Pseudocode | No | The paper describes procedures and model equations but does not contain clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Supported code is available at https://github.com/vthost/DAGNN.
Open Datasets | Yes | The OGBG-CODE dataset (Hu et al., 2020) contains 452,741 Python functions parsed into DAGs. [...] The NA dataset (Zhang et al., 2019) contains 19,020 neural architectures [...] The BN dataset (Zhang et al., 2019) contains 200,000 Bayesian networks... (a dataset-loading sketch follows the table)
Dataset Splits | Yes | We adopt OGB's project split, whose training set consists of Github projects not seen in the validation and test sets. [...] For NA and BN, we adopted the given 90/10 splits. [...] We used 5-fold cross validation due to the size of the dataset and the number of baselines for comparison.
Hardware Specification | No | Most experiments were conducted on the Satori cluster (satori.mit.edu). This statement names a general computing environment but lacks specific hardware details such as GPU/CPU models or memory specifications.
Software Dependencies | No | All models were implemented in PyTorch (Paszke et al., 2019). [...] implemented using PyTorch Geometric (Fey & Lenssen, 2019). [...] generated by using the R package bnlearn (Scutari, 2010). Specific version numbers for these software components are not provided.
Experiment Setup | Yes | For DAGNN, we used hidden dimension 300. [...] We stopped training when the validation metric did not improve further under a patience of 20 epochs, for all models but D-VAE and DAGNN. For the latter two, we used a patience of 10. Moreover, for these two models we used gradient clipping (at 0.25) due to the recurrent layers and a batch size of 80. [...] For DAGNN, we started the learning rate scheduler at 1e-3 (instead of 1e-4) and stopped at a maximum number of epochs, 100 for NA and 50 for BN (instead of 300 and 100, respectively). (a training-setup sketch follows the table)
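
For reference, the OGBG-CODE graphs and OGB's project split quoted above can be pulled directly with the OGB Python package. The sketch below is a minimal, hedged example: it assumes OGB and PyTorch Geometric are installed, the batch size is a placeholder rather than a value from the paper, and newer OGB releases ship this benchmark under the name ogbg-code2 rather than ogbg-code.

```python
# Minimal sketch (not the authors' script): load OGBG-CODE and its project split
# via the OGB package. Newer OGB releases rename the benchmark to "ogbg-code2".
from ogb.graphproppred import PygGraphPropPredDataset, Evaluator
from torch_geometric.loader import DataLoader  # torch_geometric.data.DataLoader in older PyG

dataset = PygGraphPropPredDataset(name="ogbg-code")  # ~452,741 Python-function DAGs
split_idx = dataset.get_idx_split()                  # OGB's project split: train/valid/test

# Batch size here is a placeholder, not a value reported in the paper.
train_loader = DataLoader(dataset[split_idx["train"]], batch_size=128, shuffle=True)
valid_loader = DataLoader(dataset[split_idx["valid"]], batch_size=128)
test_loader = DataLoader(dataset[split_idx["test"]], batch_size=128)

evaluator = Evaluator(name="ogbg-code")              # official evaluator (F1 for this task)
```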
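
Similarly, the hyperparameters quoted in the Experiment Setup row (patience 10, gradient clipping at 0.25, batch size 80, learning rate starting at 1e-3, and epoch caps of 100/50 for NA/BN) can be assembled into a generic early-stopping loop. The sketch below is an illustration under assumptions: the model, its loss interface, the evaluation function, and the ReduceLROnPlateau scheduler choice are placeholders, not the authors' released training code.

```python
# Generic PyTorch early-stopping loop reflecting the reported DAGNN settings on
# NA/BN: patience 10, gradient clipping at 0.25, initial learning rate 1e-3,
# and a maximum of 100 (NA) or 50 (BN) epochs. Model/loss/eval are placeholders.
import torch

def train(model, train_loader, valid_loader, evaluate, max_epochs=100):
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Scheduler type is an assumption; the paper only states the starting rate.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min")
    best_loss, patience, bad_epochs = float("inf"), 10, 0

    for epoch in range(max_epochs):
        model.train()
        for batch in train_loader:              # loaders built with batch size 80
            optimizer.zero_grad()
            loss = model.loss(batch)            # placeholder loss interface
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), 0.25)  # clip at 0.25
            optimizer.step()

        val_loss = evaluate(model, valid_loader)  # placeholder validation criterion
        scheduler.step(val_loss)
        if val_loss < best_loss:
            best_loss, bad_epochs = val_loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:            # stop after 10 epochs without improvement
                break
    return best_loss
```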