reproducibilityindex.ai

GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation

Authors: Chence Shi*, Minkai Xu*, Zhaocheng Zhu, Weinan Zhang, Ming Zhang, Jian Tang

ICLR 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results show that GraphAF is able to generate 68% chemically valid molecules even without chemical knowledge rules and 100% valid molecules with chemical rules. The training process of GraphAF is two times faster than the existing state-of-the-art approach GCPN. After fine-tuning the model for goal-directed property optimization with reinforcement learning, GraphAF achieves state-of-the-art performance on both chemical property optimization and constrained property optimization.
Researcher Affiliation	Academia	(1) Department of Computer Science, Peking University, China; (2) Shanghai Jiao Tong University, China; (3) Mila Québec AI Institute, Canada; (4) Université de Montréal, Canada; (5) HEC Montréal, Canada; (6) CIFAR AI Research Chair
Pseudocode	Yes	We summarize the detailed training algorithm into Appendix B. Algorithm 1: Parallel Training Algorithm of GraphAF
Open Source Code	Yes	Code is available at https://github.com/DeepGraphLearning/GraphAF
Open Datasets	Yes	We use the ZINC250k molecular dataset (Irwin et al., 2012) for training.
Dataset Splits	No	The paper mentions using the ZINC250k dataset for training and evaluates metrics on generated molecules, but it does not specify explicit train/validation/test dataset splits (e.g., percentages or counts for each split) or refer to standard predefined splits for reproducibility.
Hardware Specification	Yes	To achieve the results in Table 2, JT-VAE and GCPN take around 24 and 8 hours, respectively, while GraphAF only takes 4 hours. [...] a machine with 1 Tesla V100 GPU and 32 CPU cores.
Software Dependencies	No	GraphAF is implemented in PyTorch (Paszke et al., 2017). We use the open-source chemical software RDKit (Landrum, 2016) to preprocess molecules. We use Adam (Kingma & Ba, 2014) to optimize our model. The paper mentions software and frameworks but does not provide specific version numbers for PyTorch or RDKit.
Experiment Setup	Yes	The R-GCN is implemented with 3 layers, and the embedding dimension is set as 128. The max graph size is set as 48 empirically. For density modeling, we train our model for 10 epochs with a batch size of 32 and a learning rate of 0.001. We use Adam (Kingma & Ba, 2014) to optimize our model. gamma is set to 0.97 for QED optimization and 0.9 for penalized logP optimization respectively. We fine-tune the pretrained model for 200 iterations with a fixed batch size of 64 using Adam optimizer. We also adopt a linear learning rate warm-up to stabilize the training. We use Adam with a learning rate of 0.0001 to optimize the model.
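
The hyperparameters quoted in the Experiment Setup row can be gathered in one place, which may help anyone attempting a rerun. The following is a minimal sketch using only the Python standard library, not the authors' configuration format. The class and field names are hypothetical. Two things are assumptions because the row does not state them: that the 0.0001 learning rate and the linear warm-up belong to the fine-tuning stage, and the warm-up length (`warmup_steps`), for which the value 20 is only a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DensityModelingConfig:
    """Pretraining (density modeling) values quoted in the Experiment Setup row."""
    rgcn_layers: int = 3
    embedding_dim: int = 128
    max_graph_size: int = 48
    epochs: int = 10
    batch_size: int = 32
    learning_rate: float = 1e-3


@dataclass(frozen=True)
class FineTuneConfig:
    """Reinforcement-learning fine-tuning values quoted in the same row."""
    iterations: int = 200
    batch_size: int = 64
    # ASSUMPTION: the row does not say which stage uses 0.0001;
    # it is attached to fine-tuning here for illustration only.
    learning_rate: float = 1e-4
    gamma_qed: float = 0.97
    gamma_penalized_logp: float = 0.9
    # ASSUMPTION: the warm-up length is not given in the row; 20 is a placeholder.
    warmup_steps: int = 20


def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Linear warm-up: ramp from base_lr / warmup_steps to base_lr, then hold."""
    if warmup_steps <= 0:
        return base_lr
    return base_lr * min(1.0, (step + 1) / warmup_steps)


if __name__ == "__main__":
    cfg = FineTuneConfig()
    for step in (0, 9, 19, 199):
        lr = warmup_lr(step, cfg.learning_rate, cfg.warmup_steps)
        print(f"step {step:3d}: lr = {lr:.6f}")
```

In a PyTorch training loop, the same ramp could be supplied as the multiplier function of a scheduler such as `torch.optim.lr_scheduler.LambdaLR`, with the multiplier being `min(1.0, (step + 1) / warmup_steps)`.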