$\mathcal{O}$-GNN: incorporating ring priors into molecular modeling

Authors: Jinhua Zhu, Kehan Wu, Bohan Wang, Yingce Xia, Shufang Xie, Qi Meng, Lijun Wu, Tao Qin, Wengang Zhou, Houqiang Li, Tie-Yan Liu

ICLR 2023

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Through experiments, $\mathcal{O}$-GNN shows strong performance on 11 public datasets. In particular, it achieves a state-of-the-art validation result on the PCQM4Mv1 benchmark (outperforming the previous KDD Cup champion solution) and on the drug-drug interaction prediction task on DrugBank. Furthermore, $\mathcal{O}$-GNN outperforms strong baselines (which do not model rings) on molecular property prediction and retrosynthesis prediction.
Researcher Affiliation | Collaboration | 1. University of Science and Technology of China; 2. Microsoft Research AI4Science; 3. Gaoling School of Artificial Intelligence, Renmin University of China
Pseudocode | No | The paper describes its model architecture and update rules using mathematical equations and descriptive text, but it does not include a structured pseudocode or algorithm block.
Open Source Code | Yes | The code is released at https://github.com/O-GNN/O-GNN.
Open Datasets | Yes | HOMO-LUMO energy gap prediction on the PCQM4Mv1 dataset (Hu et al., 2021); molecular property prediction on the MoleculeNet datasets (Wu et al., 2018); few-shot molecular property prediction on the FS-Mol dataset (Stanley et al., 2021); the inductive setting of the DrugBank dataset (Wishart et al., 2018); and retrosynthesis experiments on the USPTO-50k dataset (Coley et al., 2017).
Dataset Splits | Yes | PCQM4Mv1 has 3,045,360 training and 380,670 validation examples (test labels are not available). The MoleculeNet training, validation, and test sets are provided by DeepChem. Following Chen & Jung (2021), USPTO-50k is partitioned into a 45k training set, a 5k validation set, and a 5k test set.
Hardware Specification | No | The paper mentions training on 'one GPU' but does not specify the GPU model or any CPU or other hardware components used for running experiments.
Software Dependencies | No | The paper mentions using 'AdamW' as the optimizer and 'RDKit' for retrosynthesis, but it does not provide version numbers for these or for other software dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | For PCQM4Mv1, the number of layers is 12 and the hidden dimension is 256, selected by cross-validation on the training set. For FS-Mol, the number of layers is 6 and the hidden dimension is 256. The candidate numbers of layers and hidden dimensions for MoleculeNet are {4, 6, 8, 12} and {128, 256}; on FS-Mol and MoleculeNet, hyper-parameters are selected according to validation performance. All tasks are trained on one GPU with the AdamW optimizer (Loshchilov & Hutter, 2019). More detailed parameters (number of layers, hidden dimension, optimizer, dropout, learning rate, training steps, batch size, weight decay, learning-rate decay) are summarized in Table 5 of Appendix A.
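For concreteness, the per-task settings quoted in the Experiment Setup row can be collected into a minimal configuration sketch. All names here (`OGNNConfig`, `CONFIGS`, `MOLECULENET_GRID`) are hypothetical and do not come from the paper's released code; only the numeric values are taken from the text above, and the remaining hyper-parameters (dropout, learning rate, batch size, etc.) are listed in Table 5 of the paper's Appendix A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OGNNConfig:
    """Hyper-parameters stated in the paper; field names are hypothetical."""
    num_layers: int
    hidden_dim: int
    optimizer: str = "AdamW"  # Loshchilov & Hutter, 2019


# Fixed settings reported for the two large-scale tasks.
CONFIGS = {
    "PCQM4Mv1": OGNNConfig(num_layers=12, hidden_dim=256),
    "FS-Mol": OGNNConfig(num_layers=6, hidden_dim=256),
}

# For MoleculeNet, the paper tunes over a grid rather than a single setting,
# selecting by validation performance.
MOLECULENET_GRID = {
    "num_layers": [4, 6, 8, 12],
    "hidden_dim": [128, 256],
}
```

This layout makes the distinction explicit: PCQM4Mv1 and FS-Mol use fixed architectures, while MoleculeNet sweeps a small grid per dataset.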