Rethinking Tokenizer and Decoder in Masked Graph Modeling for Molecules

Authors: Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate that our method outperforms the existing molecule self-supervised learning methods. Our codes and checkpoints are available at https://github.com/syr-cn/SimSGT. In this section, we perform experiments to assess the roles of tokenizer and decoder in MGM for molecules. Our experiments follow the transfer learning setting in [12, 9]. We pretrain MGM models on 2 million molecules from ZINC15 [42], and evaluate the pretrained models on eight classification datasets in MoleculeNet [28]: BBBP, Tox21, ToxCast, SIDER, ClinTox, MUV, HIV, and BACE.
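As a rough illustration of this transfer-learning protocol (self-supervised pretraining on unlabeled ZINC15 molecules, then supervised fine-tuning on each labeled MoleculeNet task), a minimal sketch is given below. `GraphEncoder`, `pretrain_mgm`, and the data loaders are hypothetical placeholders, not the released SimSGT code.

```python
import torch
import torch.nn as nn

# Eight downstream classification datasets used for transfer evaluation.
DOWNSTREAM_TASKS = ["BBBP", "Tox21", "ToxCast", "SIDER",
                    "ClinTox", "MUV", "HIV", "BACE"]

def finetune(encoder: nn.Module, loader, num_tasks: int, epochs: int = 100):
    """Fine-tune a pretrained graph encoder with a fresh task head (sketch)."""
    head = nn.Linear(encoder.hidden_dim, num_tasks)       # hidden_dim is an assumed attribute
    params = list(encoder.parameters()) + list(head.parameters())
    optimizer = torch.optim.Adam(params, lr=1e-3)         # placeholder learning rate
    criterion = nn.BCEWithLogitsLoss()                     # multi-label binary targets
    for _ in range(epochs):
        for batch in loader:                               # mini-batches of molecular graphs
            graph_repr = encoder(batch)                    # pooled graph embeddings
            loss = criterion(head(graph_repr), batch.y.float())
            # Real pipelines also mask missing labels (NaNs) in multi-task datasets.
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return encoder, head

# Hypothetical end-to-end usage:
# encoder = pretrain_mgm(GraphEncoder(), zinc15_loader)    # self-supervised stage
# for name in DOWNSTREAM_TASKS:                            # supervised transfer stage
#     finetune(copy.deepcopy(encoder), train_loader(name), num_tasks(name))
```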
Researcher Affiliation | Academia | Zhiyuan Liu, Yaorui Shi, An Zhang, Enzhi Zhang, Kenji Kawaguchi, Xiang Wang, Tat-Seng Chua (National University of Singapore; University of Science and Technology of China; Hokkaido University)
Pseudocode | Yes | Algorithm 1: PyTorch-style pseudocode of SimSGT
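The paper's Algorithm 1 provides PyTorch-style pseudocode for SimSGT, and the released repository is the authoritative reference. Purely as a hedged illustration of a generic masked-graph-modeling training step (mask nodes, tokenize the unmasked graph into reconstruction targets, encode the masked graph, decode and score only the masked positions), one might write something like:

```python
import torch
import torch.nn.functional as F

def mgm_step(graph, encoder, tokenizer, decoder, mask_ratio=0.25):
    """One masked-graph-modeling step (hypothetical sketch, not the paper's Algorithm 1).

    mask_ratio is a placeholder value, not the paper's setting.
    """
    x = graph.x                                      # node features, shape [N, F]
    mask = torch.rand(x.size(0)) < mask_ratio        # randomly select nodes to mask

    with torch.no_grad():
        targets = tokenizer(graph)                   # token id per node from the unmasked graph, [N]

    x_masked = x.clone()
    x_masked[mask] = 0.0                             # overwrite masked node features with a dummy value
    node_repr = encoder(x_masked, graph.edge_index)  # contextual node embeddings, [N, H]
    logits = decoder(node_repr, graph.edge_index)    # predicted token scores per node, [N, V]

    # Reconstruction loss only on masked nodes, BERT-style.
    return F.cross_entropy(logits[mask], targets[mask])
```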
Open Source Code | Yes | Our codes and checkpoints are available at https://github.com/syr-cn/SimSGT.
Open Datasets | Yes | We pretrain MGM models on 2 million molecules from ZINC15 [42], and evaluate the pretrained models on eight classification datasets in MoleculeNet [28]: BBBP, Tox21, ToxCast, SIDER, ClinTox, MUV, HIV, and BACE. Following the experimental setting in [45], we pretrain SimSGT on the 50 thousand molecule samples from the GEOM dataset [46], and we report performance on predicting the quantum chemistry properties of molecules [47].
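For readers who want to fetch the downstream data, the eight classification datasets are available, for instance, through PyTorch Geometric's `MoleculeNet` wrapper. This is only a convenience sketch; the paper's own preprocessing pipeline may differ.

```python
from torch_geometric.datasets import MoleculeNet

# Download and featurize the eight MoleculeNet classification datasets.
names = ["BBBP", "Tox21", "ToxCast", "SIDER", "ClinTox", "MUV", "HIV", "BACE"]
datasets = {name: MoleculeNet(root="data/moleculenet", name=name) for name in names}

for name, ds in datasets.items():
    # y has shape [1, num_tasks] per graph in this wrapper.
    print(name, len(ds), "molecules,", ds[0].y.size(-1), "tasks")
```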
Dataset Splits | Yes | These downstream datasets are divided into train/valid/test sets by scaffold split to provide an out-of-distribution evaluation setting. We tune the hyperparameters in the fine-tuning stage using the validation performance.
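Scaffold splitting groups molecules by their Bemis-Murcko scaffold so that test scaffolds do not appear in training, which yields the out-of-distribution setting mentioned above. A minimal RDKit-based sketch follows; the exact split used in the paper follows prior work and may order or bin scaffold groups differently.

```python
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    """Group molecules by Bemis-Murcko scaffold, then fill train/valid/test splits
    (a common convention; details may differ from the paper's exact procedure)."""
    groups = defaultdict(list)
    for idx, smi in enumerate(smiles_list):
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(smiles=smi, includeChirality=True)
        groups[scaffold].append(idx)

    # Larger scaffold groups go to train first, so valid/test hold rarer scaffolds.
    ordered = sorted(groups.values(), key=len, reverse=True)
    n = len(smiles_list)
    train, valid, test = [], [], []
    for group in ordered:
        if len(train) + len(group) <= frac_train * n:
            train += group
        elif len(valid) + len(group) <= frac_valid * n:
            valid += group
        else:
            test += group
    return train, valid, test
```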
Hardware Specification | Yes | We perform experiments on an NVIDIA DGX A100 server.
Software Dependencies | No | The paper mentions "PyTorch-style pseudocode" and cites RDKit [30] for extracting functional groups (FGs), but it does not provide specific version numbers for these or for other software dependencies such as Python, PyTorch, CUDA, or the remaining libraries used in the experiments.
Experiment Setup | Yes | Table 9b summarizes the hyperparameters. We use different hyperparameters for different graph encoders. The architectures of the two graph encoders are borrowed from previous works: GINE [12] and GTS [27]. We use large batch sizes of 1024 and 2048 to speed up pretraining. We do not use dropout during pretraining. During fine-tuning, we use 50% dropout in GINE layers and 30% dropout in transformer layers. Table 10b: Hyperparameters and their search spaces. Table 11: Hyperparameters for fine-tuning on the QM datasets.
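The stated settings can be collected into a small configuration sketch. Only the values quoted above are filled in; everything else is left as a placeholder rather than guessed from Tables 9 to 11.

```python
# Settings quoted in the paper's experiment setup; unstated values
# (e.g., learning rate, number of epochs) are placeholders.
config = {
    "pretrain": {
        "batch_size": 1024,            # 1024 or 2048, depending on the encoder
        "dropout": 0.0,                # no dropout during pretraining
    },
    "finetune": {
        "dropout_gine": 0.5,           # 50% dropout in GINE layers
        "dropout_transformer": 0.3,    # 30% dropout in transformer layers
        "learning_rate": None,         # placeholder: tuned on validation performance
    },
    "encoders": ["GINE", "GTS"],       # architectures borrowed from prior work [12, 27]
}
```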