Self-Supervised Graph Transformer on Large-Scale Molecular Data

Authors: Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, Junzhou Huang

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We pre-train GROVER with 100 million parameters on 10 million unlabelled molecules, the biggest GNN and the largest training dataset in molecular representation learning. We then leverage the pre-trained GROVER for molecular property prediction followed by task-specific fine-tuning, where we observe a huge improvement (more than 6% on average) over current state-of-the-art methods on 11 challenging benchmarks.
Researcher Affiliation | Collaboration | 1 Tencent AI Lab; 2 Beijing National Research Center for Information Science and Technology (BNRist), Department of Computer Science and Technology, Tsinghua University
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described: there is no repository link, no explicit code-release statement, and no mention of code in the supplementary materials.
Open Datasets | Yes | We collect 11 million (M) unlabelled molecules sampled from the ZINC15 [48] and Chembl [11] datasets to pre-train GROVER... All datasets can be downloaded from http://moleculenet.ai/datasets-1
Dataset Splits | Yes | We randomly split 10% of unlabelled molecules as the validation sets for model selection... We adopt the scaffold splitting method with a ratio for train/validation/test as 8:1:1. (A scaffold-split sketch follows this table.)
Hardware Specification | Yes | We use 250 Nvidia V100 GPUs to pre-train GROVER_base and GROVER_large.
Software Dependencies | No | The paper mentions software such as the Adam optimizer, the Noam learning rate scheduler, and RDKit, but does not provide version numbers for these or any other key software components.
Experiment Setup | Yes | We use Adam optimizer for both pre-train and fine-tuning. The Noam learning rate scheduler [9] is adopted to adjust the learning rate during training... For the contextual property prediction task, we set the context radius k = 1... For each molecular graph, we randomly mask 15% of node and edge labels for prediction... For each training process, we train models for 100 epochs. For hyper-parameters, we perform the random search on the validation set for each dataset and report the best results. (Sketches of the Noam schedule and the 15% label masking follow this table.)
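
The scaffold split quoted under Dataset Splits groups molecules by their Bemis-Murcko scaffold so that structurally similar compounds never straddle the train/validation/test boundary. Below is a minimal sketch, assuming RDKit is installed; `scaffold_split` and the greedy largest-group-first assignment are illustrative choices, not GROVER's released implementation.

```python
# Hypothetical sketch of an 8:1:1 Bemis-Murcko scaffold split (not the
# paper's code). Requires RDKit.
from collections import defaultdict
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    """Split molecule indices so that no scaffold crosses split boundaries."""
    scaffold_to_ids = defaultdict(list)
    for i, smi in enumerate(smiles_list):
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip unparsable SMILES
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(mol=mol)
        scaffold_to_ids[scaffold].append(i)

    # Assign whole scaffold groups, largest first, to train, then valid,
    # then test, so structurally similar molecules share a split.
    n = len(smiles_list)
    train, valid, test = [], [], []
    for group in sorted(scaffold_to_ids.values(), key=len, reverse=True):
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```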
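The Noam scheduler cited under Experiment Setup raises the learning rate linearly over a warmup period and then decays it with the inverse square root of the step count, scaled by the model width. A minimal sketch follows; `model_dim` and `warmup_steps` are placeholder values, not the paper's settings.

```python
# Noam schedule from "Attention Is All You Need" [9]; the default
# arguments are illustrative assumptions, not GROVER's hyper-parameters.
def noam_lr(step: int, model_dim: int = 768, warmup_steps: int = 4000,
            factor: float = 1.0) -> float:
    """Linear warmup for warmup_steps updates, then inverse-sqrt decay."""
    step = max(step, 1)  # avoid 0 ** -0.5 at the first update
    return factor * model_dim ** -0.5 * min(step ** -0.5,
                                            step * warmup_steps ** -1.5)
```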
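The 15% node/edge label masking quoted above is BERT-style: a random subset of labels is hidden, and the masked positions become the self-supervised prediction targets. A minimal NumPy sketch under the assumption that labels are stored as a 1-D integer array; `mask_labels` and the mask token value are illustrative, not taken from the paper.

```python
# Hypothetical sketch of masking 15% of node/edge labels for prediction.
import numpy as np

def mask_labels(labels: np.ndarray, mask_rate: float = 0.15,
                mask_token: int = -1, rng=None):
    """Hide ~mask_rate of the labels; the hidden entries become targets."""
    rng = rng or np.random.default_rng()
    masked = labels.copy()
    mask = rng.random(labels.shape[0]) < mask_rate
    targets = labels[mask]     # ground truth for the masked positions
    masked[mask] = mask_token  # the model sees the mask token instead
    return masked, mask, targets
```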