Pre-training Molecular Graph Representation with 3D Geometry
Authors: Shengchao Liu, Hanchen Wang, Weiyang Liu, Joan Lasenby, Hongyu Guo, Jian Tang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, comprehensive experiments show that GraphMVP can consistently outperform existing graph SSL methods. |
| Researcher Affiliation | Academia | Shengchao Liu1,2, Hanchen Wang3, Weiyang Liu3,4, Joan Lasenby3, Hongyu Guo5, Jian Tang1,6,7 1Mila 2Université de Montréal 3University of Cambridge 4MPI for Intelligent Systems, Tübingen 5National Research Council Canada 6HEC Montréal 7CIFAR AI Chair |
| Pseudocode | No | The paper provides mathematical formulations and descriptions of its methods but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code is available on GitHub. Our code is available on GitHub for reproducibility. |
| Open Datasets | Yes | We randomly select 50k qualified molecules from GEOM [4] with both 2D and 3D structures for the pre-training. |
| Dataset Splits | Yes | For fine-tuning, we follow the same setting of the main graph SSL work [42, 103, 104], exploring 8 binary molecular property prediction tasks, which are all in the low-data regime. For each downstream task, we report the mean (and standard deviation) ROC-AUC of 3 seeds with scaffold splitting. |
| Hardware Specification | No | The paper discusses training efficiency and provides training times (e.g., '3h', '50h') for some models, but it does not specify any particular hardware details such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions using 'Graph Isomorphism Network (GIN)' and 'SchNet' and the use of a 'conda virtual environment' for reproducibility, but it does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | GraphMVP has two key factors: i) masking ratio (M) and ii) number of conformers for each molecule (C). We set M = 0.15 and C = 5 by default, and will explore their effects in the following ablation studies in Section 4.3. For EBM-NCE loss, we adopt the empirical distribution for noise distribution. |
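
The defaults quoted in the Experiment Setup row (M = 0.15, C = 5) and the reporting convention in the Dataset Splits row (mean and standard deviation of ROC-AUC over 3 seeds) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the rule of always masking at least one atom, the choice to keep the first C conformers, and the use of the sample standard deviation are all assumptions made for illustration.

```python
import random
import statistics

MASK_RATIO = 0.15     # M, default quoted in the Experiment Setup row
NUM_CONFORMERS = 5    # C, default quoted in the Experiment Setup row


def sample_masked_atoms(num_atoms, mask_ratio=MASK_RATIO, rng=None):
    """Return sorted atom indices to mask.

    Assumption: at least one atom is masked, even for very small molecules.
    """
    rng = rng or random.Random()
    num_masked = max(1, int(num_atoms * mask_ratio))
    return sorted(rng.sample(range(num_atoms), num_masked))


def select_conformers(conformer_ids, num_conformers=NUM_CONFORMERS):
    """Keep at most C conformers.

    Assumption: the ids are already in the preferred order, so the first C are kept.
    """
    return list(conformer_ids)[:num_conformers]


def summarize_roc_auc(scores):
    """Mean and sample standard deviation of per-seed ROC-AUC scores.

    Assumption: sample (n - 1) standard deviation; the paper does not say which.
    """
    return statistics.mean(scores), statistics.stdev(scores)
```

For example, `sample_masked_atoms(100)` returns 15 distinct indices, and `summarize_roc_auc` applied to three hypothetical per-seed scores gives the pair that would be reported for one downstream task.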