Pre-training Molecular Graph Representation with 3D Geometry

Authors: Shengchao Liu, Hanchen Wang, Weiyang Liu, Joan Lasenby, Hongyu Guo, Jian Tang

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, comprehensive experiments show that GraphMVP can consistently outperform existing graph SSL methods.
Researcher Affiliation | Academia | Shengchao Liu (1,2), Hanchen Wang (3), Weiyang Liu (3,4), Joan Lasenby (3), Hongyu Guo (5), Jian Tang (1,6,7); 1 Mila, 2 Université de Montréal, 3 University of Cambridge, 4 MPI for Intelligent Systems, Tübingen, 5 National Research Council Canada, 6 HEC Montréal, 7 CIFAR AI Chair
Pseudocode | No | The paper provides mathematical formulations and descriptions of its methods but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code is available on GitHub. Our code is available on GitHub for reproducibility.
Open Datasets | Yes | We randomly select 50k qualified molecules from GEOM [4] with both 2D and 3D structures for the pre-training.
Dataset Splits | Yes | For fine-tuning, we follow the same setting of the main graph SSL work [42, 103, 104], exploring 8 binary molecular property prediction tasks, which are all in the low-data regime. For each downstream task, we report the mean (and standard deviation) ROC-AUC of 3 seeds with scaffold splitting. (A sketch of scaffold splitting follows the table.)
Hardware Specification | No | The paper discusses training efficiency and provides training times (e.g., '3h', '50h') for some models, but it does not specify any particular hardware details such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions using 'Graph Isomorphism Network (GIN)' and 'SchNet' and the use of a 'conda virtual environment' for reproducibility, but it does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | GraphMVP has two key factors: i) masking ratio (M) and ii) number of conformers for each molecule (C). We set M = 0.15 and C = 5 by default, and will explore their effects in the following ablation studies in Section 4.3. For EBM-NCE loss, we adopt the empirical distribution for noise distribution. (A sketch of an EBM-NCE-style loss follows the table.)
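
The scaffold splitting quoted under 'Dataset Splits' follows the MoleculeNet-style protocol common to the cited graph SSL work. Below is a minimal, hedged sketch of deterministic scaffold splitting with RDKit; the 80/10/10 fractions and the greedy largest-group-first assignment are assumptions mirroring common practice, not necessarily the authors' exact script.

from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    # Group molecule indices by their Bemis-Murcko scaffold SMILES.
    scaffold_to_indices = defaultdict(list)
    for idx, smiles in enumerate(smiles_list):
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(smiles=smiles, includeChirality=False)
        scaffold_to_indices[scaffold].append(idx)

    # Assign whole scaffold groups (largest first) so that no scaffold
    # is shared across the train/valid/test splits.
    groups = sorted(scaffold_to_indices.values(), key=len, reverse=True)
    n = len(smiles_list)
    n_train, n_valid = int(frac_train * n), int(frac_valid * n)
    train_idx, valid_idx, test_idx = [], [], []
    for group in groups:
        if len(train_idx) + len(group) <= n_train:
            train_idx.extend(group)
        elif len(valid_idx) + len(group) <= n_valid:
            valid_idx.extend(group)
        else:
            test_idx.extend(group)
    return train_idx, valid_idx, test_idx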
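
The 'Experiment Setup' row mentions the EBM-NCE objective with an empirical noise distribution. The PyTorch sketch below shows one plausible reading: a binary NCE loss that scores matching 2D/3D embedding pairs against in-batch negatives. The dot-product energy, the single-direction form, and the in-batch permutation used to draw negatives are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn.functional as F

def ebm_nce_loss(h_2d, h_3d):
    # h_2d, h_3d: [batch, dim] embeddings of the 2D graphs and 3D conformers
    # of the same molecules, aligned row by row.
    # Positive pairs: the two views of the same molecule.
    pos_logits = (h_2d * h_3d).sum(dim=-1)
    # Negative pairs: pair each 2D embedding with the 3D embedding of a
    # different molecule drawn from the batch (empirical noise distribution).
    perm = torch.randperm(h_3d.size(0))
    neg_logits = (h_2d * h_3d[perm]).sum(dim=-1)
    loss_pos = F.binary_cross_entropy_with_logits(pos_logits, torch.ones_like(pos_logits))
    loss_neg = F.binary_cross_entropy_with_logits(neg_logits, torch.zeros_like(neg_logits))
    return 0.5 * (loss_pos + loss_neg)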