Energy-Motivated Equivariant Pretraining for 3D Molecular Graphs

Authors: Rui Jiao, Jiaqi Han, Wenbing Huang, Yu Rong, Yang Liu

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our model pretrained from a large-scale 3D dataset, GEOM-QM9, on two challenging 3D benchmarks: MD17 and QM9. Experimental results demonstrate the efficacy of our method against current state-of-the-art pretraining approaches, and verify the validity of our design for each proposed component.
Researcher Affiliation | Collaboration | 1 Beijing National Research Center for Information Science and Technology (BNRist), Department of Computer Science and Technology, Tsinghua University; 2 Institute for AI Industry Research (AIR), Tsinghua University; 3 Beijing Academy of Artificial Intelligence; 4 Gaoling School of Artificial Intelligence, Renmin University of China; 5 Beijing Key Laboratory of Big Data Management and Analysis Methods; 6 Tencent AI Lab
Pseudocode | No | The paper describes its methods using mathematical formulations and descriptive text but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/jiaor17/3D-EMGP.
Open Datasets | Yes | Pretraining dataset: We leverage a large-scale molecular dataset, GEOM-QM9 (Axelrod and Gomez-Bombarelli 2022), with corresponding 3D conformations as our pretraining dataset. [...] Downstream tasks: To thoroughly evaluate our proposed pretraining framework, we employ two widely adopted 3D molecular property prediction datasets, MD17 (Chmiela et al. 2017) and QM9 (Ramakrishnan et al. 2014), as the downstream tasks.
Dataset Splits | Yes | In detail, MD17 contains the simulated dynamical trajectories of 8 small organic molecules, with the recorded energy and force at each frame. We select 9,500/500 frames as the training/validation set of each molecule. [...] We follow the data split in Anderson, Hy, and Kondor (2019) and Satorras, Hoogeboom, and Welling (2021), where the sizes of the training, validation, and test sets are 100k, 18k, and 13k, respectively.
Hardware Specification | No | The paper does not provide specific hardware details, such as the GPU/CPU models or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependency versions (e.g., library names with version numbers, such as Python 3.8 or CPLEX 12.4).
Experiment Setup | No | The paper mentions data splits and backbone models but does not explicitly provide concrete hyperparameter values, training configurations, or system-level settings for the experimental setup in the main text.
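The split sizes reported in the Dataset Splits row (MD17: 9,500/500 frames per molecule; QM9: 100k/18k/13k) can be sketched as a simple random index partition. This is an illustrative sketch only: `split_indices`, the random seed, and the total counts passed in are assumptions for the example, not the paper's actual data loader.

```python
import numpy as np

def split_indices(n_total, sizes, seed=0):
    """Randomly partition range(n_total) into disjoint index sets of the given sizes.

    Hypothetical helper for illustration; assumes sum(sizes) <= n_total.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_total)  # shuffled indices, no repeats
    splits, start = [], 0
    for size in sizes:
        splits.append(perm[start:start + size])
        start += size
    return splits

# MD17: 9,500 training / 500 validation frames per molecule
# (100,000 is a placeholder for the number of recorded frames of one molecule).
md17_train, md17_val = split_indices(100_000, [9_500, 500])

# QM9: 100k / 18k / 13k train / validation / test molecules
# (131,000 is just large enough to cover the three reported splits).
qm9_train, qm9_val, qm9_test = split_indices(131_000, [100_000, 18_000, 13_000])
```

Using a fixed seed makes the partition reproducible across runs, which matters when comparing pretrained and from-scratch models on identical downstream splits.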