Sliced Denoising: A Physics-Informed Molecular Pre-Training Method

Authors: Yuyan Ni, Shikun Feng, Wei-Ying Ma, Zhi-Ming Ma, Yanyan Lan

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 (Experiments): Our first experiment, in Section 4.1, is concerned with whether our approach achieves better physical consistency, specifically in terms of force field accuracy, compared to coordinate denoising and fractional denoising methods. Then, in Section 4.2, we evaluate the performance of SliDe against state-of-the-art 3D pre-training methods on the benchmark datasets QM9 and MD17, in order to assess our model's ability for molecular property prediction. Furthermore, in Section 4.3, we conduct ablation studies concerning fine-tuning regularization and network architecture.
Researcher Affiliation | Collaboration | Yuyan Ni (1,3), Shikun Feng (2), Wei-Ying Ma (2), Zhi-Ming Ma (1), Yanyan Lan (2,4); (1) Academy of Mathematics and Systems Science, Chinese Academy of Sciences; (2) Institute for AI Industry Research (AIR), Tsinghua University; (3) University of Chinese Academy of Sciences; (4) Beijing Academy of Artificial Intelligence
Pseudocode | Yes | Appendix C.2, "Pseudocode for Algorithms and Complexity Analysis", and Algorithm 1, "Sliced Denoising Pre-training Algorithm" (a hedged sketch of this step appears below the table)
Open Source Code | Yes | The code is released publicly at https://github.com/fengshikun/SliDe.
Open Datasets | Yes | QM9 (Ramakrishnan et al., 2014) is a quantum chemistry dataset providing one equilibrium conformation and 12 labels of geometric, energetic, electronic, and thermodynamic properties for 134k stable small organic molecules made up of CHONF atoms. Our model is pre-trained on the PCQM4Mv2 dataset (Nakata & Shimazaki, 2017), which contains 3.4 million organic molecules and provides one equilibrium conformation for each molecule. (See the QM9 loading example below the table.)
Dataset Splits | Yes | The data splitting follows standard settings: a training set with 110,000 samples, a validation set with 10,000 samples, and a test set with the remaining 10,831 samples. (See the split snippet below the table.)
Hardware Specification | Yes | Frad pre-training takes 1d 1h 14m on 8 NVIDIA A100 GPUs, and SliDe pre-training takes 1d 17h 1m on 8 Tesla V100 GPUs.
Software Dependencies | Yes | Our parameters are obtained from the parameter files of Open Force Field v2.0.0 (Sage) (Boothroyd et al., 2023). (See the force-field example below the table.)
Experiment Setup | Yes | Appendix C.4, Hyperparameter Settings: Table 12 (hyperparameters for pre-training), Table 13 (hyperparameters for fine-tuning on MD17), and Table 14 (hyperparameters for fine-tuning on QM9).
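
For readers without access to Appendix C.2, below is a minimal, hypothetical sketch of one sliced-denoising pre-training step in PyTorch. All function and variable names are illustrative, not the authors' released code: the helper sample_physics_informed_noise is a stand-in (the paper samples noise on bond lengths, angles, and torsions with force-field-derived variances), and the projection loss is a generic sliced denoising objective rather than the paper's exact regression target.

```python
# Illustrative sketch of one sliced-denoising pre-training step.
# Hypothetical names throughout; see the released repository for the
# authors' actual implementation.
import torch

def sample_physics_informed_noise(pos, sigma=0.04):
    # Stand-in: isotropic Gaussian noise. The paper instead perturbs bond
    # lengths, angles, and torsions with variances derived from classical
    # force-field constants.
    return sigma * torch.randn_like(pos)

def slide_pretrain_step(model, z, pos, num_slices=1):
    """One step: perturb an equilibrium conformation, then regress a
    sliced (randomly projected) denoising target."""
    noise = sample_physics_informed_noise(pos)   # (N, 3)
    pred = model(z, pos + noise)                 # predicted noise, (N, 3)
    loss = torch.zeros((), device=pos.device)
    for _ in range(num_slices):
        v = torch.randn_like(pos)                # random direction in R^{3N}
        v = v / v.norm()
        # Match the projections of prediction and target along v.
        loss = loss + ((pred * v).sum() - (noise * v).sum()) ** 2
    return loss / num_slices
```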
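The QM9 dataset quoted in the Open Datasets row is also distributed with PyTorch Geometric; the snippet below shows one common route to obtain it, not necessarily the authors' own data pipeline.

```python
from torch_geometric.datasets import QM9

dataset = QM9(root="data/QM9")   # downloads and processes on first use
print(len(dataset))              # 130831 molecules after PyG's cleanup
print(dataset[0])                # Data(z=..., pos=..., y=...) with 3D coordinates
```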
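Continuing from the snippet above, the quoted split sizes (110,000 / 10,000 / remaining 10,831) can be reproduced with a random permutation; the excerpt does not state the seed, so the one below is illustrative only.

```python
import torch

seed = 0  # illustrative; not stated in the excerpt above
perm = torch.randperm(len(dataset), generator=torch.Generator().manual_seed(seed))
train_idx = perm[:110_000]
val_idx   = perm[110_000:120_000]
test_idx  = perm[120_000:]       # the remaining 10,831 samples
train, val, test = dataset[train_idx], dataset[val_idx], dataset[test_idx]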
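The Sage (Open Force Field v2.0.0) parameter files cited in the Software Dependencies row can be inspected with the openff-toolkit; a sketch, assuming openff-toolkit >= 0.11 and its parameter packages are installed.

```python
from openff.toolkit import ForceField

ff = ForceField("openff-2.0.0.offxml")   # the Sage parameter file
bonds = ff.get_parameter_handler("Bonds")
print(len(bonds.parameters))             # number of bond parameter types
print(bonds.parameters[0].k)             # a bond force constant, with units
```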