Sliced Denoising: A Physics-Informed Molecular Pre-Training Method
Authors: Yuyan Ni, Shikun Feng, Wei-Ying Ma, Zhi-Ming Ma, Yanyan Lan
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS: Our first experiment in Section 4.1 is concerned with whether our approach achieves better physical consistency, specifically in terms of force field accuracy, compared to coordinate denoising and fractional denoising methods. Then in Section 4.2, we evaluate the performance of SliDe in comparison to state-of-the-art 3D pre-training methods on the benchmark datasets QM9 and MD17, in order to assess our model's ability for molecular property prediction. Furthermore, in Section 4.3, we conduct ablation studies concerning fine-tuning regularization and network architecture. |
| Researcher Affiliation | Collaboration | Yuyan Ni (1,3), Shikun Feng (2), Wei-Ying Ma (2), Zhi-Ming Ma (1), Yanyan Lan (2,4). 1: Academy of Mathematics and Systems Science, Chinese Academy of Sciences; 2: Institute for AI Industry Research (AIR), Tsinghua University; 3: University of Chinese Academy of Sciences; 4: Beijing Academy of Artificial Intelligence |
| Pseudocode | Yes | C.2 Pseudocode for Algorithms and Complexity Analysis; Algorithm 1: Sliced Denoising Pre-training Algorithm |
| Open Source Code | Yes | The code is released publicly at https://github.com/fengshikun/SliDe. |
| Open Datasets | Yes | QM9 (Ramakrishnan et al., 2014) is a quantum chemistry dataset providing one equilibrium conformation and 12 labels of geometric, energetic, electronic, and thermodynamic properties for 134k stable small organic molecules made up of CHONF atoms. Our model is pre-trained on PCQM4Mv2 dataset (Nakata & Shimazaki, 2017), which contains 3.4 million organic molecules and provides one equilibrium conformation for each molecule. |
| Dataset Splits | Yes | The data splitting follows standard settings which have a training set with 110,000 samples, a validation set with 10,000 samples, and a test set with the remaining 10,831 samples. |
| Hardware Specification | Yes | Frad pre-training takes 1d 1h 14m on 8 NVIDIA A100 GPUs, and SliDe pre-training takes 1d 17h 1m on 8 Tesla V100 GPUs. |
| Software Dependencies | Yes | Our parameters are obtained from the parameter files of Open Force Field v.2.0.0 (Sage) (Boothroyd et al., 2023). |
| Experiment Setup | Yes | C.4 HYPERPARAMETER SETTINGS; Table 12: Hyperparameters for pre-training; Table 13: Hyperparameters for fine-tuning on MD17; Table 14: Hyperparameters for fine-tuning on QM9. |
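The dataset-splits row reports the standard QM9 benchmark split of 110,000 training, 10,000 validation, and 10,831 test samples (130,831 molecules in total, after the commonly removed uncharacterized structures). A minimal sketch of producing such a split is shown below; the function name, seed, and use of a simple random shuffle are illustrative assumptions, not the paper's actual splitting code.

```python
import random

def qm9_split(n_total=130_831, n_train=110_000, n_valid=10_000, seed=0):
    """Illustrative random index split matching the reported QM9 sizes:
    110,000 train / 10,000 validation / remaining 10,831 test.
    The paper's released code may assign indices differently."""
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)
    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test

train, valid, test = qm9_split()
print(len(train), len(valid), len(test))  # 110000 10000 10831
```

Fixing the seed makes the split reproducible across runs, which is the point of reporting split sizes in a reproducibility checklist.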