Exploring Molecular Pretraining Model at Scale
Authors: Xiaohong Ji, Zhen Wang, Zhifeng Gao, Hang Zheng, Linfeng Zhang, Guolin Ke, Weinan E
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we present Uni-Mol2, an innovative molecular pretraining model that leverages a two-track transformer to effectively integrate features at the atomic level, graph level, and geometry structure level. Along with this, we systematically investigate the scaling law within molecular pretraining models, characterizing the power-law correlations between validation loss and model size, dataset size, and computational resources. Consequently, we successfully scale Uni-Mol2 to 1.1 billion parameters through pretraining on 800 million conformations, making it the largest molecular pretraining model to date. Extensive experiments show consistent improvement in the downstream tasks as the model size grows. Uni-Mol2 with 1.1B parameters also outperforms existing methods, achieving an average improvement of 27% on the QM9 dataset and 14% on the COMPAS-1D dataset. (A minimal sketch of fitting such a power-law scaling relationship is given after the table.) |
| Researcher Affiliation | Collaboration | Xiaohong Ji (1), Zhen Wang (1), Zhifeng Gao (1), Hang Zheng (1), Linfeng Zhang (1,2), Guolin Ke (1), Weinan E (2,3,4). (1) DP Technology, Beijing 100080, China. (2) AI for Science Institute, Beijing 100080, China. (3) School of Mathematical Sciences, Peking University, Beijing 100871, China. (4) Center for Machine Learning Research, Peking University, Beijing 100084, China. |
| Pseudocode | No | The paper describes the architecture and pretraining tasks but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The code, model, and data are made publicly available upon acceptance. |
| Open Datasets | Yes | One part consists of approximately 19 million molecules sourced from Uni-Mol [11], while the other is derived from ZINC20 [31], which includes 1.4 billion compounds. We downloaded the subset with standard reactivity, which contains 884 million compounds, from the ZINC20 website. Table 1 shows the enrichment compared with the Uni-Mol dataset. |
| Dataset Splits | Yes | To prevent data leakage in evaluating pretraining performance, we randomly sampled 520k molecules from the Uni-Mol2 dataset as the validation set to evaluate the effectiveness and investigate the scaling relationship. |
| Hardware Specification | Yes | For models with parameters ranging from 42M to 310M, we employed 32 NVIDIA A100 GPU cards, while for models with 570M and 1.1B parameters, we utilized 64 NVIDIA A100 GPU cards. The computational cluster comprised 64 NVIDIA A100 GPUs, each equipped with 80GB of HBM2 memory. |
| Software Dependencies | No | The paper mentions software such as the AdamW optimizer, mixed-precision training, RDKit, Uni-Core, and PyTorch, but it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | We study the scalability of Uni-Mol2 at scales from 42M to 1.1B parameters; all the parameters for Uni-Mol2 at different scales are listed in Table 2. Uni-Mol2 is trained with the AdamW optimizer [37, 38] with the following hyper-parameters: β1 = 0.9, β2 = 0.99, and weight decay 1e-4. The gradient clip norm is set to 1.0 for training stability. The learning rate scheduler employed during pretraining is a polynomial decay scheduler. Specifically, all models reach the maximum learning rate of 1e-4 after 100,000 warm-up steps and then decay the learning rate of each parameter group using a polynomial function with power 1.0. All models are trained with mixed precision [39] for training efficiency. In line with previous methods, we employ grid search to find the optimal hyper-parameters for tasks within the QM9 and COMPAS-1D datasets. The specific hyper-parameters are detailed in Table 9. (A plain-PyTorch sketch of this optimizer and learning-rate schedule follows the table.) |
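
The abstract quoted above reports power-law correlations between validation loss and model size, dataset size, and compute. Below is a minimal sketch of how such a scaling relationship is commonly fit; the saturating form L(N) = a·N^(-b) + L_inf, the example (model size, loss) values, and the function names are illustrative assumptions, not results or code from the paper.

```python
# Hedged sketch: fit a power law relating validation loss to model size.
# The functional form and all numbers below are illustrative assumptions,
# not values reported in the Uni-Mol2 paper.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, l_inf):
    """Saturating power law: loss = a * n**(-b) + irreducible loss."""
    return a * np.power(n, -b) + l_inf

# Hypothetical (model parameters, validation loss) observations spanning the
# 42M-1.1B range discussed in the paper; the loss values are made up.
model_sizes = np.array([42e6, 84e6, 164e6, 310e6, 570e6, 1.1e9])
val_losses = np.array([0.52, 0.48, 0.45, 0.43, 0.415, 0.40])

params, _ = curve_fit(
    power_law, model_sizes, val_losses,
    p0=(10.0, 0.3, 0.3),
    bounds=([0.0, 0.0, 0.0], [np.inf, 2.0, 1.0]),
)
a, b, l_inf = params
print(f"fitted exponent b = {b:.3f}, irreducible loss ~ {l_inf:.3f}")

# Extrapolate the fitted curve to a larger, hypothetical model size.
print(f"predicted loss at 5B params: {power_law(5e9, *params):.3f}")
```

The paper fits analogous curves against dataset size and compute; the same recipe applies with those quantities on the x-axis.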
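
The Experiment Setup row quotes AdamW with β1 = 0.9, β2 = 0.99, weight decay 1e-4, gradient clipping at 1.0, a polynomial-decay schedule reaching a peak learning rate of 1e-4 after 100,000 warm-up steps, and mixed-precision training. The sketch below expresses that configuration in plain PyTorch; the paper actually trains with Uni-Core, so the stand-in `model`, the assumed `total_steps`, and the training-step wiring are illustrative assumptions rather than the authors' implementation.

```python
# Hedged sketch of the quoted pretraining hyper-parameters in plain PyTorch.
# The stand-in model, total_steps, and loop wiring are assumptions; the paper
# itself trains with Uni-Core rather than this hand-rolled setup.
import torch

model = torch.nn.Linear(512, 512)               # stand-in for Uni-Mol2
total_steps, warmup_steps = 1_000_000, 100_000  # total_steps is assumed
peak_lr = 1e-4

optimizer = torch.optim.AdamW(
    model.parameters(), lr=peak_lr, betas=(0.9, 0.99), weight_decay=1e-4)

def poly_decay_with_warmup(step, power=1.0):
    """Linear warm-up to peak_lr, then polynomial decay (power 1.0 = linear)."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    remaining = max(0, total_steps - step) / max(1, total_steps - warmup_steps)
    return remaining ** power

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, poly_decay_with_warmup)
scaler = torch.cuda.amp.GradScaler()            # mixed-precision loss scaling

def train_step(batch, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():             # mixed-precision forward pass
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                  # unscale before clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip norm = 1.0
    scaler.step(optimizer)
    scaler.update()
    scheduler.step()
    return loss.item()
```

With power 1.0, the polynomial decay reduces to a linear ramp from the peak learning rate down to zero over the remaining steps, matching the quoted schedule.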