Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning
Authors: Yuxuan Ren, Dihan Zheng, Chang Liu, Peiran Jin, Yu Shi, Lin Huang, Jiyan He, Shengjie Luo, Tao Qin, Tie-Yan Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate the advantages of incorporating the proposed consistency losses into multi-task learning. Implementation details are provided in Appendix B. |
| Researcher Affiliation | Industry | Microsoft Research AI for Science |
| Pseudocode | Yes | Algorithm B.1 Implementation of Optimality Consistency Loss |
| Open Source Code | No | Releasing data and code requires an internal asset releasing review process in our organization. We have started the process, but cannot guarantee availability during the review period. |
| Open Datasets | Yes | We consider multi-task learning of energy and structure prediction on the PubChemQC B3LYP/6-31G*//PM6 dataset [29] (abbreviated as PM6) |
| Dataset Splits | Yes | Each of the evaluation datasets of PCQ and QM9 is split into three disjoint sets for training, validation, and test. |
| Hardware Specification | Yes | The multi-task model is trained on a server with 8 Nvidia V100 GPUs for approximately one week. The model with the consistency loss is trained on one server with 16 Nvidia V100 GPUs. |
| Software Dependencies | No | The paper mentions the 'Adam optimizer' and 'PyTorch' but does not specify version numbers for these software components or for any other key libraries. |
| Experiment Setup | Yes | Our pre-training procedure is executed in two discrete stages. Initially, the model is trained exclusively with the multi-task loss function, L_multi-task, for a total of 300,000 iterations. In the second stage, we integrate the proposed consistency loss and the force loss, L_force, into the training regimen, which then proceeds for an additional 200,000 iterations. All models are trained with the Adam optimizer [66] with batch size 256. The learning rate is set to 2 × 10⁻⁴ with a linear warm-up phase over the initial 10,000 steps, followed by a linear decay schedule thereafter. The weights of the energy loss and the diffusion denoising loss, i.e., the first and the second terms in Eq. (7), are set to 1.0 and 0.01, respectively. The weights of the optimality consistency loss Eq. (10) and the score consistency loss Eq. (11) are set to 0.1 and 1.0, respectively. |
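The experiment setup quoted in the last row translates into a fairly compact training configuration. The sketch below is not the authors' code; it is a minimal PyTorch illustration, using hypothetical placeholder names (`combined_loss`, `make_optimizer_and_scheduler`) and assuming scalar loss tensors, of how the reported loss weights, Adam settings, and the 10,000-step linear warm-up followed by linear decay could be wired together across the two stages.

```python
import torch

def combined_loss(losses, stage):
    """Weighted sum of the reported loss terms (hypothetical helper).

    losses: dict of scalar tensors with keys 'energy', 'denoise',
            'optimality', 'score'. Stage 1 uses only the multi-task terms
            (Eq. 7 weights 1.0 and 0.01); stage 2 additionally applies the
            consistency terms (Eq. 10 weight 0.1, Eq. 11 weight 1.0).
    """
    total = 1.0 * losses["energy"] + 0.01 * losses["denoise"]
    if stage == 2:
        total = total + 0.1 * losses["optimality"] + 1.0 * losses["score"]
    return total

def make_optimizer_and_scheduler(model, total_steps, warmup_steps=10_000, peak_lr=2e-4):
    """Adam with a linear warm-up over the first 10,000 steps, then a
    linear decay toward zero over the remaining steps (assumed endpoint)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=peak_lr)

    def lr_lambda(step):
        # Linear warm-up, then linear decay of the base learning rate.
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```

Under this reading, stage 1 would run 300,000 steps with `stage=1` and stage 2 a further 200,000 steps with `stage=2` at batch size 256; the excerpt does not say whether the decay endpoint is the end of stage 1 or of the full 500,000 steps, so `total_steps` is left as a parameter.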
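Algorithm B.1, named in the Pseudocode row, is not reproduced in the excerpt. Purely as an illustration of the idea, the sketch below assumes (our reading, not the paper's algorithm) that an optimality consistency penalty pushes the gradient of the predicted energy toward zero at the predicted equilibrium structure, so the structure output is consistent with a stationary point of the learned energy surface; `energy_fn` and `eq_coords` are hypothetical names.

```python
import torch

def optimality_consistency_loss(energy_fn, eq_coords):
    """Hypothetical sketch (not the paper's Algorithm B.1): penalize the
    squared norm of the energy gradient at a predicted equilibrium
    geometry.

    energy_fn:  callable mapping coordinates of shape (N, 3) to a scalar energy.
    eq_coords:  predicted equilibrium coordinates, shape (N, 3); must be part
                of the autograd graph (e.g. a network output) or have
                requires_grad=True.
    """
    energy = energy_fn(eq_coords)
    # Differentiate the predicted energy w.r.t. the predicted coordinates;
    # create_graph=True keeps this step differentiable for training.
    (grad,) = torch.autograd.grad(energy, eq_coords, create_graph=True)
    # At a true equilibrium the force (negative gradient) vanishes.
    return grad.pow(2).sum(dim=-1).mean()
```

The paper's actual formulation is Eq. (10), with implementation details in Appendix B.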