Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning

Authors: Yuxuan Ren, Dihan Zheng, Chang Liu, Peiran Jin, Yu Shi, Lin Huang, Jiyan He, Shengjie Luo, Tao Qin, Tie-Yan Liu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we demonstrate the advantages of incorporating the proposed consistency losses into multi-task learning. Implementation details are provided in Appendix B.
Researcher Affiliation | Industry | Microsoft Research AI for Science
Pseudocode | Yes | Algorithm B.1: Implementation of Optimality Consistency Loss (an illustrative sketch of such a loss appears after this table)
Open Source Code | No | Releasing data and code requires an internal asset releasing review process in our organization. We have started the process, but cannot guarantee availability during the review period.
Open Datasets | Yes | We consider multi-task learning of energy and structure prediction on the PubChemQC B3LYP/6-31G*//PM6 dataset [29] (abbreviated as PM6).
Dataset Splits | Yes | Each of the evaluation datasets of PCQ and QM9 is split into three disjoint sets for training, validation, and test.
Hardware Specification | Yes | The multi-task model is trained on a server with 8 Nvidia V100 GPUs for approximately one week. The model with the consistency loss is trained on one server with 16 Nvidia V100 GPUs.
Software Dependencies | No | The paper mentions the Adam optimizer and PyTorch but does not specify version numbers for these or any other key libraries.
Experiment Setup | Yes | The pre-training procedure runs in two stages. First, the model is trained exclusively with the multi-task loss L_multi-task for 300,000 iterations. In the second stage, the proposed consistency loss and the force loss L_force are added, and training proceeds for an additional 200,000 iterations. All models are trained with the Adam optimizer [66] with batch size 256. The learning rate is set to 2×10⁻⁴ with a linear warm-up over the initial 10,000 steps, followed by linear decay thereafter. The weights of the energy loss and the diffusion denoising loss (the first and second terms in Eq. (7)) are set to 1.0 and 0.01, respectively; the weights of the optimality consistency loss (Eq. (10)) and the score consistency loss (Eq. (11)) are set to 0.1 and 1.0, respectively. (A training-schedule sketch based on these values appears after this table.)