SubgDiff: A Subgraph Diffusion Model to Improve Molecular Representation Learning

Authors: Jiying Zhang, Zijing Liu, Yu Wang, Bin Feng, Yu Li

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on extensive downstream tasks, especially the molecular force predictions, demonstrate the superior performance of our approach. We conduct experiments to address the following two questions: 1) Can substructures improve the representation ability of the denoising network when using diffusion as self-supervised learning? 2) How does the proposed subgraph diffusion affect the generative ability of the diffusion models?
Researcher Affiliation | Industry | Jiying Zhang, Zijing Liu, Yu Wang, Bin Feng, Yu Li; International Digital Economy Academy (IDEA); {zhangjiying,liuzijing,fengbin,liyu}@idea.edu.cn
Pseudocode | Yes | Algorithm 1 (Training SubgDiff). Input: a molecular graph G^{3D}, k for the same-mask diffusion, m := ⌊(t - 1)/k⌋. Sample t ∼ U(1, ..., T), ε ∼ N(0, I); sample s_{km+1} ∼ p_{s_{km+1}}(s | G); sample a subgraph R_t ∼ q(R_t | R_0) (Equation 17). L1 = BCE(s_{km+1}, s_ϑ(G, R_t, t)) (subgraph prediction loss); L2 = ‖diag(s_{km+1})(ε - ε_θ(G, R_t, t))‖² (denoising loss); optimizer.step(E_{t, R_0, s_t, ε}[λ L1 + L2]) (optimize parameters θ, ϑ). Algorithm 2 (Sampling from SubgDiff). k is the same as in training, for k-step same-subgraph diffusion. Sample R_T ∼ N(0, I) (random noise initialization). For t = T to 1: z ∼ N(0, I) if t > 1, else z = 0 (random noise); if t mod k == 0 or t == T: ŝ ← s_ϑ(G, R_t, t) (subgraph prediction); ε̂ ← ε_θ(G, R_t, t); sample R_{t-1} from the posterior (Equation 19). End for; return R_0.
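To make Algorithm 1 concrete, here is a minimal PyTorch-style sketch of a single training step. It is not the authors' implementation: it assumes a standard DDPM-style forward process restricted to the masked nodes, and `mask_sampler`, `mask_net`, and `denoise_net` are hypothetical stand-ins for the mask distribution p(s | G) and the networks s_ϑ and ε_θ; the exact forward kernel of Equation 17 is simplified.

```python
import torch
import torch.nn.functional as F

# Minimal sketch (not the authors' code) of one SubgDiff training step.
def subgdiff_training_step(pos0, mask_sampler, mask_net, denoise_net,
                           alphas_cumprod, T, lam=1.0):
    """pos0: (N, 3) clean atom coordinates R_0 of one molecular graph."""
    t = torch.randint(1, T + 1, (1,)).item()            # t ~ U(1, ..., T)
    eps = torch.randn_like(pos0)                         # eps ~ N(0, I)
    s = mask_sampler(t)                                  # (N,) binary subgraph mask s_{km+1}

    # Forward diffusion restricted to the sampled subgraph (simplified stand-in
    # for Equation 17): only masked nodes receive noise, the rest stay clean.
    a_bar = alphas_cumprod[t - 1]
    noisy = a_bar.sqrt() * pos0 + (1.0 - a_bar).sqrt() * eps
    pos_t = torch.where(s.bool().unsqueeze(-1), noisy, pos0)    # R_t

    s_logits = mask_net(pos_t, t)                        # predict which nodes were diffused
    eps_pred = denoise_net(pos_t, t)                     # predict the injected noise

    loss_mask = F.binary_cross_entropy_with_logits(s_logits, s.float())   # L1
    loss_denoise = ((s.unsqueeze(-1) * (eps - eps_pred)) ** 2).mean()     # L2, masked
    return lam * loss_mask + loss_denoise
```

Sampling (Algorithm 2) would run the reverse of this process, re-predicting the subgraph mask only every k steps and applying the posterior update of Equation 19 to the masked nodes.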
Open Source Code | Yes | https://github.com/IDEA-XL/SubgDiff
Open Datasets | Yes | For pretraining, we follow [23] and use the PCQM4Mv2 dataset [12]. It is a sub-dataset of PubChemQC [29] with 3.4 million molecules with 3D geometric conformations. We use various molecular property prediction datasets as downstream tasks. For tasks with 3D conformations, we consider the MD17 dataset and follow the literature [35, 36, 24]... For downstream tasks with only 2D molecule graphs, we use eight molecular property prediction tasks from MoleculeNet [48].
Dataset Splits | Yes | For tasks with 3D conformations, we consider the MD17 dataset and follow the literature [35, 36, 24] of using 1K for training and 1K for validation, while the test set (from 48K to 991K) is much larger. By adopting the pretraining setting in Appendix A.4.2, we also take the QM9 dataset for finetuning and follow the literature [35, 36, 23], using 110K for training, 10K for validation and 11K for testing.
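For illustration only, a minimal sketch of how the quoted QM9 split sizes (110K train, 10K validation, roughly 11K test) could be produced; the random permutation and seed are assumptions, not the paper's exact split files.

```python
import torch

# Illustrative only: reproduces the 110K / 10K / ~11K split sizes quoted above.
# The random permutation and seed are assumptions, not the paper's split files.
def split_qm9_indices(num_molecules, seed=42):
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_molecules, generator=g)
    train_idx = perm[:110_000]
    valid_idx = perm[110_000:120_000]
    test_idx = perm[120_000:]          # the remaining ~11K molecules form the test set
    return train_idx, valid_idx, test_idx
```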
Hardware Specification | Yes | Pre-training takes around 24 hours with a single Nvidia A6000 GPU with 48 GB of RAM.
Software Dependencies | No | The paper mentions software components and frameworks such as GIN, SchNet, PyTorch, and CUDA, but it does not specify their exact version numbers in the main text or the detailed experimental setup sections. The mention of versions in the NeurIPS checklist justification is external to the paper's main content.
Experiment Setup | Yes | A.3 Hyperparameters. All models are trained with SGD using the ADAM optimizer. Pre-training takes around 24 hours with a single Nvidia A6000 GPU with 48 GB of RAM. The hyperparameters can be seen in Table 6 and Table 7.
Table 6: Additional hyperparameters of our SubgDiff.
Task | β1 | βT | β scheduler | T | k (k-same mask) | τ | Batch Size | Train Iter.
QM9 | 1e-7 | 2e-3 | sigmoid | 5000 | 250 | 10Å | 64 | 2M
Drugs | 1e-7 | 2e-3 | sigmoid | 5000 | 250 | 10Å | 32 | 6M
Table 7: Additional hyperparameters of our SubgDiff with different timesteps.
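As a reading aid for Table 6, below is a minimal sketch of a sigmoid β schedule running from β1 = 1e-7 to βT = 2e-3 over T = 5000 steps. The exact sigmoid parameterisation (the input range in particular) is an assumption, since the table only names the schedule type.

```python
import torch

# Assumed sigmoid noise schedule matching the Table 6 values; the [-6, 6] input
# range is a common convention, not taken from the paper.
def sigmoid_beta_schedule(beta_1=1e-7, beta_T=2e-3, T=5000):
    x = torch.linspace(-6.0, 6.0, T)
    return torch.sigmoid(x) * (beta_T - beta_1) + beta_1

betas = sigmoid_beta_schedule()
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)   # cumulative alpha_bar_t for the forward process
```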