GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation
Authors: Minkai Xu, Lantao Yu, Yang Song, Chence Shi, Stefano Ermon, Jian Tang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive experiments on multiple benchmarks, including conformation generation and property prediction tasks. Numerical results show that GEODIFF consistently outperforms existing state-of-the-art machine learning approaches, and by a large margin on the more challenging large molecules. |
| Researcher Affiliation | Academia | Minkai Xu (1,2), Lantao Yu (3), Yang Song (3), Chence Shi (1,2), Stefano Ermon (3), Jian Tang (1,4,5). Affiliations: (1) Mila Québec AI Institute, Canada; (2) Université de Montréal, Canada; (3) Stanford University, USA; (4) HEC Montréal, Canada; (5) CIFAR AI Research Chair |
| Pseudocode | Yes | Algorithm 1 Sampling Algorithm of GEODIFF. Input: the molecular graph G, the learned reverse model ϵθ. Output: the molecular conformation C. 1: Sample C^T ∼ p(C^T) = N(0, I) 2: for s = T, T−1, …, 1 do 3: Shift C^s to zero CoM 4: Compute µθ(C^s, G, s) from ϵθ(C^s, G, s) using equation 4 5: Sample C^(s−1) ∼ N(C^(s−1); µθ(C^s, G, s), σ_t^2 I) 6: end for 7: return C^0 as C |
| Open Source Code | Yes | Code is available at https://github.com/MinkaiXu/GeoDiff. |
| Open Datasets | Yes | Following prior works (Xu et al., 2021a;b), we also use the recent GEOM-QM9 (Ramakrishnan et al., 2014) and GEOM-Drugs (Axelrod & Gomez-Bombarelli, 2020) datasets. |
| Dataset Splits | Yes | For both datasets, the training split consists of 40,000 molecules with 5 conformations for each, resulting in 200,000 conformations in total. The valid split shares the same size as the training split. The test split contains 200 distinct molecules, with 22,408 conformations for QM9 and 14,324 for Drugs. |
| Hardware Specification | Yes | For the training of GEODIFF, we train the model on a single Tesla V100 GPU with a learning rate of 0.001 until convergence and Adam (Kingma & Welling, 2013) as the optimizer. |
| Software Dependencies | No | The paper mentions using MPNNs but does not provide specific version numbers for any software libraries, frameworks, or dependencies used in the experiments (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | The other hyper-parameters of GEODIFF are summarized in Tab. 4, including highest variance level βT, lowest variance level β1, the variance schedule, number of diffusion timesteps T, radius threshold for determining the neighbor of atoms τ, batch size, and number of training iterations. |
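
The sampling loop quoted in the Pseudocode row can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the authors' implementation. It assumes that equation 4 takes the standard DDPM posterior-mean form, that the sampling variance equals β_s, and that the variance schedule is linear. The names `center_of_mass_free`, `linear_betas`, `sample_conformation`, and `zero_noise_model` are hypothetical, and the noise model is a placeholder for the learned network ϵθ.

```python
import math
import random


def center_of_mass_free(coords):
    """Shift coordinates so their unweighted centroid sits at the origin."""
    n = len(coords)
    mean = [sum(atom[d] for atom in coords) / n for d in range(3)]
    return [[atom[d] - mean[d] for d in range(3)] for atom in coords]


def linear_betas(beta_1, beta_T, T):
    """Assumed linear schedule from beta_1 (lowest) to beta_T (highest)."""
    if T == 1:
        return [beta_T]
    return [beta_1 + (beta_T - beta_1) * i / (T - 1) for i in range(T)]


def sample_conformation(graph, eps_theta, num_atoms, betas, rng):
    """Run the reverse loop of Algorithm 1 and return C^0 as a list of [x, y, z]."""
    T = len(betas)
    # Cumulative products of (1 - beta), used by the assumed posterior mean.
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)

    # Step 1: draw C^T from a standard normal distribution.
    coords = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(num_atoms)]

    # Step 2: iterate s = T, T-1, ..., 1.
    for s in range(T, 0, -1):
        beta, alpha_bar = betas[s - 1], alpha_bars[s - 1]
        # Step 3: shift C^s to zero center of mass.
        coords = center_of_mass_free(coords)
        # Step 4: posterior mean from the predicted noise (assumed DDPM form).
        eps = eps_theta(coords, graph, s)
        coef = beta / math.sqrt(1.0 - alpha_bar)
        scale = 1.0 / math.sqrt(1.0 - beta)
        # Step 5: sample C^(s-1) around the mean, assuming variance beta_s.
        sigma = math.sqrt(beta)
        coords = [
            [scale * (c[d] - coef * e[d]) + sigma * rng.gauss(0.0, 1.0)
             for d in range(3)]
            for c, e in zip(coords, eps)
        ]
    # Step 7: the final coordinates are returned as the conformation C.
    return coords


def zero_noise_model(coords, graph, s):
    """Placeholder for the learned network: predicts zero noise everywhere."""
    return [[0.0, 0.0, 0.0] for _ in coords]


rng = random.Random(0)
betas = linear_betas(1e-4, 0.02, 50)  # illustrative values, not those of Tab. 4
conformation = sample_conformation(None, zero_noise_model, 5, betas, rng)
```

In practice the placeholder would be replaced by a trained network conditioned on the molecular graph, and the schedule values would come from the paper's hyper-parameter table.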