GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation

Authors: Minkai Xu, Lantao Yu, Yang Song, Chence Shi, Stefano Ermon, Jian Tang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive experiments on multiple benchmarks, including conformation generation and property prediction tasks. Numerical results show that GEODIFF consistently outperforms existing state-of-the-art machine learning approaches, and by a large margin on the more challenging large molecules.
Researcher Affiliation | Academia | Minkai Xu (1,2), Lantao Yu (3), Yang Song (3), Chence Shi (1,2), Stefano Ermon (3), Jian Tang (1,4,5). Affiliations: (1) Mila Québec AI Institute, Canada; (2) Université de Montréal, Canada; (3) Stanford University, USA; (4) HEC Montréal, Canada; (5) CIFAR AI Research Chair
Pseudocode | Yes | Algorithm 1 Sampling Algorithm of GEODIFF. Input: the molecular graph G, the learned reverse model ϵθ. Output: the molecular conformation C. 1: Sample C^T ∼ p(C^T) = N(0, I) 2: for s = T, T−1, …, 1 do 3: Shift C^s to zero CoM 4: Compute µθ(C^s, G, s) from ϵθ(C^s, G, s) using equation 4 5: Sample C^{s−1} ∼ N(C^{s−1}; µθ(C^s, G, s), σ_t^2 I) 6: end for 7: return C^0 as C
Open Source Code | Yes | Code is available at https://github.com/MinkaiXu/GeoDiff.
Open Datasets | Yes | Following prior works (Xu et al., 2021a;b), we also use the recent GEOM-QM9 (Ramakrishnan et al., 2014) and GEOM-Drugs (Axelrod & Gomez-Bombarelli, 2020) datasets.
Dataset Splits | Yes | For both datasets, the training split consists of 40,000 molecules with 5 conformations for each, resulting in 200,000 conformations in total. The validation split is the same size as the training split. The test split contains 200 distinct molecules, with 22,408 conformations for QM9 and 14,324 for Drugs.
Hardware Specification | Yes | For the training of GEODIFF, we train the model on a single Tesla V100 GPU with a learning rate of 0.001 until convergence and Adam (Kingma & Welling, 2013) as the optimizer.
Software Dependencies | No | The paper mentions using MPNNs but does not provide specific version numbers for any software libraries, frameworks, or dependencies used in the experiments (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | The other hyper-parameters of GEODIFF are summarized in Tab. 4, including highest variance level β_T, lowest variance level β_1, the variance schedule, number of diffusion timesteps T, radius threshold for determining the neighbor of atoms τ, batch size, and number of training iterations.
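
The sampling loop quoted in the Pseudocode row can be illustrated with a short, dependency-free sketch. This is not the authors' implementation. It assumes that equation 4 has the standard DDPM form µθ = (C^s − β_s / sqrt(1 − ᾱ_s) · ϵθ) / sqrt(α_s), that the sampling variance is σ_s^2 = β_s, and that the variance levels lie on a linear grid between β_1 and β_T. The names `linear_betas`, `center`, `sample_conformation`, and `toy_eps_model`, as well as the numeric values in the demo, are hypothetical and are not taken from Tab. 4.

```python
import math
import random


def linear_betas(beta_1, beta_T, T):
    """Variance levels beta_1..beta_T on a linear grid (assumed schedule)."""
    if T == 1:
        return [beta_T]
    step = (beta_T - beta_1) / (T - 1)
    return [beta_1 + i * step for i in range(T)]


def center(conf):
    """Shift a conformation (list of [x, y, z] rows) to zero center of mass."""
    n = len(conf)
    mean = [sum(atom[k] for atom in conf) / n for k in range(3)]
    return [[atom[k] - mean[k] for k in range(3)] for atom in conf]


def sample_conformation(graph, eps_model, n_atoms, betas, rng):
    """Run the reverse loop of Algorithm 1 and return C^0."""
    T = len(betas)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)

    # Step 1: draw C^T from a standard normal distribution.
    conf = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n_atoms)]

    # Step 2: iterate s = T, T-1, ..., 1.
    for s in range(T, 0, -1):
        beta, alpha, alpha_bar = betas[s - 1], alphas[s - 1], alpha_bars[s - 1]
        conf = center(conf)              # Step 3: zero center of mass.
        eps = eps_model(conf, graph, s)  # Predicted noise, same shape as conf.
        coef = beta / math.sqrt(1.0 - alpha_bar)
        sigma = math.sqrt(beta)          # Assumption: sigma_s^2 = beta_s.
        # Steps 4-5: posterior mean (assumed DDPM form) plus Gaussian noise.
        conf = [
            [(x - coef * e) / math.sqrt(alpha) + sigma * rng.gauss(0.0, 1.0)
             for x, e in zip(atom, eps_atom)]
            for atom, eps_atom in zip(conf, eps)
        ]
    return conf  # Step 7: C^0.


def toy_eps_model(conf, graph, s):
    """Hypothetical stand-in for the learned network; predicts zero noise."""
    return [[0.0, 0.0, 0.0] for _ in conf]


# Illustrative values only; the paper's settings are listed in its Tab. 4.
demo = sample_conformation(
    graph=None,
    eps_model=toy_eps_model,
    n_atoms=4,
    betas=linear_betas(1e-4, 2e-2, 50),
    rng=random.Random(0),
)
```

In practice `eps_model` would be the trained equivariant network conditioned on the molecular graph G; the zero-noise stand-in here only exercises the control flow of the loop.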