Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Equivariant Blurring Diffusion for Hierarchical Molecular Conformer Generation
Authors: Jiwoong Park, Yang Shen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate our hierarchical molecular conformer generation framework via Equivariant Blurring Diffusion (EBD) on molecular conformer generation task. We conducted experiments to answer the following questions: i) Ablation studies (Sec. 5.2): What are the effects of granularity of the fragment vocabulary, loss reparameterization, and data corruption processes of diffusion models? ii) Geometric evaluation (Sec. 5.3): Can EBD generate more diverse and accurate molecular conformers in Euclidean space than previous deep generative approaches? iii) Property prediction (Sec. 5.4): Can EBD generate low-energy, stable conformers? |
| Researcher Affiliation | Academia | Jiwoong Park, Yang Shen Department of Electrical and Computer Engineering Texas A&M University EMAIL, EMAIL |
| Pseudocode | Yes | In this subsection, we provide the Pytorch-style [37] pseudo-codes. The RDKit conformer generator to obtain the approximate fragment structure, linear interpolation blurring schedule, training process, and sampling process were given in Pseudo-codes 1, 2, 3, and 4, respectively. |
| Open Source Code | Yes | Codes are released at https://github.com/Shen-Lab/EBD. |
| Open Datasets | Yes | We use GEOM-QM9 (QM9) [39] and GEOM-Drugs (Drugs) [1] which are small molecules and drug-like molecules, respectively. We obtained the raw data, the pre-processed data and the data split at https://github.com/DeepGraphLearning/ConfGF. |
| Dataset Splits | Yes | Each dataset comprises 40,000 molecules for the training set and 5,000 molecules for the validation set, with each molecule containing 5 conformers following data split of [46]. |
| Hardware Specification | Yes | We used a single NVIDIA A100 GPU for every training and generation tasks. |
| Software Dependencies | No | The paper mentions using PyTorch and RDKit but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | For EBD, we use the T = 50, a noise scale of 0.01 for the forward process (σ in Eq. 6) and 0.0125 for the reverse process (δ in Eq. 8) in every experiments. For training, we used a learning rate 10^-4 with the AdamW optimizer [33]. Table 7: Hyperparameters of EBD (columns: Dataset, T, # l, # d, # of hops, cutoff, batch size, training iter.). Drugs: 50, 6, 128, 3, 10 Å, 32, 650k. QM9: 50, 6, 128, 3, 10 Å, 64, 650k. |
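
The reported setup and split sizes in the table above can be collected into a small configuration sketch. This is only an illustration and is not taken from the released EBD code: the dictionary layout and the helper names `setup_for` and `split_sizes` are hypothetical, the keys `# l` and `# d` are copied verbatim from Table 7 because the table does not expand them, and the learning rate assumes that the extracted "10 4" stands for 10^-4.

```python
# Values shared by both datasets, as quoted in the Experiment Setup row.
SHARED = {
    "T": 50,
    "forward_noise_scale": 0.01,    # sigma in Eq. 6
    "reverse_noise_scale": 0.0125,  # delta in Eq. 8
    "learning_rate": 1e-4,          # assumption: "10 4" lost its exponent sign
    "optimizer": "AdamW",
    "# l": 6,                       # column label kept verbatim from Table 7
    "# d": 128,                     # column label kept verbatim from Table 7
    "# of hops": 3,
    "cutoff_angstrom": 10.0,
    "training_iters": 650_000,      # "650k" in Table 7
}

# Only the batch size differs between the two datasets in Table 7.
PER_DATASET = {
    "Drugs": {"batch_size": 32},
    "QM9": {"batch_size": 64},
}

# Dataset Splits row: molecules per split and conformers per molecule.
TRAIN_MOLECULES = 40_000
VAL_MOLECULES = 5_000
CONFORMERS_PER_MOLECULE = 5


def setup_for(dataset: str) -> dict:
    """Merge the shared values with the dataset-specific ones."""
    return {"dataset": dataset, **SHARED, **PER_DATASET[dataset]}


def split_sizes() -> dict:
    """Conformer counts implied by the reported split, per dataset."""
    return {
        "train_conformers": TRAIN_MOLECULES * CONFORMERS_PER_MOLECULE,
        "val_conformers": VAL_MOLECULES * CONFORMERS_PER_MOLECULE,
    }


if __name__ == "__main__":
    print(setup_for("QM9"))
    print(split_sizes())
```

Under the reported split, each dataset would contribute 200,000 training conformers and 25,000 validation conformers.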