Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Diffusion Generative Modeling on Lie Group Representations
Authors: Marco Bertolini, Tuan Anh Le, Djork-Arné Clevert
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach through experiments on diverse data types, demonstrating its effectiveness in real-world applications such as SO(3)-guided molecular conformer generation and modeling ligand-specific global SE(3) transformations for molecular docking, showing improvement in comparison to Riemannian diffusion on the group itself. |
| Researcher Affiliation | Industry | Marco Bertolini, Tuan Le & Djork-Arné Clevert, Machine Learning Research, Pfizer Worldwide Research and Development, Friedrichstraße 110, 10117 Berlin, Germany EMAIL |
| Pseudocode | Yes | We describe training and sampling procedures in Algorithms 1 and 2 in Appendix E. |
| Open Source Code | Yes | Code Availability: Our source code will be made available on https://github.com/pfizer-opensource/symmetry-inducedscore-matching. |
| Open Datasets | Yes | QM9 dataset (Ramakrishnan et al., 2014). We only keep the lowest energy conformer as provided in the original dataset |
| Dataset Splits | Yes | The trained classifier achieves greater than 99% accuracy on the MNIST test set, providing a reliable metric for evaluating reconstruction quality. ... The model is trained using Adam optimizer (lr=0.001), cross-entropy loss, batch size 64, for 10 epochs on the standard MNIST training set (60,000 samples). |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, processor types, memory amounts, or cloud instance types) are provided in the paper. |
| Software Dependencies | No | The paper mentions 'RDKit' and 'Python' but does not specify their version numbers or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We trained the model with T = 100 time-steps, but for sampling it suffices to set T = 10. ... We use L = 5 message passing layers with sdim = 128, vdim = 64 scalar and vector features, respectively. ... We use the cosine scheduler proposed by Dhariwal & Nichol (2021) and T = 100 diffusion timesteps. |
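
The Experiment Setup row mentions a cosine scheduler with T = 100 diffusion timesteps. As a rough illustration only, the sketch below computes a cosine noise schedule in its commonly used form. The offset `s = 0.008`, the clipping value `max_beta = 0.999`, and the function names are assumptions for illustration; the excerpt above does not state which variant or constants the authors used.

```python
import math
from typing import List


def cosine_alpha_bar(t: float, s: float = 0.008) -> float:
    """Cumulative signal level at normalized time t in [0, 1].

    The small offset s (an assumed default) keeps the first betas
    from becoming vanishingly small.
    """
    return math.cos((t + s) / (1.0 + s) * math.pi / 2.0) ** 2


def cosine_beta_schedule(
    num_steps: int = 100, s: float = 0.008, max_beta: float = 0.999
) -> List[float]:
    """Per-step noise variances derived from the cosine alpha_bar curve."""
    betas = []
    for i in range(num_steps):
        t_prev = i / num_steps
        t_next = (i + 1) / num_steps
        beta = 1.0 - cosine_alpha_bar(t_next, s) / cosine_alpha_bar(t_prev, s)
        # Clip to avoid a singular final step (assumed clipping value).
        betas.append(min(beta, max_beta))
    return betas


# T = 100 matches the number of training timesteps quoted in the table.
betas = cosine_beta_schedule(num_steps=100)

# Running product of (1 - beta) gives the cumulative signal level per step.
alpha_bars = []
running = 1.0
for b in betas:
    running *= 1.0 - b
    alpha_bars.append(running)
```

Here `betas` holds one variance per timestep and `alpha_bars` the corresponding cumulative signal level, which decays from close to 1 toward 0 over the 100 steps.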