Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

GeoAda: Efficiently Finetune Geometric Diffusion Models with Equivariant Adapters

Authors: Wanjia Zhao, Jiaqi Han, Siyi Gu, Mingjian Jiang, James Y Zou, Stefano Ermon

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate GeoAda across three categories of additional fine-tuning controls: (1) Frame control during dynamic prediction (§5.1), (2) Global type control in human motion prediction (§5.2), and (3) Subgraph control in molecule generation (§5.3). We also performed ablation studies on core design choices and present some observations in §5.4.
Researcher Affiliation	Academia	Wanjia Zhao, Jiaqi Han, Siyi Gu, Mingjian Jiang, James Zou, Stefano Ermon. Department of Computer Science, Stanford University
Pseudocode	No	The paper describes the methodology using textual descriptions and mathematical formulations (e.g., equations 3, 4, 5, 6, 7) but does not provide structured pseudocode or algorithm blocks.
Open Source Code	Yes	Equal contribution. Correspondence to EMAIL. Code is available here.
Open Datasets	Yes	We adopt the CHARGED PARTICLES dataset [17, 28] for particle dynamics simulation. We employ the MD17 [3] dataset, which contains the DFT-simulated molecular dynamics trajectories of 8 small molecules. The CMU Mocap dataset is a commonly used dataset for human pose prediction. We adopt the QM9 [26] and GEOM-Drugs [1] dataset for pretraining a model for molecule generation, use the CrossDocked2020 dataset [7] for finetuning protein-ligand pair generation.
Dataset Splits	Yes	We use 3000 trajectories for training, 2000 for validation, and 2000 for testing. For each molecule, 5000 trajectories are used for training and 1000/1000 for validation and testing. Table 9 and Table 10 provide detailed statistics for pretrain and downstream datasets on Global Type Control, including train, val, and test splits. Following the common setup for CrossDocked2020 [10], we obtain 100k complexes for training and 100 novel complexes for testing.
Hardware Specification	Yes	Appendix 8.1 ("Compute Resources") specifies the use of "4 Nvidia A6000 GPUs," training times for different datasets ("NBody and ETH-UCY take around 12 hours while each MD17 training phase takes about a day"), and that "CPUs were standard intel CPUs."
Software Dependencies	No	The paper mentions using "Adam optimizer" and a "linear noise schedule," and refers to "EGTN" as the backbone model. However, it does not provide specific version numbers for these or other key software components like programming languages (e.g., Python) or deep learning frameworks (e.g., PyTorch, TensorFlow).
Experiment Setup	Yes	We provide the detailed hyper-parameters of GeoAda in Table 8. We adopt Adam optimizer with betas (0.9, 0.999) and ε = 10^-8. For all experiments, we use the linear noise schedule per [13] with β_start = 0.02 and β_end = 0.0001.
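
The optimizer and noise-schedule settings quoted in the Experiment Setup row can be written down as a small configuration sketch. This is only an illustration, not the authors' code: the names `ADAM_CONFIG` and `linear_noise_schedule` are hypothetical, the number of diffusion steps (1000) is an assumption that the quoted text does not state, and the epsilon value is read as 10^-8. The schedule endpoints are reproduced exactly as quoted, in the order the row gives them.

```python
# Hypothetical configuration sketch based on the quoted Experiment Setup row.
# Adam settings: betas as quoted; eps read as 1e-8 (an interpretation of the quoted value).
ADAM_CONFIG = {"betas": (0.9, 0.999), "eps": 1e-8}


def linear_noise_schedule(beta_start, beta_end, num_steps):
    """Return num_steps values linearly interpolated from beta_start to beta_end."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


# Endpoints as quoted in the row; num_steps=1000 is an assumed value, not from the source.
betas = linear_noise_schedule(0.02, 0.0001, num_steps=1000)
```

The schedule is kept framework-free because the Software Dependencies row notes that the paper does not name a deep learning framework or version.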