Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

Multiscale guidance of protein structure prediction with heterogeneous cryo-EM data

Authors: Rishwanth Raghu, Axel Levy, Gordon Wetzstein, Ellen D. Zhong

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate results on both synthetic and real cryo-EM maps of dynamic biomolecular complexes. We evaluate our method on six biomolecular systems. For two of them, we guide CryoBoltz with synthetic density maps (STP10 [6] and CH67 antibody [70]) and use real, experimental maps for the other four systems (P-glycoprotein [14], Pma1, CYP102A1, and YbbAP [56]).
Researcher Affiliation | Academia | Rishwanth Raghu Princeton University EMAIL Axel Levy Stanford University EMAIL Gordon Wetzstein Stanford University EMAIL Ellen D. Zhong Princeton University EMAIL
Pseudocode | No | No specific pseudocode block or algorithm steps formatted like code are present in the paper. The methodology is described in prose in Section 3 and visualized conceptually in Figure 2.
Open Source Code | No | The code will be released upon publication.
Open Datasets | Yes | We evaluate our method on six biomolecular systems. For two of them, we guide CryoBoltz with synthetic density maps (STP10 [6] and CH67 antibody [70]) and use real, experimental maps for the other four systems (P-glycoprotein [14], Pma1, CYP102A1, and YbbAP [56]). For STP10, from the deposited atomic models of these structures (PDB:7AAQ, 7AAR) [4, 5]... For CH67 antibody, PDB:4HKX [71]... For P-glycoprotein, EMD-40226, 40259, 40258, 40227... PDB:8GMG, 8SA1, 8SA0, 8GMJ [15, 18, 17, 16]... Pma1 (EMD-64135, 64136), PDB:9UGB, 9UGC [89, 90]... CYP102A1 (EMD-27534, 27536), PDB:8DME, 8DMG [78, 79]... YbbAP (EMD-51292, 51291), PDB:9GE7, 9GE6 [58, 57].
Dataset Splits | No | The paper uses various cryo-EM maps and PDB structures for evaluation, but it does not specify traditional training, validation, and test splits for these datasets within the context of its own methodology development, as it leverages a pretrained diffusion model. It describes its evaluation strategy as 'We sample 25 structures for each of three model replicates.', but this refers to the generated output, not to data partitioning.
Hardware Specification | Yes | All experiments were performed on a single Nvidia A100 GPU with 80 GB VRAM.
Software Dependencies | No | The paper mentions software such as ChimeraX [59] and the GeomLoss library [31], but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | In Appendix A.5 'Guidance schedule', the paper specifies 'For the synthetic datasets, the numbers of steps in the guidance phases are Tw = 125, Tg = 25, Tl = 25, and Tr = 25. For the experimental datasets, the numbers of steps are Tw = 100, Tg = 50, Tl = 25, and Tr = 25. During the global guidance phase (for all datasets), the guidance strength is annealed along a cosine schedule from 0.25 to 0.05... During the local guidance phase, the guidance strength is made constant at λ(t) = 0.5.' Appendix A.6 'Experimental Details' further specifies parameters for each dataset, such as 'step_scale parameter... set at 3.0' for CH67 antibody, and specific thresholding and padding values for P-glycoprotein, Pma1, CYP102A1, and YbbAP.
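
Illustrative sketch (not from the paper; the authors' code had not been released at the time of this report): the guidance schedule quoted under Experiment Setup can be written out as a small function to make the step counts and strengths concrete. The step counts and the values 0.25, 0.05, and 0.5 come from the quoted text. Everything else is an assumption made for illustration: the function name `guidance_strength`, the ordering of the four phases as Tw, Tg, Tl, Tr, the half-cosine form of the annealing, and a strength of 0 during the Tw and Tr phases, which the quoted text does not specify.

```python
import math

# Step counts quoted from Appendix A.5 of the paper.
SCHEDULES = {
    "synthetic": {"Tw": 125, "Tg": 25, "Tl": 25, "Tr": 25},
    "experimental": {"Tw": 100, "Tg": 50, "Tl": 25, "Tr": 25},
}


def guidance_strength(step, Tw, Tg, Tl, Tr,
                      lam_start=0.25, lam_end=0.05, lam_local=0.5):
    """Return a guidance strength for a 0-indexed sampling step.

    Hypothetical helper: phase order (Tw, Tg, Tl, Tr) and the exact
    cosine formula are assumptions, not taken from the paper.
    """
    total = Tw + Tg + Tl + Tr
    if not 0 <= step < total:
        raise ValueError(f"step must be in [0, {total})")
    if step < Tw:
        # ASSUMPTION: no guidance during the first (Tw) phase.
        return 0.0
    if step < Tw + Tg:
        # Global phase: half-cosine anneal from lam_start to lam_end.
        frac = (step - Tw) / max(Tg - 1, 1)
        return lam_end + 0.5 * (lam_start - lam_end) * (1 + math.cos(math.pi * frac))
    if step < Tw + Tg + Tl:
        # Local phase: constant strength, as quoted.
        return lam_local
    # ASSUMPTION: no guidance during the final (Tr) phase.
    return 0.0


if __name__ == "__main__":
    cfg = SCHEDULES["synthetic"]
    for s in (0, 125, 137, 149, 150, 175):
        print(s, round(guidance_strength(s, **cfg), 4))
```

Under these assumptions both configurations sum to 200 sampling steps, and the experimental setting simply shifts 25 steps from the first phase into the global guidance phase.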