Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Reconstructing Heterogeneous Biomolecules via Hierarchical Gaussian Mixtures and Part Discovery

Authors: Shayan Shekarforoush, David B. Lindell, Marcus A Brubaker, David J Fleet

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We quantitatively compare CryoSPIRE with the state-of-the-art methods, namely, RECOVAR [10], CryoDRGN [47], DRGN-AI [19], 3DFlex [33] and 3DVA [32] using the CryoBench benchmark [15]. We also provide qualitative results on experimental datasets. ... Quantitative comparison on the three relevant CryoBench [15] datasets are provided in Table 2 and Fig. 3. ... Here, we ablate key design decisions in our framework.
Researcher Affiliation	Academia	(1) University of Toronto, (2) Vector Institute, (3) York University
Pseudocode	No	The paper describes the methods and processes in paragraph form and through figures, but does not present any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	No	Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The datasets are publicly available and we promise to release the code in future.
Open Datasets	Yes	CryoBench. The sole benchmark for cryo-EM heterogeneity is CryoBench [15], a set of synthetic datasets with ground-truth labels and a protocol for quantitative evaluation. Two datasets, IgG-1D and IgG-RL, are based on the human immunoglobulin G (IgG) complex... Ribosembly simulates compositional heterogeneity... Experimental Datasets. We also evaluate on two real datasets: EMPIAR-10076 is a dataset comprising assemblies of intermediates of the Escherichia coli large ribosomal subunit (LSU) [7]... We also consider EMPIAR-10180, a conformationally heterogeneous dataset of Pre-Catalytic Spliceosome [30].
Dataset Splits	No	The paper describes the datasets used (e.g., IgG-1D, IgG-RL, Ribosembly, EMPIAR-10076, EMPIAR-10180) and their characteristics, such as the number of particle images or states. However, it does not explicitly provide details about specific training, validation, or test splits, or the methodology for creating them for reproducibility beyond general descriptions of the datasets.
Hardware Specification	Yes	The optimization runs on a single NVIDIA GeForce RTX 2080, taking 3 to 6 hours depending on the number of Gaussians in the model.
Software Dependencies	No	The paper does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that are needed to replicate the experiment.
Experiment Setup	Yes	Implementation Details. For part discovery, we seed G = 2,048 components using the rigid reconstruction and adopt lightweight MLPs with a single hidden layer of H = 32 units. The latent space, Z, has dimensionality D = 4 and the feature space, F, has dimensionality E = 24. We optimize the part discovery model for 15 and 50 epochs on synthetic and experimental datasets. The part-aware GMMs are optimized for 30 epochs, using G = 8,192 components, except for Ribosome synthetic and experimental datasets with G = 16,384, and have MLPs with three hidden layers and H = 128 hidden units. On experimental datasets, we perform part discovery on downsampled 128 × 128 images for efficiency, while the part-aware GMM is optimized on 256 × 256 images. We use batch size B = 64 and set hyperparameters λz = 0.1, λf = 0.01.
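
Because the code has not been released, the settings quoted in the Experiment Setup row can only be collected by hand. The following is a minimal Python sketch of such a record, not the authors' configuration format. All class, function, and field names are hypothetical. It assumes that "15 and 50 epochs" means 15 for synthetic and 50 for experimental datasets, and that the batch size and λ weights are shared across both stages. The quoted text confirms neither reading. Image resolution is left unset for synthetic data because the quoted text does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StageConfig:
    """Settings for one optimization stage (hypothetical field names)."""
    num_gaussians: int          # G in the quoted text
    mlp_hidden_layers: int
    mlp_hidden_units: int       # H in the quoted text
    epochs: int
    image_size: Optional[int]   # side length in pixels; None where not stated


# Values quoted once, assumed here to apply to both stages.
LATENT_DIM = 4      # D, dimensionality of latent space Z
FEATURE_DIM = 24    # E, dimensionality of feature space F
BATCH_SIZE = 64     # B
LAMBDA_Z = 0.1
LAMBDA_F = 0.01


def part_discovery_config(experimental: bool) -> StageConfig:
    """Part discovery stage: 2,048 Gaussians, one hidden layer of 32 units."""
    return StageConfig(
        num_gaussians=2048,
        mlp_hidden_layers=1,
        mlp_hidden_units=32,
        # Assumption: 15 epochs for synthetic, 50 for experimental datasets.
        epochs=50 if experimental else 15,
        # Downsampled 128 x 128 images are stated only for experimental data.
        image_size=128 if experimental else None,
    )


def part_aware_gmm_config(experimental: bool, ribosome: bool) -> StageConfig:
    """Part-aware GMM stage: 30 epochs, three hidden layers of 128 units."""
    return StageConfig(
        # Ribosome datasets (synthetic and experimental) use 16,384 Gaussians.
        num_gaussians=16384 if ribosome else 8192,
        mlp_hidden_layers=3,
        mlp_hidden_units=128,
        epochs=30,
        # 256 x 256 images are stated only for experimental data.
        image_size=256 if experimental else None,
    )
```

For example, `part_aware_gmm_config(experimental=True, ribosome=True)` would collect the settings quoted for a ribosome dataset such as EMPIAR-10076.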