Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Reconstructing Heterogeneous Biomolecules via Hierarchical Gaussian Mixtures and Part Discovery
Authors: Shayan Shekarforoush, David B. Lindell, Marcus A Brubaker, David J Fleet
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We quantitatively compare CryoSPIRE with the state-of-the-art methods, namely, RECOVAR [10], CryoDRGN [47], DRGN-AI [19], 3DFlex [33] and 3DVA [32] using the CryoBench benchmark [15]. We also provide qualitative results on experimental datasets. ... Quantitative comparison on the three relevant CryoBench [15] datasets are provided in Table 2 and Fig. 3. ... Here, we ablate key design decisions in our framework. |
| Researcher Affiliation | Academia | 1 University of Toronto; 2 Vector Institute; 3 York University |
| Pseudocode | No | The paper describes the methods and processes in paragraph form and through figures, but does not present any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: The datasets are publicly available and we promise to release the code in future. |
| Open Datasets | Yes | CryoBench. The sole benchmark for cryo-EM heterogeneity is CryoBench [15], a set of synthetic datasets with ground-truth labels and a protocol for quantitative evaluation. Two datasets, IgG-1D and IgG-RL, are based on the human immunoglobulin G (IgG) complex... Ribosembly simulates compositional heterogeneity... Experimental Datasets. We also evaluate on two real datasets: EMPIAR-10076 is a dataset comprising assemblies of intermediates of the Escherichia coli large ribosomal subunit (LSU) [7]... We also consider EMPIAR-10180, a conformationally heterogeneous dataset of Pre-Catalytic Spliceosome [30]. |
| Dataset Splits | No | The paper describes the datasets used (e.g., IgG-1D, IgG-RL, Ribosembly, EMPIAR-10076, EMPIAR-10180) and their characteristics, such as the number of particle images or states. However, it does not explicitly provide details about specific training, validation, or test splits, or the methodology for creating them for reproducibility beyond general descriptions of the datasets. |
| Hardware Specification | Yes | The optimization runs on a single NVIDIA GeForce RTX 2080, taking 3 to 6 hours depending on the number of Gaussians in the model. |
| Software Dependencies | No | The paper does not explicitly list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that are needed to replicate the experiment. |
| Experiment Setup | Yes | Implementation Details. For part discovery, we seed G = 2,048 components using the rigid reconstruction and adopt lightweight MLPs with a single hidden layer of H = 32 units. The latent space, Z, has dimensionality D = 4 and the feature space, F, has dimensionality E = 24. We optimize the part discovery model for 15 and 50 epochs on synthetic and experimental datasets. The part-aware GMMs are optimized for 30 epochs, using G = 8,192 components, except for Ribosome synthetic and experimental datasets with G = 16,384, and have MLPs with three hidden layers and H = 128 hidden units. On experimental datasets, we perform part discovery on downsampled 128 × 128 images for efficiency, while the part-aware GMM is optimized on 256 × 256 images. We use batch size B = 64 and set hyperparameters λz = 0.1, λf = 0.01. |
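
Since the code is not yet released, the hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch for anyone attempting a reimplementation. This is only an illustration: the class names, field names, and the `make_configs` helper are hypothetical and do not come from the paper. It also assumes that "15 and 50 epochs" means 15 for synthetic and 50 for experimental datasets, and it leaves the image size unset for synthetic data because the excerpt does not state it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PartDiscoveryConfig:
    """Hypothetical container for the part-discovery stage settings."""
    num_gaussians: int = 2048          # G, seeded from the rigid reconstruction
    hidden_layers: int = 1             # single hidden layer
    hidden_units: int = 32             # H
    latent_dim: int = 4                # D, dimensionality of latent space Z
    feature_dim: int = 24              # E, dimensionality of feature space F
    epochs: int = 15                   # assumed: 15 synthetic, 50 experimental
    image_size: Optional[int] = None   # only stated for experimental data


@dataclass(frozen=True)
class PartAwareGMMConfig:
    """Hypothetical container for the part-aware GMM stage settings."""
    num_gaussians: int = 8192          # G; 16,384 for the ribosome datasets
    hidden_layers: int = 3
    hidden_units: int = 128            # H
    epochs: int = 30
    image_size: Optional[int] = None   # only stated for experimental data


@dataclass(frozen=True)
class TrainingConfig:
    """Settings quoted without a per-stage distinction."""
    batch_size: int = 64               # B
    lambda_z: float = 0.1              # λz
    lambda_f: float = 0.01             # λf


def make_configs(
    experimental: bool, ribosome: bool = False
) -> Tuple[PartDiscoveryConfig, PartAwareGMMConfig, TrainingConfig]:
    """Build the three configs for a dataset type (illustrative helper)."""
    part = PartDiscoveryConfig(
        epochs=50 if experimental else 15,
        image_size=128 if experimental else None,  # downsampled 128 x 128
    )
    gmm = PartAwareGMMConfig(
        num_gaussians=16384 if ribosome else 8192,
        image_size=256 if experimental else None,  # 256 x 256
    )
    return part, gmm, TrainingConfig()
```

For example, `make_configs(experimental=True, ribosome=True)` would correspond to the settings quoted for an experimental ribosome dataset such as EMPIAR-10076, under the assumptions above.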