Mixture of neural fields for heterogeneous reconstruction in cryo-EM

Authors: Axel Levy, Rishwanth Raghu, David Shustin, Adele Peng, Huan Li, Oliver Clarke, Gordon Wetzstein, Ellen Zhong

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run three experiments to evaluate Hydra. In Section 4.1, we show that our method improves the expressiveness of neural-based methods and can reveal strong compositional heterogeneity fully ab initio. In Section 4.2, we use Hydra to reveal compositional heterogeneity in an experimental dataset containing protein complexes of diverse sizes, in a single run. In Section 4.3, we demonstrate the simultaneous reconstruction of compositional and conformational heterogeneity.
Researcher Affiliation | Academia | Axel Levy (Stanford University, axlevy@stanford.edu); Rishwanth Raghu (Princeton University, rraghu@princeton.edu); David Shustin (Princeton University, dshustin@princeton.edu); Adele Rui-Yang Peng (Princeton University, adelep@princeton.edu); Huan Li (Columbia University, hl3170@columbia.edu); Oliver Biggs Clarke (Columbia University, oc2188@cumc.columbia.edu); Gordon Wetzstein (Stanford University, gordon.wetzstein@stanford.edu); Ellen D. Zhong (Princeton University, zhonge@princeton.edu)
Pseudocode | No | The paper describes the model and optimization strategy in text and equations and includes schematic figures, but it does not provide formal pseudocode or an algorithm block.
Open Source Code | Yes | Webpage: https://hydra.cs.princeton.edu. In addition, the NeurIPS Paper Checklist, section 5 'Open access to data and code', answers '[Yes]' with the justification 'https://hydra.cs.princeton.edu/'.
Open Datasets | Yes | We evaluate Hydra on tomotwin3, a synthetic dataset of 3,000 images emulating a protein sample containing multiple species with static structures. We selected the 6th, 7th, and 8th largest proteins by atomic weight from the TomoTwin training dataset [48], which correspond to entries 6up6, 6id1, and 4cr2 in the RCSB PDB (Protein Data Bank) [3].
Dataset Splits | No | The paper states the total number of images in each synthetic dataset (e.g., '3,000 images in total' for tomotwin3, '15,000 images' for ribosplike) and mentions a 'random minibatch of indices' for optimization. It does not specify train/validation/test splits by percentage or count, and it does not cite predefined splits for these datasets.
Hardware Specification | Yes | All experiments on tomotwin3 are run on one NVIDIA A100 GPU. Experiments for Hydra were run using 2 NVIDIA V100 GPUs, and experiments for baselines were run using 1 NVIDIA V100 GPU. All training for DRGN-AI and Hydra was carried out on 4 NVIDIA A100 GPUs with 80 GB memory.
Software Dependencies | Yes | We train cryoDRGN2 v3.3.0 for 30 epochs using an 8-dimensional latent space, an encoder width of 1024, 3 encoder layers, and a decoder width of 1024.
Experiment Setup | Yes | We use the Adam optimizer [23] without weight decay and with the following learning rates: 0.1 for the scores, 0.01 for the conformations, 0.001 for the poses, and 0.0001 for the weights of the neural networks. We perform HPS on 100,000 images (33 epochs), followed by 100 epochs of SGD pose optimization. We set the batch size to 64 during SGD pose optimization. The score table learning rate is set to 0.01, and σ is set to 0.1.
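
The Experiment Setup entry describes Adam without weight decay and with a different learning rate for each kind of parameter. The following is a minimal plain-Python sketch of that idea, not the authors' implementation. The class name GroupedAdam, the group names, the toy quadratic loss, and the Adam constants (beta1=0.9, beta2=0.999, eps=1e-8, the usual defaults) are assumptions made for illustration; only the four learning rates come from the text above.

```python
import math

# Learning rates quoted in the Experiment Setup entry.
# The dictionary keys are illustrative names, not identifiers from the paper's code.
LEARNING_RATES = {
    "scores": 0.1,
    "conformations": 0.01,
    "poses": 0.001,
    "network_weights": 0.0001,
}


class GroupedAdam:
    """Minimal Adam (no weight decay) with one learning rate per parameter group."""

    def __init__(self, params, lrs, beta1=0.9, beta2=0.999, eps=1e-8):
        # beta1, beta2, and eps are the common Adam defaults (assumed here).
        self.params = params  # dict: group name -> list of floats
        self.lrs = lrs        # dict: group name -> learning rate
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.m = {k: [0.0] * len(v) for k, v in params.items()}  # first moments
        self.v = {k: [0.0] * len(v) for k, v in params.items()}  # second moments
        self.t = 0

    def step(self, grads):
        """Apply one Adam update in place, given gradients keyed like params."""
        self.t += 1
        bias1 = 1.0 - self.beta1 ** self.t
        bias2 = 1.0 - self.beta2 ** self.t
        for name, values in self.params.items():
            lr = self.lrs[name]
            m, v = self.m[name], self.v[name]
            for i, g in enumerate(grads[name]):
                m[i] = self.beta1 * m[i] + (1.0 - self.beta1) * g
                v[i] = self.beta2 * v[i] + (1.0 - self.beta2) * g * g
                m_hat = m[i] / bias1  # bias-corrected first moment
                v_hat = v[i] / bias2  # bias-corrected second moment
                values[i] -= lr * m_hat / (math.sqrt(v_hat) + self.eps)


# Toy usage: every group starts from the same values and sees the same gradient.
params = {name: [1.0, -2.0] for name in LEARNING_RATES}
optimizer = GroupedAdam(params, LEARNING_RATES)

# Gradient of the toy loss sum(x**2) is 2*x.
grads = {name: [2.0 * x for x in values] for name, values in params.items()}
optimizer.step(grads)
```

On the first Adam step the bias-corrected update has magnitude close to the learning rate itself, so in this toy run the scores move about 1,000 times further than the network weights. That ratio is the practical effect of the learning-rate schedule quoted above.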