Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

DRACO: A Denoising-Reconstruction Autoencoder for Cryo-EM

Authors: YingJun Shen, Haizhao Dai, Qihe Chen, Yan Zeng, Jiakai Zhang, Yuan Pei, Jingyi Yu

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	DRACO demonstrates the best performance in denoising, micrograph curation, and particle picking tasks compared to state-of-the-art baselines.
Researcher Affiliation	Collaboration	1 School of Information Science and Technology, ShanghaiTech University. 2 Cellverse Co., Ltd. 3 iHuman Institute, ShanghaiTech University.
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code	No	All code, pre-trained model weights, and datasets will be made publicly available for further research and model development. (Also confirmed by NeurIPS checklist: 'We do not provide code and data in submission. However, the code and data will be released upon acceptance.')
Open Datasets	Yes	Direct access to the public database leads to varying data quality, inconsistent data formats, or missing annotations. Therefore, we construct a large-scale, high-quality, and diverse single-particle cryo-EM image dataset by curating and manually processing 529 sets of data from EMPIAR [17], obtaining over 270,000 cryo-EM movies or micrographs in total.
Dataset Splits	Yes	We divided these micrographs into training and evaluation datasets using an 80%/20% split ratio.
Hardware Specification	Yes	The warm-up stage takes 6 hours, and the pre-training stage takes 16 hours on a GPU cluster with 64 NVIDIA A800 GPUs, requiring approximately 80 GB of memory for a batch size of 4096.
Software Dependencies	No	The paper mentions software like cryoSPARC and Detectron2, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	The decoder of DRACO uses 8 Transformer blocks with embedding dimension 512, followed by a three-layer convolution neck and a linear projection layer with an output dimension 16 × 16, which is also the patch size of the input. The mask ratio for the one input micrograph is 0.75 by default. ... We warm up DRACO ... for 200 epochs. Then we adopt our novel denoising-reconstruction pre-training for 400 epochs.
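
As a rough illustration of two values reported in the table (the 80%/20% training/evaluation split and the default mask ratio of 0.75), the following Python sketch shows one way such a split and patch mask could be computed. It is not the authors' code, which the table notes was not released with the submission. The function names, the fixed seed, the uniform random choice of masked patches, the placeholder micrograph identifiers, and the count of 196 patches are all assumptions made for illustration.

```python
import random


def split_train_eval(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items with a fixed seed, then split into train/eval lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def choose_masked_patches(num_patches, mask_ratio=0.75, seed=0):
    """Return sorted indices of patches to mask, chosen uniformly at random.

    Uniform random selection is an assumption; the table only reports the ratio.
    """
    num_masked = int(num_patches * mask_ratio)
    return sorted(random.Random(seed).sample(range(num_patches), num_masked))


# Hypothetical micrograph identifiers, for illustration only.
micrographs = [f"micrograph_{i:04d}" for i in range(1000)]
train, evaluation = split_train_eval(micrographs)

# 196 patches is a hypothetical count (e.g. a 14 x 14 grid of 16 x 16 patches).
masked = choose_masked_patches(num_patches=196)
```

Fixing the seed keeps the split repeatable across runs, which is the kind of detail the Dataset Splits variable is meant to capture.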