Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A Simple and Comprehensive Benchmark for Single-Cell Transcriptomics

Authors: Jiaxin Qi, Yan Cui, Kailei Guo, Xiaomin Zhang, Jianqiang Huang, Gaogang Xie

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we clarify the misunderstandings in the traditional methods and provide competitive baselines, thereby paving the way for future research in this field.
Researcher Affiliation | Academia | 1Computer Network Information Center, Chinese Academy of Sciences, Beijing, China; 2Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China; 3University of Chinese Academy of Sciences, Beijing, China; 4Tianjin Medical University Eye Hospital, Tianjin, China; EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes methods and adaptations for MLP and CNN using mathematical formulations and illustrative figures (Figure 2), but it does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code: https://github.com/simpleshinobu/scbenchmark
Open Datasets | Yes | Pre-training Dataset. We follow the approach proposed by scGPT (Cui et al. 2024) to assemble a pre-training transcriptomic dataset, containing 54.6 million human cells from the CELLxGENE collection (Biology et al. 2023). This dataset encompasses more than 50 organs (e.g., blood and heart) and tissues across over 400 studies, offering a broad representation of cellular heterogeneity throughout the human body. Expression Classification Datasets. We collect 10 expression classification datasets following the strategies in scGPT. Myeloid (Myel) (Cheng et al. 2021) performs a comprehensive pan-cancer analysis of myeloid cells, consisting of 13,178 samples and 21 sub-cancer classes. Multiple Sclerosis (MS) (Schirmer et al. 2019) reveals specific cellular changes in multiple sclerosis lesions, which consists of 21,312 samples and 18 cell classes.
Dataset Splits | Yes | For downstream tasks, we split the datasets into 70% training and 30% testing, standardizing the settings to 50 epochs and batch size of 64, with an Adam optimizer and 0.005 learning rate.
Hardware Specification | No | The paper discusses computational burden, efficiency, and overhead, but does not provide specific details about the hardware used for experiments, such as GPU/CPU models or memory specifications.
Software Dependencies | No | The paper mentions using the Adam optimizer with a learning rate but does not provide specific software dependencies with version numbers, such as programming languages, libraries, or frameworks like PyTorch or TensorFlow.
Experiment Setup | Yes | For pre-training, unless otherwise noted, a subset of the data was used, with a 6-layer network, hidden dimensions of 256, a batch size of 128, and a gene length of 512. The Adam (Kingma 2014) optimizer with a learning rate of 0.0002 was employed over 10 epochs. For downstream tasks, we split the datasets into 70% training and 30% testing, standardizing the settings to 50 epochs and batch size of 64, with an Adam optimizer and 0.005 learning rate.
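The split and optimization settings reported above can be sketched in a short, standard-library-only Python snippet. This is a minimal sketch of the stated protocol, not the authors' code: the function name, the config dictionaries, and the choice of the Myeloid dataset size (13,178 samples) are illustrative assumptions.

```python
import random

def split_70_30(n_samples, seed=0):
    """Shuffle sample indices and split 70% train / 30% test,
    matching the downstream protocol reported in the paper."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = int(0.7 * n_samples)
    return idx[:cut], idx[cut:]

# Hyperparameters as reported in the Experiment Setup row
# (the dictionary names and keys are ours, not the paper's).
PRETRAIN_CFG = {"layers": 6, "hidden_dim": 256, "batch_size": 128,
                "gene_length": 512, "optimizer": "Adam",
                "lr": 0.0002, "epochs": 10}
DOWNSTREAM_CFG = {"batch_size": 64, "optimizer": "Adam",
                  "lr": 0.005, "epochs": 50}

# Example: splitting a dataset the size of Myeloid (13,178 samples).
train_idx, test_idx = split_70_30(13178)
print(len(train_idx), len(test_idx))  # 9224 3954
```

Fixing the shuffle seed keeps the 70/30 partition reproducible across runs, which is the property this benchmark report is auditing.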