reproducibilityindex.ai

CellPLM: Pre-training of Cell Language Model Beyond Single Cells

Authors: Hongzhi Wen, Wenzhuo Tang, Xinnan Dai, Jiayuan Ding, Wei Jin, Yuying Xie, Jiliang Tang

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	It is evident from our experiments that CellPLM consistently outperforms both pre-trained and non-pre-trained methods across five distinct downstream tasks, with 100 times higher inference speed on generating cell embeddings compared to existing pre-trained models.
Researcher Affiliation	Academia	(1) Michigan State University; (2) Emory University
Pseudocode	No	The paper describes methods in prose and with diagrams (e.g., Figure 2) but does not contain a dedicated pseudocode or algorithm block.
Open Source Code	Yes	The checkpoint of our pre-trained model is released on our GitHub repository, as well as the source code for fine-tuning and zero-shot experiments. GitHub link of CellPLM: https://github.com/OmicsML/CellPLM
Open Datasets	Yes	All the data we used in this study are publicly available data. The data sources are specified in the appendix. 10x Genomics datasets: https://support.10xgenomics.com/single-cell-gene-expression/datasets
Dataset Splits	Yes	Additionally, for methods that require model selection on a validation set, we performed another 10% simulated dropout and treated the masked entries as the validation set.
Hardware Specification	Yes	The pre-training was finished in less than 24 hours on a GPU server with 8 NVIDIA Tesla V100 16GB cards. Table 1: Inference time (s) for querying 48,082 cells on an A100 40GB GPU.
Software Dependencies	No	We used the inner join by default of the AnnData package. We implemented DeepImpute with default settings in the DANCE (Ding et al., 2022) package. We used the R package SAVER to illustrate its performance. The paper mentions software packages but does not provide specific version numbers for any of them.
Experiment Setup	Yes	The hyperparameters, datasets, and reproducibility information for pre-trained models are detailed in Appendix E. Table 5: Hyperparameters for pretraining CellPLM model.
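
The Dataset Splits entry above describes holding out a further 10% of entries by simulated dropout and treating the masked entries as a validation set. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a small dense count matrix stored as a list of lists, and it assumes that only nonzero entries are eligible for masking, which the quoted excerpt does not state. The function name `simulate_dropout`, the `seed` parameter, and the toy matrix are hypothetical and do not come from the CellPLM repository.

```python
import random


def simulate_dropout(counts, rate=0.1, seed=0):
    """Zero out a fraction `rate` of the nonzero entries of `counts`.

    Returns (masked_counts, held_out), where held_out maps (row, col)
    to the original value so the masked entries can act as validation
    targets. Assumption: only nonzero entries are eligible for masking.
    """
    rng = random.Random(seed)
    nonzero = [
        (i, j)
        for i, row in enumerate(counts)
        for j, value in enumerate(row)
        if value != 0
    ]
    n_mask = round(rate * len(nonzero))
    chosen = rng.sample(nonzero, n_mask)

    masked = [list(row) for row in counts]  # copy so the input stays intact
    held_out = {}
    for i, j in chosen:
        held_out[(i, j)] = counts[i][j]
        masked[i][j] = 0
    return masked, held_out


# Hypothetical toy matrix: 4 cells x 5 genes, all 20 entries nonzero.
toy = [[cell * 5 + gene + 1 for gene in range(5)] for cell in range(4)]
masked, held_out = simulate_dropout(toy, rate=0.1)
```

A method under evaluation would be fit on `masked` and scored on how well it recovers the values stored in `held_out`. Fixing the seed keeps the held-out entries identical across runs, which matters when several methods are compared on the same split.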