CellPLM: Pre-training of Cell Language Model Beyond Single Cells
Authors: Hongzhi Wen, Wenzhuo Tang, Xinnan Dai, Jiayuan Ding, Wei Jin, Yuying Xie, Jiliang Tang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | It is evident from our experiments that CellPLM consistently outperforms both pre-trained and non-pre-trained methods across five distinct downstream tasks, with 100 times faster inference when generating cell embeddings than existing pre-trained models. |
| Researcher Affiliation | Academia | Michigan State University; Emory University |
| Pseudocode | No | The paper describes methods in prose and with diagrams (e.g., Figure 2) but does not contain a dedicated pseudocode or algorithm block. |
| Open Source Code | Yes | The checkpoint of our pre-trained model is released on our GitHub repository, along with the source code for the fine-tuning and zero-shot experiments. GitHub link of CellPLM: https://github.com/OmicsML/CellPLM |
| Open Datasets | Yes | All the data used in this study are publicly available. The data sources are specified in the appendix (e.g. 10x Genomics datasets, https://support.10xgenomics.com/single-cell-gene-expression/datasets). |
| Dataset Splits | Yes | Additionally, for methods that require model selection on a validation set, we performed another 10% simulated dropout and treated the masked entries as the validation set. |
| Hardware Specification | Yes | The pre-training finished in less than 24 hours on a GPU server with 8 Nvidia Tesla V100 16GB cards. Table 1: Inference time (s) for querying 48,082 cells on an A100 40GB GPU. |
| Software Dependencies | No | By default, we used the inner join of the AnnData package. We implemented DeepImpute with default settings in the DANCE (Ding et al., 2022) package. We used the R package SAVER to evaluate its performance. The paper mentions software packages but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The hyperparameters, datasets, and reproducibility information for pre-trained models are detailed in Appendix E. Table 5: Hyperparameters for pre-training the CellPLM model. |
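The Dataset Splits row describes a second round of simulated dropout whose masked entries serve as a validation set. The NumPy sketch below illustrates that idea under stated assumptions; it is not the authors' code. `simulate_dropout` is a hypothetical helper, and two details are assumptions not given in the row: only nonzero entries are eligible for masking, and the second 10% is drawn from the entries that are still nonzero after the first (test) mask.

```python
import numpy as np


def simulate_dropout(X, rate=0.1, rng=None):
    """Zero out a random `rate` fraction of the nonzero entries of X.

    Hypothetical helper. Assumption: only nonzero entries are eligible.
    Returns a corrupted copy of X and a boolean mask of the dropped entries.
    """
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = np.nonzero(X)                      # candidate (cell, gene) positions
    n_drop = int(round(rate * rows.size))           # how many entries to mask
    pick = rng.choice(rows.size, size=n_drop, replace=False)
    mask = np.zeros(X.shape, dtype=bool)
    mask[rows[pick], cols[pick]] = True
    X_corrupted = X.copy()
    X_corrupted[mask] = 0                           # simulate dropout events
    return X_corrupted, mask


# Usage on a toy count matrix (synthetic data, for illustration only).
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(100, 50)).astype(float)

# First round: held-out test entries.
X_test_in, test_mask = simulate_dropout(X, 0.1, rng)
# Second round: another 10% of the remaining nonzeros becomes the validation set.
X_train_in, val_mask = simulate_dropout(X_test_in, 0.1, rng)
```

Because test-masked entries are already zero in `X_test_in`, the second round cannot select them again, so under this assumption the validation and test masks never overlap.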
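The Software Dependencies row mentions AnnData's default inner join. In AnnData, concatenating datasets with `anndata.concat(..., join="inner")` (the default value of `join`) keeps only the variables, here genes, present in every input. The plain NumPy sketch below illustrates what that means for cell-by-gene matrices; `inner_join_genes` is a hypothetical helper, and keeping the first dataset's gene order is an assumption of this sketch, not a description of how AnnData orders the result.

```python
import numpy as np


def inner_join_genes(datasets):
    """Stack cell-by-gene matrices, keeping only genes shared by all datasets.

    Hypothetical helper. `datasets` is a list of (matrix, gene_names) pairs.
    Assumption: the output gene order follows the first dataset.
    """
    shared = set(datasets[0][1])
    for _, genes in datasets[1:]:
        shared &= set(genes)                        # intersect gene sets
    common = [g for g in datasets[0][1] if g in shared]
    blocks = []
    for X, genes in datasets:
        col = {g: i for i, g in enumerate(genes)}   # gene name -> column index
        blocks.append(X[:, [col[g] for g in common]])
    return np.vstack(blocks), common


# Usage: two toy datasets with partially overlapping gene panels.
A = np.array([[1, 2, 3], [4, 5, 6]])   # genes: CD3E, MS4A1, LYZ
B = np.array([[7, 8], [9, 10]])        # genes: LYZ, CD3E
X, genes = inner_join_genes([(A, ["CD3E", "MS4A1", "LYZ"]),
                             (B, ["LYZ", "CD3E"])])
# genes == ["CD3E", "LYZ"]; MS4A1 is dropped because B does not measure it.
```

An inner join trades gene coverage for a dense, aligned matrix: any gene missing from even one dataset is discarded rather than filled with placeholder values.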