Cell ontology guided transcriptome foundation model
Authors: Xinyu Yuan, Zhihao Zhan, Zuobai Zhang, Manqi Zhou, Jianan Zhao, Boyu Han, Yue Li, Jian Tang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the generalizability and transferability of scCello on 22 million cells from CellxGene. For model generalization, we observe that scCello excels on cell type identification across all datasets in both zero-shot setting (i.e., directly using the pre-trained model) (Sec. 4.2.1) and fine-tuning setting (Sec. 4.2.2). In particular, scCello accurately classifies novel cell types by leveraging the ontology graph structure (Sec. 4.3). For transferability, scCello demonstrates competitive performances in predicting cell-type-specific marker genes (Sec. 4.4) and cancer drug responses (Sec. 4.5). Additionally, scCello is robust against batch effects (Sec. 4.6). Finally, we validate our contribution via ablation study (Sec. 4.7). |
| Researcher Affiliation | Academia | 1Mila Québec AI Institute, 2University of Montréal, 3McGill University, 4Cornell University, 5HEC Montréal, 6CIFAR AI Chair |
| Pseudocode | No | The paper describes methods and processes in narrative text and mathematical formulas but does not include explicit pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Source code and model weights are available at https://github.com/DeepGraphLearning/scCello. |
| Open Datasets | Yes | We pre-trained scCello on 22 million cells from CellxGene database leveraging their cell-type labels mapped to the cell ontology graph from Open Biological and Biomedical Ontology Foundry. Our TFM demonstrates competitive generalization and transferability performance over the existing TFMs on biologically important tasks including identifying novel cell types of unseen cells, prediction of cell-type-specific marker genes, and cancer drug responses. Source code and model weights are available at https://github.com/DeepGraphLearning/scCello. The scRNA-seq data were downloaded from CellxGene. |
| Dataset Splits | Yes | We fine-tuned TFMs on a subset of our curated pre-training data, randomly selecting 90% for training and using the remaining 10% for validation. |
| Hardware Specification | Yes | An Adam optimizer [38] (learning rate: 0.001, weight decay: 0.001, warm-up steps: 3,333) was used to train the scCello for 40,000 steps on 4 NVIDIA A100 GPUs on Compute Canada. |
| Software Dependencies | No | The paper mentions software such as the Adam optimizer, Scanpy, the Louvain algorithm, and RAPIDS, but does not give version numbers for any of them. |
| Experiment Setup | Yes | An Adam optimizer [38] (learning rate: 0.001, weight decay: 0.001, warm-up steps: 3,333) was used to train the scCello for 40,000 steps on 4 NVIDIA A100 GPUs on Compute Canada. We used 192 for batch size. More details are introduced in App. D. |
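
The hyperparameters and the 90%/10% split quoted in the table can be collected into a small configuration sketch. This is an illustration only, not the authors' code: the names `PretrainConfig`, `lr_at_step`, and `split_train_val` are hypothetical, the linear warm-up followed by a constant rate is an assumed schedule shape (the quoted text gives only the warm-up step count), and the quoted text does not say whether the batch size of 192 is per GPU or global.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PretrainConfig:
    """Values quoted in the table; field names are hypothetical."""
    learning_rate: float = 1e-3
    weight_decay: float = 1e-3
    warmup_steps: int = 3_333
    total_steps: int = 40_000
    batch_size: int = 192  # per-GPU vs. global is not stated in the quoted text
    num_gpus: int = 4


def lr_at_step(step: int, cfg: PretrainConfig) -> float:
    """Learning rate at a 0-indexed step.

    ASSUMPTION: linear warm-up to the peak rate, then a constant rate.
    The quoted text does not describe the schedule after warm-up.
    """
    if step < cfg.warmup_steps:
        return cfg.learning_rate * (step + 1) / cfg.warmup_steps
    return cfg.learning_rate


def split_train_val(n_items: int, val_fraction: float = 0.1, seed: int = 0):
    """Randomly split indices 0..n_items-1 into (train, validation) lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # seeded for a repeatable split
    n_val = int(round(n_items * val_fraction))
    return indices[n_val:], indices[:n_val]


if __name__ == "__main__":
    cfg = PretrainConfig()
    train_idx, val_idx = split_train_val(1_000)
    print(len(train_idx), len(val_idx))
    print(lr_at_step(0, cfg), lr_at_step(cfg.total_steps - 1, cfg))
```

Fixing the seed keeps the split repeatable across runs; the seed value itself is arbitrary and does not come from the paper.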