Semi-supervised Knowledge Transfer Across Multi-omic Single-cell Data

Authors: Fan Zhang, Tianyu Liu, Zihao Chen, Xiaojiang Peng, Chong Chen, Xian-Sheng Hua, Xiao Luo, Hongyu Zhao

NeurIPS 2024

Each entry below gives a reproducibility variable, its assessed result, and the supporting LLM response.
Research Type: Experimental. "Extensive experiments on many benchmark datasets suggest the superiority of our DANCE over a series of state-of-the-art methods." "To validate the effectiveness of the proposed DANCE, we conduct extensive experiments on several benchmark multi-omic single-cell datasets."
Researcher Affiliation: Collaboration. "Fan Zhang¹, Tianyu Liu², Zihao Chen³, Xiaojiang Peng⁴, Chong Chen⁵, Xian-Sheng Hua⁵, Xiao Luo⁶, Hongyu Zhao². ¹Georgia Institute of Technology, ²Yale University, ³Peking University, ⁴Shenzhen Technology University, ⁵Terminus Group, ⁶University of California, Los Angeles"
Pseudocode: Yes. "The step-by-step training algorithm of our DANCE is summarized in Algorithm 1." "The model can get extra supervision from target-specific scATAC-seq data by removing the incorrect types from the candidate label set; the algorithm is provided in Algorithm 2."
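To illustrate the candidate-set idea behind Algorithm 2, here is a minimal sketch of pruning unlikely cell types from each cell's partial label set. This is not the paper's exact algorithm; the function name and the top-k pruning rule are assumptions for illustration.

```python
import torch

def prune_candidate_labels(logits, candidate_mask, keep_k):
    """Drop low-confidence cell types from each cell's candidate label set.

    Hedged sketch (not the paper's exact Algorithm 2): scores come from
    the current model, and only the keep_k highest-scoring candidate
    cell types are retained per cell.
    """
    probs = torch.softmax(logits, dim=1)
    masked = probs * candidate_mask           # ignore non-candidate types
    topk = masked.topk(keep_k, dim=1).indices
    new_mask = torch.zeros_like(candidate_mask)
    new_mask.scatter_(1, topk, 1.0)           # keep only the top-k candidates
    return new_mask

# Example: 4 cells, 7 cell types, shrink candidate sets to k = 4.
logits = torch.randn(4, 7)
mask = torch.ones(4, 7)
print(prune_candidate_labels(logits, mask, keep_k=4))
```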
Open Source Code: Yes. "Code is available at https://github.com/zfkarl/DANCE."
Open Datasets: Yes. "Mouse Atlas Data [74]. The multi-omics data can be accessed from the Tabula Muris mouse data (https://tabula-muris.ds.czbiohub.org/), along with the quantitative gene activity score matrix."
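For readers who want to load the data, a minimal sketch using anndata follows; the .h5ad file names are assumptions, since the paper points to the Tabula Muris portal rather than to specific files.

```python
import anndata as ad

# Hedged sketch: the file names below are assumptions, not paths published
# with the paper. Tabula Muris distributes expression data plus a
# quantitative gene activity score matrix for the ATAC modality.
rna = ad.read_h5ad("tabula_muris_rna.h5ad")             # scRNA-seq counts
atac = ad.read_h5ad("tabula_muris_gene_activity.h5ad")  # gene activity scores
print(rna.shape, atac.shape)                            # (cells, genes) each
```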
Dataset Splits: No. The paper states that "Three settings with different label ratios (low: 1%, mid: 5%, high: 10%) are set up for experiments on each dataset to validate the sensitivity of these methods to the number of labels." This refers to the proportion of labeled source data, but the paper does not specify a general train/validation/test split for the entire dataset in terms of percentages or sample counts.
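To make the label-ratio settings concrete, here is a minimal sketch of marking 1%/5%/10% of source cells as labeled. Uniform random sampling is an assumption; the paper does not state how labeled indices were drawn.

```python
import numpy as np

def labeled_source_mask(n_cells, label_ratio, seed=0):
    """Mark a fraction of source scRNA-seq cells as labeled.

    Hedged sketch of the low/mid/high settings (1%, 5%, 10%); uniform
    random sampling is an assumption, not stated in the paper.
    """
    rng = np.random.default_rng(seed)
    n_labeled = max(1, round(n_cells * label_ratio))
    mask = np.zeros(n_cells, dtype=bool)
    mask[rng.choice(n_cells, size=n_labeled, replace=False)] = True
    return mask  # True = labeled, False = unlabeled source cell

# e.g. the "mid" (5%) setting on a 20,000-cell source dataset:
labeled = labeled_source_mask(20_000, label_ratio=0.05)
print(labeled.sum())  # 1000 labeled cells
```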
Hardware Specification: Yes. "All the baselines are re-implemented on NVIDIA Tesla A100 40G GPUs using PyTorch according to the original settings in the corresponding papers to ensure a fair comparison."
Software Dependencies: No. The paper mentions "PyTorch" but does not give its version number, nor version numbers for any other software.
Experiment Setup: Yes. "For all the baselines and our DANCE, we first warm up the model with labeled scRNA-seq data for 30 epochs and then train the model for another 30 epochs with a batch size of 32. Three settings with different label ratios (low: 1%, mid: 5%, high: 10%) are set up for experiments on each dataset to validate the sensitivity of these methods to the number of labels. We opt for SGD as the default optimizer with a learning rate of 3e-3 and a weight decay of 1e-3. Here, k denotes the size of the partial label set and is used to control the number of potential correct cell types. As k varies within the range {2, 3, 4, 5, 6, 7}... optimal performance is achieved at k = 4 or k = 5. Next, with other parameters fixed, we analyze the sensitivity of the coefficient λ of L_TS in Eqn. 10... We change the value of λ within the range {0.05, 0.1, 0.15, 0.2, 0.25, 0.3}... with better performance observed at λ = 0.1."
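The stated hyperparameters translate directly into a PyTorch training skeleton; a minimal sketch follows. The linear model and random tensors are placeholders, not the paper's DANCE architecture, and the λ-weighted L_TS term from Eqn. 10 is only indicated in comments.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hedged sketch of the reported setup: SGD (lr 3e-3, weight decay 1e-3),
# 30 warm-up epochs on labeled scRNA-seq data plus 30 training epochs,
# batch size 32. Model and data are toy placeholders, not DANCE itself.
model = torch.nn.Linear(2000, 20)   # stand-in encoder/classifier
optimizer = torch.optim.SGD(model.parameters(), lr=3e-3, weight_decay=1e-3)
criterion = torch.nn.CrossEntropyLoss()

x = torch.randn(640, 2000)                    # dummy labeled scRNA-seq cells
y = torch.randint(0, 20, (640,))              # dummy cell-type labels
loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

WARMUP, TRAIN = 30, 30
for epoch in range(WARMUP + TRAIN):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)       # warm-up: supervised loss only;
        loss.backward()                       # the full method would add the
        optimizer.step()                      # lambda-weighted L_TS term (Eqn. 10)
```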