Semi-supervised Knowledge Transfer Across Multi-omic Single-cell Data
Authors: Fan Zhang, Tianyu Liu, Zihao Chen, Xiaojiang Peng, Chong Chen, Xian-Sheng Hua, Xiao Luo, Hongyu Zhao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on many benchmark datasets suggest the superiority of our DANCE over a series of state-of-the-art methods. To validate the effectiveness of the proposed DANCE, we conduct extensive experiments on several benchmark multi-omic single-cell datasets |
| Researcher Affiliation | Collaboration | Fan Zhang1, Tianyu Liu2, Zihao Chen3, Xiaojiang Peng4, Chong Chen5, Xian-Sheng Hua5, Xiao Luo6, Hongyu Zhao2 — 1Georgia Institute of Technology, 2Yale University, 3Peking University, 4Shenzhen Technology University, 5Terminus Group, 6University of California, Los Angeles |
| Pseudocode | Yes | The step-by-step training algorithm of our DANCE is summarized in Algorithm 1. The model can get extra supervision from target-specific scATAC-seq data by removing the incorrect types from the candidate label set; this procedure is provided in Algorithm 2. |
| Open Source Code | Yes | Code is available at https://github.com/zfkarl/DANCE. |
| Open Datasets | Yes | Mouse Atlas Data [74]. The multi-omics data can be accessed from the Tabula Muris mouse data (https://tabula-muris.ds.czbiohub.org/), along with the quantitative gene activity score matrix. |
| Dataset Splits | No | The paper mentions 'Three settings with different label ratios (low: 1%, mid: 5%, high: 10%) are set up for experiments on each dataset to validate the sensitivity of these methods to the number of labels.' This refers to the proportion of *labeled* source data, but it does not specify a general train/validation/test split for the entire dataset in terms of percentages or sample counts. |
| Hardware Specification | Yes | All the baselines are re-implemented on NVIDIA Tesla A100 40G GPUs using PyTorch according to the original settings in the corresponding papers to ensure a fair comparison. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or any other software with specific version numbers. |
| Experiment Setup | Yes | For all the baselines and our DANCE, we first warm up the model with labeled scRNA-seq data for 30 epochs and then train the model for another 30 epochs with a batch size of 32. Three settings with different label ratios (low: 1%, mid: 5%, high: 10%) are set up for experiments on each dataset to validate the sensitivity of these methods to the number of labels. We opt for SGD as the default optimizer with a learning rate of 3e-3 and a weight decay of 1e-3. Here, k denotes the size of the partial label set and is used to control the number of potential correct cell types. As k varies within the range {2, 3, 4, 5, 6, 7}... optimal performance is achieved at k = 4 or k = 5. Next, with other parameters fixed, we analyze the sensitivity of the coefficient λ of L_TS in Eqn. 10... We change the value of λ within the range {0.05, 0.1, 0.15, 0.2, 0.25, 0.3}... with better performance observed at λ = 0.1. |
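The reported experiment setup can be sketched as a PyTorch configuration. This is a minimal illustration of the quoted hyperparameters only, not the authors' implementation: the `nn.Sequential` model is a hypothetical stand-in (the real DANCE architecture is in the linked repository), and the input/output dimensions are placeholders.

```python
import torch
from torch import nn

# Hypothetical stand-in for the DANCE encoder/classifier; dimensions are
# placeholders, not values from the paper.
model = nn.Sequential(nn.Linear(2000, 256), nn.ReLU(), nn.Linear(256, 10))

# Optimizer settings reported in the paper: SGD with lr 3e-3, weight decay 1e-3.
optimizer = torch.optim.SGD(model.parameters(), lr=3e-3, weight_decay=1e-3)

BATCH_SIZE = 32
WARMUP_EPOCHS = 30        # warm-up on labeled scRNA-seq data
TRAIN_EPOCHS = 30         # subsequent training phase
LABEL_RATIOS = (0.01, 0.05, 0.10)  # low / mid / high labeled-source settings
K = 4                     # partial-label-set size; best results at k = 4 or 5
LAMBDA_TS = 0.1           # coefficient λ of the L_TS term in Eqn. 10
```

The two-phase schedule (30 warm-up epochs on labeled scRNA-seq data, then 30 further epochs) and the label-ratio sweep are exactly as quoted in the table row above.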