Unsupervised Gene-Cell Collective Representation Learning with Optimal Transport

Authors: Jixiang Yu, Nanjun Chen, Ming Gao, Xiangtao Li, Ka-Chun Wong

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments with 14 competing methods on 15 real scRNA-seq datasets demonstrate the competitive edges of scGCOT.
Researcher Affiliation | Academia | (1) Department of Computer Science, City University of Hong Kong, Hong Kong SAR; (2) School of Management Science and Engineering, Key Laboratory of Big Data Management Optimization and Decision of Liaoning Province, Dongbei University of Finance and Economics, Dalian, China; (3) Center for Post-doctoral Studies of Computer Science, Northeastern University, Shenyang, China; (4) School of Artificial Intelligence, Jilin University, Jilin, China; (5) Shenzhen Research Institute, City University of Hong Kong, Shenzhen, China; (6) Hong Kong Institute for Data Science, City University of Hong Kong, Hong Kong SAR
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access information, such as a repository link, or explicit statement about the release of source code for the described methodology.
Open Datasets | Yes | As shown in Table 1, 15 real datasets are selected for evaluation. These datasets range in size from hundreds to thousands and come from different platforms. Meanwhile, the number of cell types in the datasets are also various, ranging from 3 to 46. The annotation of cell types from the original publications is used as the ground truth.
Dataset Splits | No | The paper mentions running methods multiple times and taking averages but does not specify explicit training, validation, and test dataset splits or cross-validation details for reproducibility.
Hardware Specification | Yes | All experiments are conducted on a Ubuntu 20.04 server equipped with 128GB memory and two RTX 4090 GPUs.
Software Dependencies | Yes | The proposed scGCOT is constructed with PyTorch 2.0.0, PyG 2.3.0, and Python 3.10.11.
Experiment Setup | Yes | In the proposed scGCOT method, the cell graph and gene graph are constructed using KNN algorithm with the nearest neighbor parameter K = 15 and the number of highly variable genes n = 500. In the graph autoencoders, f_E^1 and f_E^2 are both set as a two-layer Graph Transformer network, with the hidden dimensions of 128 and 15, respectively. The output dimension of the fully-connected networks to obtain Ẑ_c and Ẑ_g are both set to 32. Our algorithm consists of 300 epochs of embedding learning and 100 epochs of representation alignment and cluster assignment learning. The loss weights {λ1, λ2, λ3} and {λ 1, 1/(2σ_1^2), 1/(2σ_2^2)} are set to {1, 0.3, 1} and {0.3, 1.5, 2}, respectively. Our model is optimized using equation (24) and the Adam algorithm with the learning rate 5e-4 in embedding learning stage. In the representation alignment and cluster assignment stage, we use equation (25) to train the model and let the learning rate increase from 1e-7 to 1e-4 linearly in the first half of total epochs and decrease linearly to 1e-7 in the second half.
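
The learning-rate schedule quoted under Experiment Setup (linear increase from 1e-7 to 1e-4 over the first half of training, then linear decrease back to 1e-7) can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' code: the function name `triangular_lr` is hypothetical, "total epochs" is assumed to mean the 100 epochs of the alignment and cluster-assignment stage, and the rate is assumed to be updated once per epoch.

```python
def triangular_lr(epoch, total_epochs=100, lr_min=1e-7, lr_max=1e-4):
    """Learning rate for a 0-based epoch: linear ramp up, then linear ramp down.

    Assumption: the peak sits at the midpoint of training and the first and
    last epochs both use lr_min. The paper does not state these details.
    """
    if total_epochs < 2:
        raise ValueError("total_epochs must be at least 2")
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch must lie in [0, total_epochs)")
    progress = epoch / (total_epochs - 1)    # 0.0 at the start, 1.0 at the end
    frac = 1.0 - abs(2.0 * progress - 1.0)   # 0 -> 1 -> 0, peaking at the midpoint
    return lr_min + (lr_max - lr_min) * frac


# One learning rate per epoch for the assumed 100-epoch second stage.
schedule = [triangular_lr(e) for e in range(100)]
```

With an even number of epochs the exact midpoint falls between two epochs, so the largest value in `schedule` is slightly below 1e-4. In a PyTorch training loop, the same shape could be supplied through `torch.optim.lr_scheduler.LambdaLR` after dividing each value by the optimizer's base learning rate, since that scheduler expects a multiplicative factor.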
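
The K-nearest-neighbor graph construction mentioned under Experiment Setup (K = 15) can be illustrated with a brute-force sketch. The details here are assumptions made for illustration: the summary does not state the distance metric or whether the graph is symmetrized, so this version uses Euclidean distance and returns directed edges. The function name `knn_edges` is hypothetical and is not taken from the paper.

```python
import math


def knn_edges(points, k=15):
    """Return directed edges (i, j) where j is one of the k nearest neighbors of i.

    Assumptions: Euclidean distance, no self-loops, no symmetrization.
    Brute force, O(n^2 log n); adequate for a small illustration only.
    """
    n = len(points)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    edges = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, sorted ascending.
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


# Toy example: two well-separated pairs of points on a line, one neighbor each.
toy_points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
toy_edges = knn_edges(toy_points, k=1)
```

In the setting described above, each point would be a cell (or gene) expression vector restricted to the 500 highly variable genes, and `k` would be 15. For datasets with thousands of cells, a library routine such as scikit-learn's `kneighbors_graph` would be a more practical choice than this quadratic loop.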