Hyperbolic Procrustes Analysis Using Riemannian Geometry

Authors: Ya-Wei Eileen Lin, Yuval Kluger, Ronen Talmon

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental The efficacy of HPA, its theoretical properties, stability and computational efficiency are demonstrated in simulations. In addition, we showcase its performance on three batch correction tasks involving gene expression and mass cytometry data. Specifically, we demonstrate high-quality unsupervised batch effect removal from data acquired at different sites and with different technologies that outperforms recent methods for label-free alignment in hyperbolic spaces.
Researcher Affiliation Academia Viterbi Faculty of Electrical and Computer Engineering, Technion; Program in Applied Mathematics, Yale University; Interdepartmental Program in Computational Biology and Bioinformatics, Yale University; Department of Pathology, Yale University
Pseudocode Yes Algorithm 1 Hyperbolic Procrustes analysis
Open Source Code Yes Our code is available at https://github.com/RonenTalmonLab/HyperbolicProcrustesAnalysis.
Open Datasets Yes We consider two publicly available datasets: METABRIC [8] and TCGA [26], consisting of samples from five breast cancer subtypes. In the second task, three cohorts of lung cancer (LC) gene expression data [21] are considered... The last task involves CyTOF data [48]
Dataset Splits Yes We evaluate the quality of the alignment in two aspects using objective measures: (i) k-NN classification, with leave-one-batch-out cross-validation, is utilized for assessing the alignment of the intrinsic structure, and (ii) MMD [19] is used for assessing the distribution alignment quality.
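The two evaluation criteria above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and the number of neighbours `k`, the Euclidean distance used by the k-NN vote, and the Gaussian (RBF) kernel with bandwidth parameter `gamma` in the MMD estimate are all assumptions, since the row above does not state them. For embeddings kept in hyperbolic space, a hyperbolic distance would be the natural replacement for the Euclidean one.

```python
import numpy as np

def knn_leave_one_batch_out(X, labels, batches, k=5):
    """Hypothetical helper: k-NN accuracy on each held-out batch.

    For every batch b, a k-NN classifier "trained" on all other batches
    predicts the labels of batch b. Euclidean distance is an assumption.
    """
    X = np.asarray(X, dtype=float)
    labels, batches = np.asarray(labels), np.asarray(batches)
    accs = {}
    for b in np.unique(batches):
        test = batches == b
        Xtr, ytr = X[~test], labels[~test]
        Xte, yte = X[test], labels[test]
        # squared Euclidean distances between held-out and training samples
        d2 = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(-1)
        nn = np.argsort(d2, axis=1)[:, :k]
        preds = []
        for row in ytr[nn]:
            vals, counts = np.unique(row, return_counts=True)
            preds.append(vals[np.argmax(counts)])  # majority vote
        accs[b.item()] = float(np.mean(np.array(preds) == yte))
    return accs

def mmd2_rbf(X, Y, gamma=1.0):
    """Biased estimate of squared MMD with k(a, b) = exp(-gamma * ||a - b||^2).

    The RBF kernel and gamma are assumptions, not taken from the paper.
    """
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)

    def kernel(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    return float(kernel(X, X).mean() + kernel(Y, Y).mean() - 2.0 * kernel(X, Y).mean())
```

A high per-batch k-NN accuracy indicates that class structure is shared across batches after alignment, while a small MMD between batches indicates that their distributions overlap; the two criteria are complementary, which is why both are reported.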
Hardware Specification No The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, memory specifications).
Software Dependencies No The paper mentions software components like 'Python' and the 'POT library [12]', but it does not specify version numbers for any of these components.
Experiment Setup Yes The synthetic data in $\mathbb{L}^d$ is generated using the sampling scheme described in Section 2 based on [39]. Given an arbitrary point $\mu \in \mathbb{L}^d$ and an arbitrary SPD matrix $\Sigma \in \mathbb{R}^{d \times d}$, we generate a set of $N$ points $Q^{(1)} = \{q_i^{(1)}\}_{i=1}^{N}$ centered at $\mu$ by $\mathbb{L}^d \ni q_i^{(1)} = \mathrm{Exp}_{\mu}\big(\mathrm{PT}_{\mu_0 \to \mu}(v_i^{(1)})\big)$, where $\mu_0 = [1, \mathbf{0}]^\top$ is the origin, $v_i^{(1)} = [0, \tilde{v}_i^{(1)}]^\top$, and $\tilde{v}_i^{(1)} \sim \mathcal{N}(0, \Sigma)$. We apply Algorithm 1 to align the three pairs of sets $\{Q^{(1)}, Q^{(2)}\}$, $\{Q^{(1)}, Q^{(3)}\}$, and $\{Q^{(1)}, Q^{(4)}\}$, setting $N = 100$, $\sigma = 1$, and $d \in \{3, 5, 10, 20, \ldots, 40\}$. Each experiment is repeated 10 times with different values of $\mu$, $\Sigma$, $\mu_0$ and $t$.
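The sampling step $q_i = \mathrm{Exp}_{\mu}(\mathrm{PT}_{\mu_0 \to \mu}(v_i))$ can be sketched on the Lorentz model as below. This is an illustrative sketch, not the code from the authors' repository: the function names are hypothetical, and it assumes the standard Lorentz-model conventions, namely the Lorentzian inner product $\langle x, y \rangle_L = -x_0 y_0 + \sum_{j \ge 1} x_j y_j$, the exponential map $\mathrm{Exp}_\mu(v) = \cosh(\|v\|_L)\,\mu + \sinh(\|v\|_L)\,v / \|v\|_L$, and parallel transport $\mathrm{PT}_{\nu \to \mu}(v) = v + \frac{\langle \mu - \alpha \nu, v \rangle_L}{\alpha + 1}(\nu + \mu)$ with $\alpha = -\langle \nu, \mu \rangle_L$.

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x_0 y_0 + sum_{j>=1} x_j y_j (row-wise)."""
    return -x[..., 0] * y[..., 0] + (x[..., 1:] * y[..., 1:]).sum(-1)

def exp_map(mu, v):
    """Exponential map Exp_mu(v) on the Lorentz model, applied to each row of v."""
    norm = np.sqrt(np.clip(lorentz_inner(v, v), 0.0, None))[..., None]
    safe = np.where(norm > 0, norm, 1.0)  # avoid 0/0 for zero tangent vectors
    return np.cosh(norm) * mu + np.sinh(norm) * v / safe

def parallel_transport(nu, mu, v):
    """Parallel transport of tangent vectors v from T_nu L^d to T_mu L^d."""
    alpha = -lorentz_inner(nu, mu)
    coef = lorentz_inner(mu - alpha * nu, v) / (alpha + 1.0)
    return v + coef[..., None] * (nu + mu)

def sample_wrapped_normal(mu, Sigma, N, rng):
    """Hypothetical helper: N points q_i = Exp_mu(PT_{mu0->mu}(v_i)),
    with v_i = [0, v~_i] and v~_i ~ N(0, Sigma)."""
    d = Sigma.shape[0]
    mu0 = np.zeros(d + 1)
    mu0[0] = 1.0                                   # origin of L^d
    v_tilde = rng.multivariate_normal(np.zeros(d), Sigma, size=N)
    v = np.hstack([np.zeros((N, 1)), v_tilde])     # tangent vectors at mu0
    return exp_map(mu, parallel_transport(mu0, mu, v))
```

Prepending a zero coordinate makes each $v_i$ tangent to $\mathbb{L}^d$ at the origin $\mu_0$ (tangent vectors at $\mu_0$ satisfy $\langle \mu_0, v \rangle_L = 0$); parallel transport then moves it to $T_\mu \mathbb{L}^d$ while preserving its Lorentzian norm, and the exponential map places the sample on the hyperboloid around $\mu$.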