Persistent Homology for High-dimensional Data Based on Spectral Methods

Authors: Sebastian Damrich, Philipp Berens, Dmitry Kobak

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply these methods to high-dimensional single-cell RNA-sequencing data and show that spectral distances allow robust detection of cell cycle loops. ... 3. a synthetic benchmark, with spectral distances outperforming state-of-the-art alternatives; 4. an application to a range of single-cell RNA-sequencing datasets with ground-truth cycles.
Researcher Affiliation | Academia | Hertie Institute for AI in Brain Health, University of Tübingen, Germany; Tübingen AI Center, Germany; IWR, Heidelberg University, Germany
Pseudocode | No | The paper describes algorithms and methods in textual form but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/berenslab/eff-ph/tree/neurips2024.
Open Datasets | Yes | The Malaria dataset [43]... We obtained the pre-processed data from https://github.com/vhowick/Malaria_Cell_Atlas/raw/v1.0/Expression_Matrices/Smartseq2/SS2_tmmlogcounts.csv.zip. ... The Neural IPC dataset [8]... shared this representation with us for a superset of 297 927 telencephalic excitatory cells and allowed us to share it with this paper (MIT License). ... The Neurosphere dataset [89]... The GO PCA representation was downloaded from https://zenodo.org/record/5519841/files/neurosphere.qs. ... The Hippocampus dataset [89]... The GO PCA representation was downloaded from https://zenodo.org/record/5519841/files/hipp.qs. ... The HeLa2 dataset [72, 89]... The GO PCA representation was downloaded from https://zenodo.org/record/5519841/files/HeLa2.qs. ... The Pancreas dataset [3, 89]... The GO PCA representation was downloaded from https://zenodo.org/record/5519841/files/endo.qs.
Dataset Splits | No | The paper describes synthetic data generation and sampling from real-world single-cell datasets, but does not specify explicit training, validation, or test dataset splits. It evaluates performance directly on the sampled data.
Hardware Specification | Yes | Our experiments were run on a machine with an Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz with 64 kernels, 377GB memory, and an NVIDIA RTX A6000 GPU.
Software Dependencies | Yes | We computed persistent homology using the ripser [4] project's representative-cycles branch at commit 140670f to compute persistent homologies and representative cycles. ... To compute kNN graphs, we used the PyKeOps package [13]. The rest of our implementation is in Python.
Experiment Setup | Yes | All methods come with hyperparameters. We report the results for the best hyperparameter setting on each dataset (Appendix K) but found spectral methods to be robust to these choices (Appendix L). ... Appendix I. Details on the distances used in our benchmark.
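
The entries above mention kNN graphs and spectral distances as inputs to persistent homology. The following is a minimal sketch of that idea, not the authors' implementation: it builds a brute-force symmetric kNN graph with NumPy (the paper used PyKeOps) and computes effective resistance, one example of a spectral distance, from the pseudoinverse of the graph Laplacian. The function names, the toy noisy-circle data, and the values of k and the noise level are all illustrative assumptions.

```python
import numpy as np


def knn_adjacency(points, k):
    """Symmetric, unweighted kNN adjacency matrix (brute force, hypothetical helper)."""
    sq_dists = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(sq_dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(sq_dists, axis=1)[:, :k]
    n = len(points)
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbors.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # keep an edge if either endpoint selected it


def effective_resistance(adj):
    """Pairwise effective resistance via the Laplacian pseudoinverse.

    Meaningful as a distance only if the graph is connected.
    """
    laplacian = np.diag(adj.sum(axis=1)) - adj
    pinv = np.linalg.pinv(laplacian)
    diag = np.diag(pinv)
    return diag[:, None] + diag[None, :] - 2.0 * pinv


# Toy data (assumption): a circle embedded in 50 dimensions with Gaussian noise.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=200)
points = np.zeros((200, 50))
points[:, 0] = np.cos(angles)
points[:, 1] = np.sin(angles)
points += rng.normal(scale=0.1, size=points.shape)

dist = effective_resistance(knn_adjacency(points, k=15))
```

The resulting `dist` matrix could then be handed to any persistent homology tool that accepts a precomputed distance matrix, such as the ripser project named in the Software Dependencies row.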