Topological Singularity Detection at Multiple Scales

Authors: Julius Von Rohrscheidt, Bastian Rieck

ICML 2023

Reproducibility assessment: each entry below gives the reproducibility variable, the assessed result, and the LLM response that supports it.
Research Type: Experimental. "We show the utility of this perspective experimentally on several data sets, ranging from spaces with known singularities to high-dimensional image data sets." From Section 5 (Experiments): "We demonstrate the expressivity of TARDIS in different settings, showing that it (i) calculates the correct intrinsic dimension, and (ii) detects singularities when analysing data sets with known singularities. We also conduct a brief comparison with one-parameter approaches, showcasing how our multi-scale approach results in more stable outcomes. Finally, we analyse Euclidicity scores of benchmark and real-world datasets, giving evidence that our technique can be used as a measure for the geometric complexity of data."
Researcher Affiliation: Academia. The authors are affiliated with Helmholtz Munich and the Technical University of Munich.
Pseudocode: Yes. From Appendix A.4 (Pseudocode): "We provide brief pseudocode implementations of the algorithms discussed in Section 4. In the following, we use #Bar_i(X) to denote the number of i-dimensional persistent barcodes of X (w.r.t. the Vietoris-Rips filtration, but any other choice of filtration affords the same description). Algorithm 1 explains the calculation of the persistent intrinsic dimension (see Section 4.1 in the main paper for details). For the subsequent algorithms, we assume that the estimated intrinsic dimension of the data is n." Algorithm 1 calculates the persistent intrinsic dimension (PID); Algorithm 2 calculates the Euclidicity values δ_jk. A hedged sketch of the local persistence computation that these algorithms rest on follows after these entries.
Open Source Code: Yes. From the paper's Reproducibility statement: "Our code is available under https://github.com/aidos-lab/TARDIS. All dependencies are listed in the respective pyproject.toml file, and the README.md discusses how to install our package and run our experiments."
Open Datasets: Yes. "To test TARDIS in an unsupervised setting, we calculate Euclidicity scores for the MNIST and FASHIONMNIST data sets... To highlight the utility of Euclidicity in unsupervised representation learning, we also calculate it on an induced pluripotent stem cell (iPSC) reprogramming data set (Zunder et al., 2015). Finally, we applied Euclidicity to a benchmark histology data set" (footnote 7: https://github.com/basveeling/pcam). A loading sketch for the public image data sets appears after these entries.
Dataset Splits: No. The paper does not explicitly provide training/validation/test dataset splits with specific percentages, sample counts, or references to predefined splits.
Hardware Specification: No. The paper does not specify the hardware used to run the experiments (e.g., specific GPU/CPU models or memory amounts).
Software Dependencies: No. The paper mentions the 'scikit-dimension toolkit', 'The GUDHI Project', 'Ripser', and the 'scikit-learn Python package' but does not provide specific version numbers for these software components within the text. It states that "All dependencies are listed in the respective pyproject.toml file", but this is an external file, not part of the paper's text. A snippet for recording the installed versions of these packages appears after these entries.
Experiment Setup: Yes. "Following Pope et al. (2021), we assume an intrinsic dimension of 10; moreover, we use k = 50 neighbours for local scale estimation. The neural network that we trained for the analysis in Section 5.4 consists of an input layer of 784 nodes, one dense hidden layer of 5 nodes and an output layer possessing 10 nodes, respectively. The same architecture was used for both MNIST and FASHIONMNIST, resulting in 3985 trainable parameters." A sketch of this architecture, with the parameter count broken down, follows below.
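
The pseudocode entry above refers to Algorithm 1 (persistent intrinsic dimension, defined via the bar counts #Bar_i(X)) and Algorithm 2 (Euclidicity values). The following is a minimal illustrative sketch, not the paper's algorithms: it only shows the kind of local computation they rest on, namely taking the points in an annulus around a query point, computing Vietoris-Rips persistence diagrams with ripser, and counting prominent bars per dimension. The annulus radii and the persistence threshold are our own illustrative choices. For a point whose neighbourhood is locally d-dimensional Euclidean (d >= 2), such an annulus is homotopy-equivalent to a (d-1)-sphere, so one expects a single prominent bar in dimension d-1.

```python
import numpy as np
from ripser import ripser


def annulus_persistence(points, centre, inner, outer, maxdim=2):
    """Vietoris-Rips persistence diagrams of the annulus around `centre`.

    `points` is an (n, d) array; the annulus keeps all points whose distance
    to `centre` lies in [inner, outer].
    """
    dists = np.linalg.norm(points - centre, axis=1)
    annulus = points[(dists >= inner) & (dists <= outer)]
    return ripser(annulus, maxdim=maxdim)["dgms"]


def prominent_bar_counts(diagrams, min_persistence=0.1):
    """Count finite bars with persistence above `min_persistence`, per dimension."""
    counts = []
    for dgm in diagrams:
        pers = dgm[:, 1] - dgm[:, 0]
        counts.append(int(np.sum(pers[np.isfinite(pers)] > min_persistence)))
    return counts


# Toy check: points on a unit 2-sphere in R^3. Around a regular point, the
# annulus should carry one prominent 1-dimensional bar (d - 1 = 1 for d = 2).
rng = np.random.default_rng(0)
sphere = rng.normal(size=(2000, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)
diagrams = annulus_persistence(sphere, sphere[0], inner=0.2, outer=0.6)
print(prominent_bar_counts(diagrams))
```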
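
The open-datasets entry lists MNIST, FASHIONMNIST, the iPSC reprogramming data, and the PCam histology benchmark. As an illustration only (this is not the authors' loading code), the two public image data sets can be pulled from OpenML via scikit-learn; the OpenML dataset names below are assumptions. The iPSC data (Zunder et al., 2015) and PCam must be obtained from their own sources, e.g. the linked https://github.com/basveeling/pcam repository.

```python
from sklearn.datasets import fetch_openml

# Illustrative only: the OpenML names below are assumptions, not taken from
# the paper's code. Both data sets come back as 70000 x 784 arrays.
mnist = fetch_openml("mnist_784", version=1, as_frame=False)
fashion = fetch_openml("Fashion-MNIST", version=1, as_frame=False)

X_mnist, y_mnist = mnist.data / 255.0, mnist.target.astype(int)
X_fashion, y_fashion = fashion.data / 255.0, fashion.target.astype(int)
print(X_mnist.shape, X_fashion.shape)
```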
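
Because the software-dependencies entry notes that no version numbers appear in the paper text, a small snippet along the following lines can record the versions actually installed from the repository's pyproject.toml. The PyPI distribution names are our assumption; the authoritative list is the pyproject.toml in the TARDIS repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names are assumed; adjust them to match the repository's
# pyproject.toml if they differ.
for dist in ["gudhi", "ripser", "scikit-learn", "scikit-dimension"]:
    try:
        print(f"{dist}=={version(dist)}")
    except PackageNotFoundError:
        print(f"{dist}: not installed")
```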
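
The experiment-setup entry fully determines the classifier architecture: a 784-5-10 dense network has 784 * 5 + 5 = 3925 parameters in the hidden layer and 5 * 10 + 10 = 60 in the output layer, i.e. 3985 trainable parameters, matching the reported count. Below is a minimal PyTorch sketch of that architecture; the ReLU activation is an assumption, since the quoted text only gives the layer sizes.

```python
from torch import nn

# 784 -> 5 -> 10 dense network as described in the experiment setup; the ReLU
# activation is an assumption, since the quoted text only gives layer sizes.
model = nn.Sequential(
    nn.Linear(784, 5),  # 784 * 5 + 5 = 3925 parameters
    nn.ReLU(),
    nn.Linear(5, 10),   # 5 * 10 + 10 = 60 parameters
)

n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(n_params)  # 3985, matching the count reported in the paper
```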