Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
IsUMap: Manifold Learning and Data Visualization Leveraging Vietoris-Rips Filtrations
Authors: Parvaneh Joharinad, Hannaneh Fahimi, Lukas Silvester Barth, Janis Keck, Jürgen Jost
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments The following experiments illustrate how IsUMap does not only capture topological features, and achieves density uniformization but also preserves intrinsic geometry. Low-Dimensional Geometries. We first test IsUMap on toy examples. In Fig. 1, we see samples of size 3000 from the Swiss Roll with a hole and the Möbius strip, and in Fig. 2, a sample drawn from a 3-dimensional Mammoth with 20000 points. ... We next apply IsUMap to some high dimensional biological datasets... Finally, we compare the clustering abilities of IsUMap with the other methods. ... We trained a linear classifier on the representations obtained from IsUMap, UMAP, Isomap on a subset of the CIFAR-10 dataset (Krizhevsky, Hinton et al. 2009), varying the embedding dimension between 2 and 400, and report the test accuracy of each method (see appendix for training hyperparameters) in Fig. 10. |
| Researcher Affiliation | Academia | 1 Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Germany, 2 Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany, 3 Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany EMAIL, EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 IsUMap. Input: Samples x1, ..., xn and/or distances d(xi, xj). Parameter: Metric-to-weight function φ, k-neighborhood, T-conorm T, normalization (boolean). Output: Low dimensional representations y1, ..., yn. 1: initialize distance matrix D 2: knn, knn_indices = k_nearest_neighbors(X, d) 3: D[not knn_indices] = ∞ 4: if normalization = True then 5: D = D / D[:, knn_indices[k]] {normalize by distance to k-th neighbor for uniformization} 6: end if 7: W = φ(D) 8: W = T(W, W^T) {combine different edge weights via t-conorm} 9: D = φ^{-1}(W) 10: Dnew = Dijkstra(D) 11: y1, ..., yn = MDS(Dnew) 12: return y1, ..., yn |
| Open Source Code | Yes | Code and extended article with appendix available at https://github.com/LUK4S-B/IsUMap |
| Open Datasets | Yes | In Fig. 1, we see samples of size 3000 from the Swiss Roll with a hole and the Möbius strip, and in Fig. 2, a sample drawn from a 3-dimensional Mammoth with 20000 points. ... We analyze the datasets of trefoil-knotted protein chains building on the work of (Benjamin et al. 2023)... Leveraging the human forebrain dataset from (La Manno et al. 2018)... We build on (Gardner et al. 2022)... We use two standard high dimensional benchmarks, MNIST and the Wisconsin breast cancer datasets. Fig. 7 shows the result of IsUMap, Isomap and UMAP on the MNIST dataset, ... on a subset of the CIFAR-10 dataset (Krizhevsky, Hinton et al. 2009) |
| Dataset Splits | No | We trained a linear classifier on the representations obtained from IsUMap, UMAP, Isomap on a subset of the CIFAR-10 dataset (Krizhevsky, Hinton et al. 2009), varying the embedding dimension between 2 and 400, and report the test accuracy of each method (see appendix for training hyperparameters) in Fig. 10. The paper mentions using a 'subset of the CIFAR-10 dataset' but does not specify how this subset was chosen or how it was split for training, validation, or testing. |
| Hardware Specification | No | Below we present an explanation of the steps of the pseudo-algorithm for IsUMap presented in Algorithm 1, cf. Appendices for computing infrastructure. The paper refers to 'computing infrastructure' in the appendix but provides no specific hardware details (e.g., GPU/CPU models, memory) in the main text. |
| Software Dependencies | No | Insufficient information. The paper mentions algorithms and methods (e.g., 'Dijkstra's algorithm', 'MDS', 'k-means'), but does not specify any software libraries or their version numbers used for implementation. |
| Experiment Setup | Yes | After selecting the parameters of the algorithm, i.e. k (for construction of the k-neighborhood graph), the operator φ (to map between probabilistic and metric weights), and the T-conorm T (used to symmetrize the weight matrix)... Visualization of a sample of N = 10000 points on a hemisphere with non-uniform distribution (top-left) in R^2 with k = 30 by IsUMap (top-right)... (Figure 3 caption). ...both with k = 15 in neighborhood graph (Figure 4 caption). ...k = 20 (Figure 9 caption). |
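
The steps of Algorithm 1 quoted in the Pseudocode row can be sketched roughly as follows. This is an illustrative reading of the pseudocode, not the authors' implementation (see their repository for that). Several choices are assumptions made only for this sketch: the function name `isumap_sketch`, the metric-to-weight map φ(d) = exp(-d), the probabilistic sum as the T-conorm, Euclidean input distances, and classical (eigendecomposition-based) MDS for the final step.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path


def isumap_sketch(X, k=10, n_components=2, normalize=True):
    """Hypothetical walk through Algorithm 1; not the authors' code."""
    n = X.shape[0]

    # Step 1: dense pairwise Euclidean distances (assumed input metric).
    diff = X[:, None, :] - X[None, :, :]
    full = np.sqrt((diff ** 2).sum(axis=-1))

    # Step 2: k nearest neighbours per point (column 0 of argsort is the point itself).
    knn_idx = np.argsort(full, axis=1)[:, 1:k + 1]
    rows = np.repeat(np.arange(n), k)
    cols = knn_idx.ravel()

    # Step 3: keep only kNN distances; every other entry is infinite.
    D = np.full((n, n), np.inf)
    D[rows, cols] = full[rows, cols]

    # Steps 4-6: optionally divide each row by the distance to its k-th neighbour.
    if normalize:
        kth = full[np.arange(n), knn_idx[:, -1]]
        D = D / kth[:, None]

    # Step 7: metric-to-weight map, assumed phi(d) = exp(-d), so phi(inf) = 0.
    W = np.exp(-D)

    # Step 8: symmetrise with a T-conorm, assumed probabilistic sum a + b - ab,
    # written as 1 - (1 - a)(1 - b) so the result stays inside [0, 1].
    W = 1.0 - (1.0 - W) * (1.0 - W.T)

    # Step 9: back to distances with phi^{-1}(w) = -log(w); weight 0 maps to inf.
    with np.errstate(divide="ignore"):
        D_sym = -np.log(W)
    np.fill_diagonal(D_sym, 0.0)

    # Step 10: graph geodesics via Dijkstra (inf and 0 entries count as non-edges).
    D_geo = shortest_path(D_sym, method="D", directed=False)
    if not np.all(np.isfinite(D_geo)):
        raise ValueError("neighbourhood graph is disconnected; try a larger k")

    # Step 11: classical MDS on the geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D_geo ** 2) @ J
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


if __name__ == "__main__":
    # Toy input: 80 points along a helix in R^3.
    t = np.linspace(0.0, 4.0 * np.pi, 80)
    X = np.column_stack([np.cos(t), np.sin(t), t])
    Y = isumap_sketch(X, k=6, n_components=2)
    print(Y.shape)
```

The dense n × n matrices keep the sketch short but limit it to small samples; the values of k reported in the Experiment Setup row (15, 20, 30) would be passed through the `k` argument.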