Wasserstein convergence of Čech persistence diagrams for samplings of submanifolds
Authors: Charles Arnal, David Cohen-Steiner, Vincent Divol
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also perform various experiments to illustrate the validity of our results and their relevance to the classic TDA pipeline. All proofs are deferred to the Appendix. |
| Researcher Affiliation | Academia | Charles Arnal, Université Paris-Saclay, Inria; David Cohen-Steiner, Université Côte d'Azur, Inria; Vincent Divol, CREST, ENSAE |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode blocks or algorithms in its main text or appendices. |
| Open Source Code | No | The numerical experiments that we ran are basic and can be easily reproduced using the GUDHI library [62]. We therefore did not feel that providing open access to the code was necessary. We are happy to share it on demand. |
| Open Datasets | No | The paper describes generating synthetic data for its experiments: "We create a generic submanifold of dimension m by applying a random diffeomorphism Ψ to a given m-dimensional submanifold M0 (e.g. a torus). We then draw a sample of n i.i.d. observations sampled according to the pushforward P of the uniform distribution on M by Ψ." It does not use a pre-existing public or open dataset with specific access information. |
| Dataset Splits | No | The paper describes running experiments with varying 'n' (sample size) but does not specify dataset splits (e.g., 80/10/10) for training, validation, or testing in a typical machine learning pipeline. |
| Hardware Specification | No | As stated in the appendix, we only ran small to medium-scale experiments, which each took no more than an hour on a standard office laptop. |
| Software Dependencies | No | PDs are computed using GUDHI [62]. PDs are plotted using the giotto-tda library [72] and persistence images using scikit-tda [64]. All experiments can be easily run on a standard office laptop over a few hours. The distance OT2(µn,1, µf,1) is then computed by approximating the measures on a grid: the distance converges to 0 as predicted by Theorem 4.1. See also Figure 6. |
| Experiment Setup | Yes | Simulation of generic tori. Let R > r be two positive numbers. ... We create a random torus by sampling pairs of angles Y = {y_1, ..., y_K} uniformly at random, conditioned on the fact that max_{y ∈ [0,2π]^2} d_Y(y) < σ. We then draw the numbers a_1, ..., a_K as exponential random variables of scale 0.5, conditioned on the fact that max_{ϕ ∈ [0,2π]} r(ϕ) < R − r. We let M be the image of [0, 2π]^2 by F. ... We sample n = 10^4 points according to P, and compute the Čech PDs of the corresponding set A_n. For i = 1, we plot the persistence images with weights pers^p, p = 0, 1, 2, 4, with both birth and persistence values ranging between 0 and 1, and grid step equal to 0.002. ... For values of n ranging from 10^2 to 10^4, we compute Pers_p(dgm_i^(1)(A_n)) in three scenarios: ... We obtain a (nonuniform) probability measure, having density f. We then compute, for various values of n, the measure µ_{n,1}. The measure is approximated by kernel density estimation (see Figure 6). We approximate in a similar manner the measure µ ,1,2 by sampling n = 10^5 points on a square. We then apply the change of variable formula (9) to compute the theoretical limit µ_{f,1}. The distance OT2(µ_{n,1}, µ_{f,1}) is then computed by approximating the measures on a grid: the distance converges to 0 as predicted by Theorem 4.1. |
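The torus-sampling step quoted in the Experiment Setup row can be sketched in a few lines. The snippet below is a minimal stand-in, not the authors' code: it samples a standard torus with radii R > r by drawing angles uniformly, whereas the paper samples the pushforward of a uniform distribution on a randomly deformed torus. The commented GUDHI calls show how such a point cloud is typically fed to the library (the Čech PD is usually approximated via the alpha complex).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_torus(n, R=2.0, r=1.0):
    """Sample n points on a standard torus with radii R > r.

    Angles are drawn uniformly, which is NOT uniform on the surface;
    this is a simplified stand-in for the paper's pushforward sampling.
    """
    theta, phi = rng.uniform(0.0, 2.0 * np.pi, size=(2, n))
    x = (R + r * np.cos(phi)) * np.cos(theta)
    y = (R + r * np.cos(phi)) * np.sin(theta)
    z = r * np.sin(phi)
    return np.column_stack([x, y, z])

pts = sample_torus(1000)

# With GUDHI installed, one would then build a filtration and read off
# the persistence diagram (note: GUDHI's alpha filtration values are
# squared radii):
#   import gudhi
#   st = gudhi.AlphaComplex(points=pts).create_simplex_tree()
#   dgm = st.persistence()  # list of (dimension, (birth, death)) pairs
```

The radii R = 2, r = 1 are illustrative; the paper only requires R > r and additionally perturbs the torus by a random diffeomorphism.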
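The quantity OT2 appearing in the last rows is a 2-Wasserstein (optimal transport) distance between measures. As a self-contained illustration of the general recipe (an assumption on our part, not the authors' grid-based computation), the sketch below computes the exact 2-Wasserstein distance between two equal-size empirical measures by solving an optimal assignment problem; for persistence diagrams specifically, `gudhi.wasserstein.wasserstein_distance` provides a diagram-aware variant that also allows matching points to the diagonal.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def w2_empirical(a, b):
    """Exact 2-Wasserstein distance between two equal-size empirical
    measures given as point arrays a, b of shape (n, d)."""
    # Cost matrix of squared Euclidean distances between all pairs.
    cost = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    # Optimal one-to-one matching minimizing total squared cost.
    rows, cols = linear_sum_assignment(cost)
    # Each point carries mass 1/n, hence the mean of matched costs.
    return np.sqrt(cost[rows, cols].mean())

rng = np.random.default_rng(1)
a = rng.normal(size=(200, 2))
b = rng.normal(size=(200, 2))
print(w2_empirical(a, a))  # identical samples have distance 0.0
```

For a translated copy of a sample, the distance equals the length of the translation vector, which makes a convenient sanity check on the implementation.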