Wasserstein convergence of Čech persistence diagrams for samplings of submanifolds

Authors: Charles Arnal, David Cohen-Steiner, Vincent Divol

NeurIPS 2024

Each row below gives a reproducibility variable, its assessed result, and the LLM's supporting response:
Research Type: Experimental. "We also perform various experiments to illustrate the validity of our results and their relevance to the classic TDA pipeline. All proofs are deferred to the Appendix."
Researcher Affiliation: Academia. Charles Arnal (Université Paris-Saclay, Inria), David Cohen-Steiner (Université Côte d'Azur, Inria), Vincent Divol (CREST, ENSAE).
Pseudocode: No. The paper does not contain any clearly labeled pseudocode blocks or algorithms in its main text or appendices.
Open Source Code: No. "The numerical experiments that we ran are basic and can be easily reproduced using the GUDHI library [62]. We therefore did not feel that providing open access to the code was necessary. We are happy to share it on demand."
Open Datasets: No. The paper describes generating synthetic data for its experiments: "We create a generic submanifold of dimension m by applying a random diffeomorphism Ψ to a given m-dimensional submanifold M0 (e.g. a torus). We then draw a sample of n i.i.d. observations sampled according to the pushforward P of the uniform distribution on M by Ψ." It does not use a pre-existing public or open dataset with specific access information.
Dataset Splits: No. The paper runs experiments with varying sample size n but does not specify dataset splits (e.g. 80/10/10) for training, validation, or testing as in a typical machine learning pipeline.
Hardware Specification: No. "As stated in the appendix, we only ran small to medium-scale experiments, which each took no more than an hour on a standard office laptop."
Software Dependencies: No. "PDs are computed using GUDHI [62]. PDs are plotted using the giotto-tda library [72] and persistence images using scikit-tda [64]. All experiments can be easily run on a standard office laptop over a few hours. The distance OT2(µ_{n,1}, µ_{f,1}) is then computed by approximating the measures on a grid: the distance converges to 0 as predicted by Theorem 4.1. See also Figure 6."
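The excerpt above mentions computing the OT2 distance between two persistence measures after approximating them on a grid. A grid approximation is just a weighted discrete measure, so for small grids the 2-Wasserstein distance can be computed exactly as a linear program. The sketch below is my own illustration, not the authors' code; the helper name `ot2` is hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def ot2(x, a, y, b):
    """Exact 2-Wasserstein (OT2) distance between the discrete measures
    sum_i a_i * delta_{x_i} and sum_j b_j * delta_{y_j}, with x, y arrays
    of points of shape (n, d) and (m, d), and a, b nonnegative weights of
    equal total mass. Solved as a linear program over transport plans."""
    n, m = len(a), len(b)
    # Squared Euclidean cost matrix C[i, j] = |x_i - y_j|^2, flattened row-major.
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    # Marginal constraints: row i of the plan sums to a_i, column j sums to b_j.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        A_eq[n + j, j::m] = 1.0
    b_eq = np.concatenate([a, b])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    # Guard against tiny negative values from the solver before the sqrt.
    return float(np.sqrt(max(res.fun, 0.0)))
```

For example, the OT2 distance between two unit point masses at (0, 0) and (3, 4) is 5. The LP formulation scales poorly (n*m variables), which is presumably why grid-based approximations or dedicated OT solvers are preferred for finer grids.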
Experiment Setup: Yes. "Simulation of generic tori. Let R > r be two positive numbers. ... We create a random torus by sampling pairs of angles Y = {y_1, ..., y_K} uniformly at random, conditioned on the fact that max_{y ∈ [0,2π]²} d_Y(y) < σ. We then draw the numbers a_1, ..., a_K as exponential random variables of scale 0.5, conditioned on the fact that max_{ϕ ∈ [0,2π]} r(ϕ) < R − r. We let M be the image of [0,2π]² by F. ... We sample n = 10^4 points according to P, and compute the Čech PDs of the corresponding set A_n. For i = 1, we plot the persistence images with weights pers^p, p = 0, 1, 2, 4, with both birth and persistence values ranging between 0 and 1, and grid step equal to 0.002. ... For values of n ranging from 10^2 to 10^4, we compute Pers_p(dgm_i^{(1)}(A_n)) in three scenarios: ... We obtain a (nonuniform) probability measure, having density f. We then compute, for various values of n, the measure µ_{n,1}. The measure is approximated by kernel density estimation (see Figure 6). We approximate in a similar manner the measure µ ,1,2 by sampling n = 10^5 points on a square. We then apply the change of variables formula (9) to compute the theoretical limit µ_{f,1}. The distance OT2(µ_{n,1}, µ_{f,1}) is then computed by approximating the measures on a grid: the distance converges to 0 as predicted by Theorem 4.1."
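The setup above draws i.i.d. points from a distribution on a torus. A minimal sketch of the simplest ingredient, sampling uniformly with respect to the surface area of a round torus with radii R > r, is given below; it is my own illustration, not the authors' code, and omits the random diffeomorphism Ψ used to make the torus generic. The area element is proportional to R + r cos θ in the poloidal angle θ, which the sketch handles by rejection sampling.

```python
import numpy as np

def sample_torus(n, R=2.0, r=1.0, rng=None):
    """Draw n i.i.d. points, uniform w.r.t. surface area, on the torus
    (θ, φ) ↦ ((R + r cos θ) cos φ, (R + r cos θ) sin φ, r sin θ).
    The area density in θ is ∝ (R + r cos θ), so θ is drawn by rejection
    sampling with acceptance probability (R + r cos θ) / (R + r)."""
    rng = np.random.default_rng(rng)
    thetas = []
    while len(thetas) < n:
        t = rng.uniform(0.0, 2.0 * np.pi, size=2 * n)
        u = rng.uniform(0.0, 1.0, size=2 * n)
        thetas.extend(t[u < (R + r * np.cos(t)) / (R + r)].tolist())
    theta = np.array(thetas[:n])
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)  # toroidal angle is uniform
    x = (R + r * np.cos(theta)) * np.cos(phi)
    y = (R + r * np.cos(theta)) * np.sin(phi)
    z = r * np.sin(theta)
    return np.column_stack([x, y, z])
```

From such a point cloud, persistence diagrams can then be computed with GUDHI; for points in general position an alpha complex yields diagrams matching the Čech filtration, keeping in mind that GUDHI's `AlphaComplex` reports squared radii as filtration values.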