Estimation and Quantization of Expected Persistence Diagrams

Authors: Vincent Divol, Theo Lacombe

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We now provide some numerical illustrations that showcase our different theoretical results and their use in practice. Throughout, PDs are computed using the Gudhi library (Maria et al., 2014) and OT_p distances are computed building on tools available from the POT library (Flamary et al., 2021). See the supplementary material for further implementation details and complementary experiments. Convergence rates for the empirical EPD. We first showcase the rate of convergence of Theorem 1."
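The empirical EPD studied in Theorem 1 is the average of the observed diagrams, $\mu_n = \frac{1}{n}\sum_i D_i$, viewed as a measure. As a toy NumPy illustration (our own sketch, not the paper's code; all names here are ours), it can be represented as a weighted point cloud where every diagram point carries mass 1/n:

```python
import numpy as np

def empirical_epd(diagrams):
    """Represent the empirical EPD mu_n = (1/n) * sum_i D_i as a
    weighted point cloud: each (birth, death) point of each observed
    diagram gets mass 1/n."""
    n = len(diagrams)
    points = np.vstack(diagrams)             # stack all (birth, death) pairs
    weights = np.full(len(points), 1.0 / n)  # mass 1/n per observed point
    return points, weights

# Two toy persistence diagrams, rows are (birth, death) pairs:
D1 = np.array([[0.0, 1.0], [0.2, 0.5]])
D2 = np.array([[0.1, 0.9]])
pts, w = empirical_epd([D1, D2])
# Total mass of mu_n is the average number of points per diagram.
```

Note that the total mass of $\mu_n$ need not be 1: EPDs are finite measures whose mass reflects the expected number of points per diagram.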
Researcher Affiliation | Academia | "Université Paris-Saclay, CNRS, Inria, Laboratoire de Mathématiques d'Orsay, 91405, Orsay, France."
Pseudocode | Yes | "Algorithm 1: Online quantization of EPDs"
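The paper's Algorithm 1 is only named here. As a rough, hypothetical sketch of online quantization in this spirit (a MacQueen-style streaming k-means over diagram points; this is our simplification and omits the paper's handling of the diagonal and of weights), initialized as described in the experiment setup by the k points of highest persistence in the first diagram:

```python
import numpy as np

def online_quantize(diagrams, k, lr=0.1, seed=0):
    """Toy online quantization of an EPD: stream the points of each
    observed diagram and nudge the nearest of k codepoints toward each
    point. A simplified stand-in for the paper's Algorithm 1."""
    rng = np.random.default_rng(seed)
    # Initialize with the k points of highest persistence (death - birth)
    # in the first diagram, as in the paper's experiment setup.
    first = np.asarray(diagrams[0], dtype=float)
    pers = first[:, 1] - first[:, 0]
    c = first[np.argsort(pers)[::-1][:k]].copy()
    for D in diagrams[1:]:
        for x in rng.permutation(np.asarray(D, dtype=float)):
            j = np.argmin(np.linalg.norm(c - x, axis=1))  # nearest codepoint
            c[j] += lr * (x - c[j])                       # online update step
    return c
```

The learning-rate schedule, batch handling, and diagonal codepoint of the actual algorithm are not reproduced here.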
Open Source Code | Yes | "For the sake of conciseness, proofs have been deferred to the supplementary material along with code to reproduce our experiments. Our code will be made publicly available."
Open Datasets | Yes | "We perform another experiment on the ORBIT5K dataset (Adams et al., 2017, Section 6.4.1), a benchmark dataset in TDA made of 5 classes with 1000 observations each (split into 70%/30% training/test) representing different dynamical systems, turned into PDs through Čech filtrations."
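For context, ORBIT5K observations are orbits of a linked twist map whose parameter r determines the class (to our understanding of Adams et al., 2017; the function below is our own sketch, not the dataset's reference code):

```python
import numpy as np

def orbit(r, n_points=1000, seed=0):
    """Generate one ORBIT5K-style orbit of the linked twist map.
    The parameter r controls the dynamics; the five classes of the
    dataset correspond to r in {2.5, 3.5, 4.0, 4.1, 4.3}."""
    rng = np.random.default_rng(seed)
    x, y = rng.random(), rng.random()       # random initial condition in [0, 1)^2
    pts = np.empty((n_points, 2))
    for i in range(n_points):
        x = (x + r * y * (1.0 - y)) % 1.0   # twist in x, modulo 1
        y = (y + r * x * (1.0 - x)) % 1.0   # twist in y, using updated x
        pts[i] = x, y
    return pts
```

Each such point cloud is then turned into a PD via a Čech (or alpha) filtration in the paper's pipeline.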
Dataset Splits | No | The paper mentions a 70%/30% training/test split for the ORBIT5K dataset but does not explicitly describe a validation split.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU or GPU models, or cloud instance types) used to run the experiments.
Software Dependencies | No | The paper mentions the Gudhi library (Maria et al., 2014) and the POT library (Flamary et al., 2021) but does not specify exact version numbers.
Experiment Setup | Yes | "All algorithms are initialized in the same way: we select the k points of highest persistence in the first diagram µ1. To compare the quality of these codebooks, we evaluate their distortion (4.2) with p = 2 and p = ∞. As we do not have access to the true EPD E(P), we approximate this quantity through its empirical counterpart $R_{k,p}(c) := \left( \int_\Omega \min_{1 \le j \le k+1} \|x - c_j\|^p \, \mathrm{d}\mu_n(x) \right)^{1/p}$, with $R_{k,\infty}(c) = \max_{x \in \operatorname{spt}(\mu_n)} \min_j \|x - c_j\|$. Results are given in Figure 4. Interestingly, when p = 2 our approach is on a par with the weighted codebook approach, but becomes substantially better when evaluated with p = ∞, that is, using the bottleneck distance, which is the most natural metric to handle PDs. ... using batches of size 10 for OT_2, OT_∞ and W_2. ... set q = 1, while λ and θ are the 0.05 and 0.95 quantiles of the distribution of $\{ \|x - \partial\Omega\|^q : x \in \operatorname{spt}(\mu_n) \}$."
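The empirical distortion quoted above is straightforward to compute. A minimal sketch, assuming the diagram points carry uniform weight (the function name and signature are ours):

```python
import numpy as np

def distortion(points, codebook, p=2.0):
    """Empirical distortion R_{k,p}(c) of a codebook against the support
    of mu_n: (mean over points of min_j ||x - c_j||^p)^(1/p).
    p = inf gives max_x min_j ||x - c_j||, the bottleneck-style criterion."""
    # Distance from every diagram point to its nearest codepoint.
    d = np.linalg.norm(points[:, None, :] - codebook[None, :, :], axis=2).min(axis=1)
    if np.isinf(p):
        return d.max()
    return np.mean(d ** p) ** (1.0 / p)
```

Evaluating the same codebook at both p = 2 and p = ∞, as in the quoted setup, makes the difference between the two criteria visible: p = ∞ penalizes the single worst-covered point of the support.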