Estimation and Quantization of Expected Persistence Diagrams
Authors: Vincent Divol, Theo Lacombe
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now provide some numerical illustrations that showcase our different theoretical results and their use in practice. Throughout, PDs are computed using the Gudhi library (Maria et al., 2014) and $\mathrm{OT}_p$ distances are computed building on tools available from the POT library (Flamary et al., 2021). See the supplementary material for further implementation details and complementary experiments. Convergence rates for the empirical EPD. We first showcase the rate of convergence of Theorem 1. |
| Researcher Affiliation | Academia | Université Paris-Saclay, CNRS, Inria, Laboratoire de Mathématiques d'Orsay, 91405, Orsay, France. |
| Pseudocode | Yes | Algorithm 1 Online quantization of EPDs |
| Open Source Code | Yes | For the sake of conciseness, proofs have been deferred to the supplementary material along with code to reproduce our experiments. Our code will be made publicly available. |
| Open Datasets | Yes | We perform another experiment on the ORBIT5K dataset (Adams et al., 2017, §6.4.1), a benchmark dataset in TDA made of 5 classes with 1000 observations each (split into 70%/30% training/test) representing different dynamical systems, turned into PDs through Čech filtrations. |
| Dataset Splits | No | The paper mentions a '70%/30% training/test' split for the ORBIT5K dataset but does not explicitly describe a validation set split. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Gudhi library (Maria et al., 2014)' and 'POT library (Flamary et al., 2021)' but does not specify their exact version numbers. |
| Experiment Setup | Yes | All algorithms are initialized in the same way: we select the k points of highest persistence in the first diagram $\mu_1$. To compare the quality of these codebooks, we evaluate their distortion (4.2) with $p = 2$ and $p = \infty$. As we do not have access to the true EPD $E(P)$, we approximate this quantity through its empirical counterpart $R_{k,p}(c) := \left( \int_\Omega \min_{1 \le j \le k+1} \lVert x - c_j \rVert^p \, \mathrm{d}\mu_n(x) \right)^{1/p}$, with $R_{k,\infty}(c) = \max_{x \in \mathrm{spt}(\mu_n)} \min_j \lVert x - c_j \rVert$. Results are given in Figure 4. Interestingly, when $p = 2$ our approach is on a par with the weighted codebook approach, but becomes substantially better when evaluated with $p = \infty$, that is using the bottleneck distance which is the most natural metric to handle PDs. ... using batches of size 10 for $\mathrm{OT}_2$, $\mathrm{OT}_\infty$ and $W_2$. ... set $q = 1$, while $\lambda$ and $\theta$ are the 0.05 and 0.95 quantiles of the distribution of $\{ \lVert x - \partial\Omega \rVert^q,\ x \in \mathrm{spt}(\mu_n) \}$. |
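
The initialization and the empirical distortion quoted in the Experiment Setup row can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code. The function names (`init_codebook`, `empirical_distortion`, `dist_to_diagonal`) are hypothetical, diagrams are assumed to be arrays of (birth, death) pairs, the Euclidean norm is assumed for all distances, the diagonal is assumed to play the role of the extra centroid $c_{k+1}$, and $\mu_n$ is assumed to be the average of the $n$ diagrams viewed as measures.

```python
import numpy as np

def dist_to_diagonal(points):
    """Euclidean distance from each (birth, death) point to the diagonal {birth = death}."""
    return np.abs(points[:, 1] - points[:, 0]) / np.sqrt(2.0)

def init_codebook(diagram, k):
    """Select the k points of highest persistence (death - birth) in one diagram."""
    persistence = diagram[:, 1] - diagram[:, 0]
    top = np.argsort(persistence)[::-1][:k]
    return diagram[top]

def empirical_distortion(diagrams, codebook, p=2.0):
    """Empirical distortion of a codebook against the average of the diagrams.

    Assumption: each point is sent to its nearest centroid or to the
    diagonal, whichever is closer, and every point carries mass 1/n.
    """
    n = len(diagrams)
    points = np.vstack(diagrams)  # support of the empirical EPD
    # Pairwise distances between points and centroids, shape (m, k).
    d_centroids = np.linalg.norm(points[:, None, :] - codebook[None, :, :], axis=2)
    # Append the distance to the diagonal as a (k+1)-th column.
    d_all = np.column_stack([d_centroids, dist_to_diagonal(points)])
    nearest = d_all.min(axis=1)
    if np.isinf(p):
        return float(nearest.max())
    return float((np.sum(nearest ** p) / n) ** (1.0 / p))

# Toy example with two small diagrams (made-up values, for illustration only).
diagrams = [
    np.array([[0.0, 2.0], [0.0, 0.2]]),
    np.array([[0.0, 2.2], [1.0, 1.1]]),
]
codebook = init_codebook(diagrams[0], k=1)
r2 = empirical_distortion(diagrams, codebook, p=2.0)
rinf = empirical_distortion(diagrams, codebook, p=np.inf)
```

Treating the diagonal as an extra column of the distance matrix keeps the $p = 2$ and $p = \infty$ cases on the same code path: only the final aggregation (weighted sum versus maximum) differs.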