Subsampling Methods for Persistent Homology
Authors: Frederic Chazal, Brittany Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, Larry Wasserman
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 6, we apply our methods to two examples. Since computing the persistent homology of the Vietoris-Rips (VR) filtrations built on top of large samples is infeasible, we resort to the subsampling strategy described in Section 3. More formally, let $X_N = \{x_1, \dots, x_N\}$ be a large point cloud. We draw $n$ subsamples, each of size $m \ll N$ points, from $\mu$, the discrete uniform measure on $X_N$. First, we use a toy example to compare the time complexity of computing the persistent homology of the entire point cloud with the complexity of the subsampling approach. |
| Researcher Affiliation | Academia | Frederic Chazal (FREDERIC.CHAZAL@INRIA.FR), INRIA Saclay, Palaiseau, 91120, France; Brittany Terese Fasy (BRITTANY@FASY.US), Computer Science Department, Tulane University, New Orleans, LA 70118; Fabrizio Lecci (LECCI@CMU.EDU), Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213; Bertrand Michel (BERTRAND.MICHEL@UPMC.FR), LSTA, Université Pierre et Marie Curie (UPMC), Paris, 75005, France; Alessandro Rinaldo (ARINALDO@CMU.EDU), Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213; Larry Wasserman (LARRY@CMU.EDU), Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213 |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Software. The computations in this paper were done using the R package TDA (Fasy et al., 2014a). The package includes a series of tools for the statistical analysis of persistent homology, including the methods described in Fasy et al. (2014b), Chazal et al. (2014b), Chazal et al. (2014a), and this paper. |
| Open Datasets | Yes | We use the publicly available database of triangulated shapes (Sumner & Popović, 2004). The dataset is publicly available at the UCI Machine Learning Repository¹ and is described in Barshan & Yüksek (2013), where it is used to classify 19 activities performed by eight people wearing sensor units on the chest, arms, and legs. For ease of illustration, we report here the results on four activities (walking, stepper, cross trainer, jumping) performed by a single person (#1). (Footnote 1: http://archive.ics.uci.edu/ml/datasets/Daily+and+Sports+Activities) |
| Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits. It only mentions general data usage in experiments. |
| Hardware Specification | Yes | [...] required 28.34 seconds on a MacBook Pro with 2.8 GHz processor and 16 GB RAM. |
| Software Dependencies | No | The computations in this paper were done using the R package TDA (Fasy et al., 2014a). However, specific version numbers for the R package TDA or R itself are not provided. |
| Experiment Setup | Yes | More formally, let $X_N = \{x_1, \dots, x_N\}$ be a large point cloud. We draw $n$ subsamples, each of size $m \ll N$ points, from $\mu$, the discrete uniform measure on $X_N$. The average landscape on the right plot is computed using $n = 10$ subsamples of size $m = 100$. For $n = 100$ times we subsample $m = 300$ points from each shape. For $n = 80$ times, we subsample $m = 200$ points from the point cloud of each activity. |
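
The subsampling setup quoted in the table (draw n subsamples of size m from the uniform measure on the point cloud, then average the resulting landscapes) can be illustrated with a minimal sketch. This is not the authors' code and does not use the R package TDA: it is a standard-library Python illustration restricted to 0-dimensional homology, where the death times of a Vietoris-Rips filtration coincide with the minimum spanning tree edge lengths (assuming the filtration value of an edge is its length). The function names `h0_deaths`, `landscape`, and `average_landscape`, the circle data, and the grid are all hypothetical choices made for this example; only n = 10 and m = 100 are taken from the table.

```python
import math
import random


def h0_deaths(points):
    """Finite H0 death times of a Rips filtration, taken as MST edge lengths (Prim)."""
    n = len(points)
    if n == 0:
        return []
    best = [math.inf] * n      # cheapest known distance from each point to the tree
    in_tree = [False] * n
    best[0] = 0.0
    deaths = []
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        i = min((j for j in range(n) if not in_tree[j]), key=best.__getitem__)
        in_tree[i] = True
        if best[i] > 0.0:      # skips the root and duplicated points (zero persistence)
            deaths.append(best[i])
        for j in range(n):
            if not in_tree[j]:
                best[j] = min(best[j], math.dist(points[i], points[j]))
    return deaths


def landscape(deaths, grid, k=1):
    """k-th landscape function on `grid` for diagram points (0, d), d in `deaths`."""
    values = []
    for t in grid:
        # Each point (0, d) contributes the tent function max(0, min(t, d - t)).
        tents = sorted((max(0.0, min(t, d - t)) for d in deaths), reverse=True)
        values.append(tents[k - 1] if len(tents) >= k else 0.0)
    return values


def average_landscape(cloud, n, m, grid, seed=0):
    """Average the first landscape over n subsamples of size m."""
    rng = random.Random(seed)
    total = [0.0] * len(grid)
    for _ in range(n):
        # i.i.d. draws from the discrete uniform measure, i.e. with replacement.
        sample = rng.choices(cloud, k=m)
        lam = landscape(h0_deaths(sample), grid)
        total = [a + b for a, b in zip(total, lam)]
    return [a / n for a in total]


# Toy point cloud: N points on the unit circle (illustrative data only).
N = 2000
cloud = [(math.cos(2 * math.pi * i / N), math.sin(2 * math.pi * i / N)) for i in range(N)]
grid = [0.01 * i for i in range(51)]   # scale values t in [0, 0.5]
avg = average_landscape(cloud, n=10, m=100, grid=grid)
```

Each subsample costs O(m²) distance evaluations here, independent of N, which is the point of the strategy: the cost is governed by n and m rather than by the size of the full cloud. Higher-dimensional homology, as used for the shape and activity experiments, would need a dedicated persistent homology library in place of `h0_deaths`.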