Differentiable Mapper for Topological Optimization of Data Representation
Authors: Ziyad Oulhaj, Mathieu Carrière, Bertrand Michel
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement and showcase the efficiency of Mapper filter optimization through Soft Mapper on various data sets... We present applications on 3D shape data in Section 7.1 and on single-cell RNA sequencing data in Section 7.2. |
| Researcher Affiliation | Academia | 1Nantes Université, École Centrale Nantes, Laboratoire de Mathématiques Jean Leray, CNRS UMR 6629, Nantes, France. 2DataShape, Centre Inria d'Université Côte d'Azur, Sophia Antipolis, France. |
| Pseudocode | Yes | Algorithm 1 Soft Mapper Optimization Algorithm. Require: Initial parameter set θ0, number of Monte Carlo random samples M, learning rate sequence (αi)i, random noise sequence (ξi)i, number of epochs N. for 0 ≤ i < N do: for 1 ≤ m ≤ M do: e ∼ sample from Pθi; yi,m ← an element of the sub-differential in θi of Le : θ ↦ L(e, fθ); end for; yi ← (1/M) Σ_{m=1}^{M} yi,m; θi+1 ← θi − αi(yi + ξi); end for; return θN |
| Open Source Code | Yes | We implement and showcase the efficiency of Mapper filter optimization through Soft Mapper on various data sets, with public, open-source code in TensorFlow. Our code is available at (Oulhaj, 2024). Oulhaj, Z. Mapper filter optimization. https://github.com/ZiyadOulhaj/Mapper-Optimization, 2024. |
| Open Datasets | Yes | We now apply Mapper optimization on the human preimplantation dataset of (Petropoulos et al., 2016), which can also be found in the tutorial of the scTDA Python library. The dataset can be accessed in the following link (sct), and it contains the expression levels for p = 26,270 genes for each individual cell. scTDA. https://github.com/CamaraLab/scTDA. Accessed: 2024-01-23. |
| Dataset Splits | No | The paper mentions using a 'randomly sampled subset' for a heuristic and discusses 'training dataset' implicitly, but does not provide specific percentages, counts, or methodology for training/validation/test splits for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'TensorFlow', the 'scTDA Python library', and the 'Seurat package in R' but does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | The parametric family of functions is linear, i.e., equal to {fθ : x ↦ ⟨x, θ⟩, θ ∈ ℝ³}, and the cover assignment scheme Aδ is the smooth relaxation of the standard case, with δ = 10⁻² · (max_{x∈Xn} fθ(x) − min_{x∈Xn} fθ(x))... The values of r (also called resolution), g (also called gain) and the number of clusters in the KMeans algorithm, for each 3-dimensional shape, are summarized in Appendix G. To find θ, we use the opposite of the L1 total (regular) persistence as a persistence-specific loss ℓ, and we run Algorithm 1 with N = 200 and M = 10, each time taking the diagonal as the initial direction, i.e. θ0 = (1/3, 1/3, 1/3)ᵀ. |
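The optimization loop quoted under "Pseudocode" (Algorithm 1) and the setup row above can be sketched as a plain stochastic sub-gradient descent in NumPy. This is a minimal illustration, not the paper's TensorFlow implementation: the helpers `sample_e` (drawing a cover assignment e ∼ Pθ) and `subgrad` (an element of the sub-differential of L(e, fθ) in θ) are hypothetical placeholders the caller must supply.

```python
import numpy as np

def soft_mapper_optimize(sample_e, subgrad, theta0, M=10, N=200,
                         lr=lambda i: 1e-2, noise_scale=0.0, seed=0):
    """Sketch of Algorithm 1 (Soft Mapper Optimization).

    sample_e(theta, rng): hypothetical sampler for e ~ P_theta.
    subgrad(e, theta):    hypothetical sub-gradient of theta -> L(e, f_theta).
    lr(i):                learning-rate sequence (alpha_i)_i.
    noise_scale:          magnitude of the random-noise sequence (xi_i)_i.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for i in range(N):
        # Monte Carlo average of M sub-gradient samples: y_i = (1/M) sum_m y_{i,m}
        grads = [subgrad(sample_e(theta, rng), theta) for _ in range(M)]
        y = np.mean(grads, axis=0)
        # Perturbed descent step: theta_{i+1} = theta_i - alpha_i (y_i + xi_i)
        xi = noise_scale * rng.standard_normal(theta.shape)
        theta = theta - lr(i) * (y + xi)
    return theta

# Toy usage with the paper's diagonal initialization theta0 = (1/3, 1/3, 1/3)^T
# and a stand-in quadratic loss in place of the persistence-based loss:
target = np.array([1.0, 0.0, 0.0])
theta = soft_mapper_optimize(
    sample_e=lambda th, rng: None,                 # sampling is a no-op here
    subgrad=lambda e, th: 2.0 * (th - target),     # grad of ||theta - target||^2
    theta0=[1 / 3, 1 / 3, 1 / 3], M=10, N=200, lr=lambda i: 0.1)
```

With the quadratic stand-in loss the iterates contract toward `target` geometrically, so the loop converges well within the paper's N = 200 epochs.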