Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Cover learning for large-scale topology representation

Authors: Luis Scoccola, Uzu Lim, Heather A. Harrington

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide an implementation of ShapeDiscover (Scoccola & Lim, 2025), a cover learning algorithm based on our theory, and showcase it on two sets of experiments: a quantitative one on topological inference, and a qualitative one on large-scale topology visualization. In the first case, ShapeDiscover learns topologically correct simplicial complexes, on synthetic and real data, of smaller size than those obtained with previous topological inference approaches. In the second, we argue that ShapeDiscover represents the large-scale topology of real data better, and with more intuitive parameters, than previous TDA algorithms that fit the cover learning framework.
Researcher Affiliation | Academia | ¹Centre de Recherches Mathématiques et Institut des sciences mathématiques, Laboratoire de combinatoire et d'informatique mathématique de l'Université du Québec à Montréal, Université de Sherbrooke, Canada. ²Queen Mary University of London, United Kingdom. ³Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany; Centre for Systems Biology, Dresden, Germany; Faculty of Mathematics, Technische Universität Dresden, Germany; Mathematical Institute, University of Oxford, United Kingdom. Correspondence to: Luis Scoccola <EMAIL>.
Pseudocode | Yes |

Algorithm 1 (1D Mapper cover learning algorithm)
  Input: data X, function f : X → R, clustering algorithm C_θ, parameter(s) θ for C_θ, cover {I_i}_{i=1}^k of R
  Take the pullback cover {f⁻¹(I_i)}_{i=1}^k of X
  Let U_i := C_θ(f⁻¹(I_i)) for 1 ≤ i ≤ k
  Return the union ⋃_{i=1}^k U_i

Algorithm 2 (Ball Mapper cover learning algorithm)
  Input: data X, ε > 0
  Build an ε-net {y_i}_{i=1}^k of X
  Return the cover {B(y_i, ε)}_{i=1}^k

Algorithm 3 (ShapeDiscover fuzzy cover learning algorithm)
  Input: point cloud X ⊆ R^N
  Parameters: n_cov ∈ N, n_neigh ∈ N, reg > 0
  Optimization parameters: lr, n_epoch, p ∈ [1, ∞)
  G := NeighborhoodGraph(X, n_neigh)
  g := FuzzyCoverInitialization(G, n_cov)
  h := ParametricPartitionOfUnity()
  θ₀ := InitializeParametricModel(h, g)
  L(θ) := (M̂ + reg·Ĝ + T̂ + reg·R̂)(π_p ∘ h_θ)
  θ* := GradientDescent(L, n_epoch, lr, init = θ₀)
  Return π ∘ h_{θ*}
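As a rough illustration of Algorithm 2 above, the following NumPy sketch builds a greedy ε-net and the resulting cover by ε-balls. This is not the authors' implementation; it assumes Euclidean distance and a simple first-uncovered-point selection rule, and represents each cover element as an index set.

```python
import numpy as np

def ball_mapper_cover(X, eps):
    """Greedy sketch of Ball Mapper: build an eps-net of X, then
    return the cover whose elements are the eps-balls around net points
    (each element given as an array of point indices)."""
    n = len(X)
    uncovered = np.ones(n, dtype=bool)
    centers = []
    while uncovered.any():
        # take the first still-uncovered point as the next net element
        i = int(np.flatnonzero(uncovered)[0])
        centers.append(i)
        dists = np.linalg.norm(X - X[i], axis=1)
        uncovered &= dists > eps  # mark everything within eps as covered
    # cover element U_j = indices of points within eps of center y_j
    cover = [np.flatnonzero(np.linalg.norm(X - X[c], axis=1) <= eps)
             for c in centers]
    return centers, cover

# toy usage: 100 points on the unit circle
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)]
centers, cover = ball_mapper_cover(X, eps=0.5)
```

By construction every point is within ε of some net point (so the balls cover X), and any two net points are more than ε apart, which is exactly the ε-net property the algorithm requires.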
Open Source Code | Yes | Our implementation of ShapeDiscover (Scoccola & Lim, 2025) is in PyTorch (Paszke et al., 2019), and we rely on NumPy (Harris et al., 2020), SciPy (Virtanen et al., 2020), Numba (Lam et al., 2015), scikit-learn (Pedregosa et al., 2011), and GUDHI (The GUDHI Project, 2015). (...) Scoccola, L. and Lim, U. ShapeDiscover: Learning covers with geometric optimization. https://github.com/luisscoccola/shapediscover, 2025.
Open Datasets | Yes | The dataset is from (Lederman & Talmon, 2018)... The dataset is from (Gardner et al., 2022)... This is the dataset of (Alpaydin & Kaynak, 1998)... This is the classical dataset of (Deng, 2012)... This is the dataset of (Packer et al., 2019)...
Dataset Splits | No | We use the training data, which consists of 60,000 handwritten digits encoded as vectors in 784 dimensions.
Hardware Specification | Yes | All experiments were run on a MacBook Pro with an Apple M1 Pro processor and 8 GB of RAM.
Software Dependencies | No | Our implementation of ShapeDiscover (Scoccola & Lim, 2025) is in PyTorch (Paszke et al., 2019), and we rely on NumPy (Harris et al., 2020), SciPy (Virtanen et al., 2020), Numba (Lam et al., 2015), scikit-learn (Pedregosa et al., 2011), and GUDHI (The GUDHI Project, 2015).
Experiment Setup | Yes | In all experiments we fix the following default parameters: number of neighbors for the neighborhood graph n_neigh = 15, regularization parameter reg = 10, number of iterations for gradient descent n_epoch = 500, learning rate for gradient descent lr = 0.1, and approximation parameter for fuzzy cover p = 5 (Section 4.4).
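As a hedged illustration only, the sketch below exercises the reported optimization defaults (n_epoch = 500 gradient steps at learning rate lr = 0.1) with plain gradient descent on a stand-in quadratic loss. The actual ShapeDiscover objective, with its four terms M̂, Ĝ, T̂, R̂ and the fuzzy-cover parameter p, is not reproduced here.

```python
import numpy as np

# Default parameters reported in the paper's experiment setup.
n_neigh, reg, n_epoch, lr, p = 15, 10, 500, 0.1, 5

# Stand-in quadratic loss with minimizer at theta = 1; the real
# objective combines four cover-quality and regularization terms.
def loss(theta):
    return float(np.sum((theta - 1.0) ** 2))

def grad(theta):
    return 2.0 * (theta - 1.0)

theta = np.zeros(4)
for _ in range(n_epoch):            # n_epoch = 500 steps
    theta = theta - lr * grad(theta)  # fixed learning rate lr = 0.1
```

With these settings the iterate contracts toward the minimizer by a factor of 0.8 per step, so 500 steps converge far past numerical precision; the point is only to make the roles of n_epoch and lr concrete.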