Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Geometric Analysis of Nonlinear Manifold Clustering

Authors: Nimita Shinde, Tianjiao Ding, Daniel Robinson, René Vidal

NeurIPS 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In addition to providing proof of correctness in this setting, a numerical comparison with state-of-the-art methods on CIFAR datasets shows that our method performs competitively although marginally worse than methods without theoretical guarantees.
Researcher Affiliation	Academia	Nimita Shinde Lehigh University EMAIL Tianjiao Ding University of Pennsylvania EMAIL Daniel P. Robinson Lehigh University EMAIL René Vidal University of Pennsylvania EMAIL
Pseudocode	Yes	Algorithm 1 Pseudocode for clustering data using (WMC)
Open Source Code	Yes	We have provided our code in the Supplementary material.
Open Datasets	Yes	The CIFAR dataset consists of 60000 color images of size 32 × 32 that are divided into 10, 20, and 100 classes for CIFAR-10, CIFAR-20, CIFAR-100, respectively.
Dataset Splits	No	The paper mentions using a 'grid search' for hyperparameter tuning, which implies some form of validation, but it does not specify the explicit training, validation, and test data splits (e.g., percentages or sample counts) for the datasets used in the experiments. It only describes the dataset as divided into classes.
Hardware Specification	Yes	The experiments are performed on a machine with Intel(R) Xeon(R) Gold 6130 CPU operating at 2.10 GHz frequency and with 37 GB RAM.
Software Dependencies	No	The paper mentions 'We implemented the ADMM algorithm that solves SMCE [58] in Python.' but does not specify the version of Python or any other software libraries with their version numbers.
Experiment Setup	Yes	We use grid search over the following parameter values: η ∈ {1, 20, 100, 400} and λ ∈ {20, 50} λ0, where λ0 is the smallest value of λ that generates a non-trivial (non-zero) solution. We report the best accuracy results in Table 1. Furthermore, Table 2 provides the values of the parameters λ and η corresponding to the clustering results reported in rows 1 (L-WMC) and 2 (E-WMC) in Table 1.
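
The grid described in the Experiment Setup row can be enumerated in a few lines. The following is a minimal sketch, not the authors' code: it assumes that "{20, 50} λ0" means the two multipliers 20 and 50 applied to λ0, and the names build_grid, grid_search, and toy_score, as well as the value lambda0=0.5, are hypothetical placeholders. In the paper's setting the score would be the clustering accuracy and λ0 would be computed from the data.

```python
from itertools import product

ETAS = [1, 20, 100, 400]        # eta values quoted in the Experiment Setup row
LAMBDA_MULTIPLIERS = [20, 50]   # assumed to scale lambda0 (interpretation of "{20, 50} lambda0")


def build_grid(lambda0):
    """Return every (eta, lam) pair, with lam = multiplier * lambda0."""
    return [(eta, m * lambda0) for eta, m in product(ETAS, LAMBDA_MULTIPLIERS)]


def grid_search(score_fn, lambda0):
    """Return (best_score, eta, lam) for the highest-scoring pair in the grid."""
    return max((score_fn(eta, lam), eta, lam) for eta, lam in build_grid(lambda0))


def toy_score(eta, lam):
    # Hypothetical stand-in for "run the clustering method and return accuracy".
    return -abs(eta - 100) - abs(lam - 10.0)


# lambda0=0.5 is an illustrative value only, not one reported in the paper.
grid = build_grid(lambda0=0.5)
best = grid_search(toy_score, lambda0=0.5)
print(len(grid), best)
```

With four η values and two λ multipliers, the search evaluates eight configurations, and the best one is what would be reported.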