On Generalization Bounds for Projective Clustering

Authors: Maria Sofia Bucarelli, Matilde Larsen, Chris Schwiegelshohn, Mads Toftrup

NeurIPS 2023

Reproducibility assessment. Each entry below gives the reproducibility variable, the result, and the LLM's response:
Research Type: Experimental. We run the experiments both for center-based clustering and for subspace clustering. While the focus of the paper is arguably more on subspace clustering, the experiments are important in both cases. Although both problems are hard to optimize exactly, center-based clustering is significantly more tractable and may therefore give better insight into practical learning rates.
Researcher Affiliation: Academia. (1) Department of Computer, Control and Management Engineering Antonio Ruberti, Sapienza University of Rome, Italy; (2) Department of Computer Science, Aarhus University, Denmark.
Pseudocode: No. The paper describes algorithms and mathematical formulations but does not include structured pseudocode blocks or algorithm listings.
Open Source Code: No. The paper mentions that the code was written in Python with PyTorch, but it neither links to a source code repository nor states that the code is publicly available.
Open Datasets: Yes. We use four publicly available real-world datasets: Mushroom [70], Skin-Nonskin [12], MNIST [51], and Covtype [15].
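As a hedged illustration of how these datasets might be obtained (the paper does not describe its loading code, and the OpenML identifiers below are assumptions):

```python
# Possible way to fetch the four public datasets via scikit-learn / OpenML.
# Dataset identifiers are assumptions, not taken from the paper.
from sklearn.datasets import fetch_covtype, fetch_openml

covtype = fetch_covtype().data                            # Covtype
mnist = fetch_openml("mnist_784", as_frame=False).data    # MNIST
mushroom = fetch_openml("mushroom", as_frame=False).data  # Mushroom (categorical;
                                                          # needs one-hot encoding)
# Skin-Nonskin is distributed via the UCI / LIBSVM repositories and is
# typically downloaded separately.
```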
Dataset Splits: No. The paper describes how samples were drawn for evaluating the excess risk (e.g., "sampling uniformly at random"), but it does not specify fixed training, validation, and test splits with percentages or counts, as conventional machine learning reproducibility would require.
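To make the sampling-based protocol concrete, here is a minimal sketch of estimating excess risk from a uniform sample. It uses k-means (z = 2) as a stand-in for the general (k, z)-objective; the function name and parameters are illustrative, not from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans

def estimate_excess_risk(X, sample_size, k, seed=0):
    """Cost (on all of X) of centers fit on a uniform sample, minus the
    cost of centers fit on X itself."""
    rng = np.random.default_rng(seed)
    S = X[rng.choice(len(X), size=sample_size, replace=False)]
    # sklearn's score() is the negative clustering cost (negative inertia).
    cost_sample_centers = -KMeans(k, n_init=10, random_state=seed).fit(S).score(X)
    cost_full_centers = -KMeans(k, n_init=10, random_state=seed).fit(X).score(X)
    return cost_sample_centers - cost_full_centers
```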
Hardware Specification: Yes. All experiments were conducted on a machine equipped with a single NVIDIA RTX 2080 GPU.
Software Dependencies: No. The paper states: "We wrote all of the code using Python 3 and utilized the Pytorch library for implementations using gradient descent." It names Python 3 and PyTorch but gives no version numbers for these or for any other libraries or solvers used.
Experiment Setup: Yes. For the cases z ∈ {1, 3, 4}, the new center is obtained via gradient descent. The initial centers are chosen via D^z sampling, i.e., sampling centers with probability proportional to the z-th power of the distance between a point and its closest center (for z = 2 this is the k-means++ algorithm of [6]). Specifically, the AdamW optimizer was employed to find the new center, with the learning rate set to 0.01.
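A minimal PyTorch sketch of this setup, assuming the standard (k, z)-clustering cost sum_x min_c ||x - c||^z; the data, k, z, and step count below are placeholders rather than the authors' actual configuration:

```python
import torch

def dz_sampling(X, k, z):
    """D^z seeding: each new center is drawn with probability proportional
    to the z-th power of a point's distance to its closest chosen center
    (z = 2 recovers k-means++ seeding)."""
    n = X.shape[0]
    centers = [X[torch.randint(n, (1,))].squeeze(0)]
    for _ in range(k - 1):
        dists = torch.cdist(X, torch.stack(centers)).min(dim=1).values
        probs = dists.pow(z) / dists.pow(z).sum()
        centers.append(X[torch.multinomial(probs, 1)].squeeze(0))
    return torch.stack(centers)

def refine_centers(X, centers, z, lr=0.01, steps=200):
    """Refine the centers by minimizing the (k, z)-clustering cost with
    AdamW at the learning rate reported in the paper (0.01)."""
    C = centers.clone().requires_grad_(True)
    opt = torch.optim.AdamW([C], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        cost = torch.cdist(X, C).min(dim=1).values.pow(z).sum()
        cost.backward()
        opt.step()
    return C.detach()

X = torch.randn(1000, 10)                # placeholder data
C0 = dz_sampling(X, k=5, z=3)            # D^z seeding
C = refine_centers(X, C0, z=3)           # gradient-descent refinement
```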