Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Data-Driven Clustering via Parameterized Lloyd's Families
Authors: Maria-Florina F. Balcan, Travis Dick, Colin White
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we show positive theoretical and empirical results for learning the best initialization and local search procedures over a large family of algorithms. and In this section, we empirically evaluate the effect of the α parameter on clustering cost for real-world and synthetic clustering domains. |
| Researcher Affiliation | Academia | Maria-Florina Balcan Department of Computer Science Carnegie-Mellon University Pittsburgh, PA 15213 EMAIL Travis Dick Department of Computer Science Carnegie-Mellon University Pittsburgh, PA 15213 EMAIL Colin White Department of Computer Science Carnegie-Mellon University Pittsburgh, PA 15213 EMAIL |
| Pseudocode | Yes | Algorithm 1 (α, β)-Lloyds++ Clustering |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-source code for the methodology described. |
| Open Datasets | Yes | We ran experiments on datasets including MNIST, CIFAR10, CNAE9, and a synthetic Gaussian Grid dataset. |
| Dataset Splits | Yes | We generate m = 50,000 samples from each distribution and divide them into equal-sized training and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cluster specifications) used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | For MNIST and CIFAR10 we set k = 5, and N = 100, while for CNAE9 and the Gaussian Grid we set k = 4 and N = 120. and We always measure distance between points using the ℓ2 distance and set β = 2. |
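
To make the Pseudocode and Experiment Setup rows more concrete, here is a minimal sketch of an α-parameterized seeding step followed by Lloyd's iterations with β = 2 under the ℓ2 distance. It is not the authors' code (the table above notes that none was released). The sampling rule (each new center drawn with probability proportional to its distance to the nearest chosen center, raised to the power α), the function names `d_alpha_seeding` and `lloyd_l2`, the iteration cap, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def d_alpha_seeding(X, k, alpha, rng):
    """Pick k initial centers from the rows of X.

    Assumed rule: after a uniformly random first center, each further center
    is sampled with probability proportional to d_min ** alpha, where d_min is
    the l2 distance from a point to its nearest already-chosen center.
    """
    n = X.shape[0]
    first = rng.integers(n)
    centers = [X[first]]
    d_min = np.linalg.norm(X - X[first], axis=1)
    for _ in range(k - 1):
        weights = d_min ** alpha
        total = weights.sum()
        # Fall back to uniform sampling if every point coincides with a center.
        probs = weights / total if total > 0 else np.full(n, 1.0 / n)
        idx = rng.choice(n, p=probs)
        centers.append(X[idx])
        d_min = np.minimum(d_min, np.linalg.norm(X - X[idx], axis=1))
    return np.array(centers, dtype=float)


def lloyd_l2(X, centers, n_iter=50, tol=1e-9):
    """Lloyd's iterations for beta = 2: assign to nearest center, then average."""
    centers = np.array(centers, dtype=float)  # copy so the seeds are not mutated
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    cost = float((dists.min(axis=1) ** 2).sum())  # sum of squared l2 distances
    return centers, labels, cost


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for a "Gaussian grid": four blobs on a 2 x 2 grid.
    means = np.array([[0, 0], [0, 5], [5, 0], [5, 5]], dtype=float)
    X = np.vstack([m + rng.normal(scale=0.5, size=(30, 2)) for m in means])
    for alpha in (0.0, 2.0, 8.0):
        seeds = d_alpha_seeding(X, k=4, alpha=alpha, rng=rng)
        _, _, cost = lloyd_l2(X, seeds)
        print(f"alpha={alpha}: clustering cost {cost:.2f}")
```

Under this assumed rule, α = 0 reduces to uniform random seeding and α = 2 matches the familiar k-means++ weighting, so sweeping α as in the loop above gives a rough picture of how the seeding parameter affects the final clustering cost.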