Data-Driven Clustering via Parameterized Lloyd's Families
Authors: Maria-Florina F. Balcan, Travis Dick, Colin White
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "In this paper, we show positive theoretical and empirical results for learning the best initialization and local search procedures over a large family of algorithms." and "In this section, we empirically evaluate the effect of the α parameter on clustering cost for real-world and synthetic clustering domains." |
| Researcher Affiliation | Academia | Maria-Florina Balcan, Travis Dick, and Colin White; Department of Computer Science, Carnegie-Mellon University, Pittsburgh, PA 15213 (ninamf@cs.cmu.edu, tdick@cs.cmu.edu, crwhite@cs.cmu.edu) |
| Pseudocode | Yes | Algorithm 1: (α, β)-Lloyds++ Clustering (see the sketch after this table) |
| Open Source Code | No | The paper does not provide any explicit statement or link for open-source code for the methodology described. |
| Open Datasets | Yes | We ran experiments on datasets including MNIST, CIFAR10, CNAE9, and a synthetic Gaussian Grid dataset. |
| Dataset Splits | Yes | We generate m = 50, 000 samples from each distribution and divide them into equal-sized training and test sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cluster specifications) used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | "For MNIST and CIFAR10 we set k = 5, and N = 100, while for CNAE9 and the Gaussian Grid we set k = 4 and N = 120." and "We always measure distance between points using the ℓ2 distance and set β = 2." |
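
Since the paper provides pseudocode (Algorithm 1, (α, β)-Lloyds++ Clustering) but no open-source code, the following is a minimal Python/NumPy sketch of how that family is commonly understood from the excerpts quoted above: d^α seeding (α = 2 recovers k-means++-style seeding, α = 0 uniform seeding) followed by Lloyd's-style local search on a sum of d(x, c)^β objective, with β = 2 and ℓ2 distances as in the experiment setup row. The function names (`alpha_seeding`, `beta_lloyds`, `clustering_cost`) and all defaults are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def alpha_seeding(X, k, alpha, rng):
    """d^alpha seeding: pick the first center uniformly at random, then sample
    each subsequent center with probability proportional to
    dist(x, nearest chosen center)^alpha. alpha = 0 gives uniform seeding,
    alpha = 2 gives k-means++-style seeding (illustrative reading)."""
    n = X.shape[0]
    centers = [X[rng.integers(n)]]
    for _ in range(k - 1):
        dists = np.min(
            np.linalg.norm(X[:, None, :] - np.asarray(centers)[None, :, :], axis=2),
            axis=1,
        )
        weights = dists ** alpha
        total = weights.sum()
        probs = weights / total if total > 0 else np.full(n, 1.0 / n)
        centers.append(X[rng.choice(n, p=probs)])
    return np.asarray(centers)

def beta_lloyds(X, centers, beta=2.0, n_iter=20):
    """Lloyd's-style local search for the sum_x dist(x, nearest center)^beta
    objective. For beta = 2 (the setting quoted in the experiment row) the
    per-cluster minimizer is the mean, so the update below is exact; for other
    beta values it is only a heuristic stand-in."""
    centers = centers.copy()
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels

def clustering_cost(X, centers, beta=2.0):
    """Sum over points of (l2 distance to nearest center)^beta."""
    dists = np.min(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
    return float(np.sum(dists ** beta))

# Toy usage standing in for the paper's datasets (MNIST, CIFAR10, CNAE9, Gaussian Grid).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
centers = alpha_seeding(X, k=5, alpha=2.0, rng=rng)
centers, labels = beta_lloyds(X, centers, beta=2.0)
print(clustering_cost(X, centers, beta=2.0))
```

Under this reading, the data-driven step described in the quoted excerpts would amount to evaluating `clustering_cost` for a grid of α values on the training half of each dataset (e.g., the 50/50 split of 50,000 samples noted in the Dataset Splits row), selecting the α with the lowest average cost, and reporting cost on the held-out test half.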