Fast and Accurate $k$-means++ via Rejection Sampling

Authors: Vincent Cohen-Addad, Silvio Lattanzi, Ashkan Norouzi-Fard, Christian Sohler, Ola Svensson

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we empirically validate our theoretical results by comparing our algorithms FASTKMEANS++ and REJECTIONSAMPLING (see the details on how we set the parameters for LSH in the full version) with the following two baselines. K-MEANS++ algorithm: perhaps the most commonly used algorithm in this field; it samples k points according to the D²-distribution. AFKMC2 algorithm: a recent result [4] based on random walks that improves the running time of the K-MEANS++ algorithm while maintaining a (weaker) theoretical guarantee on the solution quality.
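The D²-sampling step that defines the K-MEANS++ baseline can be sketched as follows. This is a generic illustration of standard k-means++ seeding, not the paper's accelerated FASTKMEANS++ or REJECTIONSAMPLING algorithms; the function and variable names are ours.

```python
import random

def sq_dist(a, b):
    # squared Euclidean distance between two equal-length tuples
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_seed(points, k, seed=0):
    """k-means++ seeding: the first center is uniform at random, and every
    later center is drawn with probability proportional to its squared
    distance (D^2) to the nearest center chosen so far."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    for _ in range(k - 1):
        r = rng.uniform(0.0, sum(d2))
        acc, idx = 0.0, len(points) - 1
        for i, w in enumerate(d2):
            acc += w
            if acc >= r:
                idx = i
                break
        centers.append(points[idx])
        # update nearest-center distances after adding the new center
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centers
```

Each of the k-1 sampling rounds scans all n points, which is exactly the O(nk) cost the paper's algorithms aim to beat.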
Researcher Affiliation | Collaboration | Vincent Cohen-Addad (Google Research, cohenaddad@google.com), Silvio Lattanzi (Google Research, silviol@google.com), Ashkan Norouzi-Fard (Google Research, ashkannorouzi@google.com), Christian Sohler (University of Cologne, csohler@uni-koeln.de), Ola Svensson (EPFL, ola.svensson@epfl.ch)
Pseudocode | Yes | Algorithm 1 MULTITREEOPEN, Algorithm 2 MULTITREESAMPLE, Algorithm 3 FASTK-MEANS++, Algorithm 4 REJECTIONSAMPLING
Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing their source code for the described methodology, nor does it provide a direct link to a code repository for their proposed algorithms.
Open Datasets | Yes | We ran our algorithms on three classic datasets from the UCI library [15]: KDD-Cup [12] (311,029 points of dimension 74), Song [8] (515,345 points of dimension 90), and Census [19] (2,458,285 points of dimension 68).
Dataset Splits | No | The paper mentions using datasets but does not specify any explicit train/validation/test splits, percentages, or methodology for partitioning the data, nor does it reference predefined splits with citations.
Hardware Specification | No | The paper states 'The algorithms were run on a standard desktop computer' but provides no specific details regarding the CPU, GPU, memory, or any other hardware components used for the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with their version numbers (e.g., Python, PyTorch, or specific solvers with versions) that would be needed to replicate the experiments.
Experiment Setup | No | The paper mentions using the AFKMC2 algorithm with a parameter 'm = 200', but it does not provide specific hyperparameter values or detailed training configurations for its own proposed algorithms (FASTK-MEANS++ and REJECTIONSAMPLING) within the main text.