Affinity Clustering: Hierarchical Clustering at Scale
Authors: Mohammadhossein Bateni, Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi, Raimondas Kiveris, Silvio Lattanzi, Vahab Mirrokni
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show experimentally that our algorithms are scalable for huge data sets, e.g., for graphs with trillions of edges. ... Last but not least, we present an experimental study where we analyze the scalability and effectiveness of our newly introduced algorithms and we observe that, in most cases, affinity clustering outperforms all state-of-the-art algorithms from both quality and scalability standpoints. |
| Researcher Affiliation | Collaboration | Mohammad Hossein Bateni, Google Research, bateni@google.com; Soheil Behnezhad, University of Maryland, soheil@cs.umd.edu; Mahsa Derakhshan, University of Maryland, mahsaa@cs.umd.edu; Mohammad Taghi Hajiaghayi, University of Maryland, hajiagha@cs.umd.edu; Raimondas Kiveris, Google Research, rkiveris@google.com; Silvio Lattanzi, Google Research, silviol@google.com; Vahab Mirrokni, Google Research, mirrokni@google.com |
| Pseudocode | Yes | Algorithm 1: MST of Dense Graphs ... (See Algorithm 2 in the appendix.) A serial sketch of the Borůvka-style merging step behind this pseudocode appears below the table. |
| Open Source Code | Yes | Implementations are available at https://github.com/MahsaDerakhshan/AffinityClustering. |
| Open Datasets | Yes | We run our experiments on several data sets from the UCI database [37] and use Euclidean distance. ... We consider Iris, Wine, Soybean, Digits and Glass data sets. ... [37] Moshe Lichman. UCI Machine Learning Repository, 2013. |
| Dataset Splits | No | The paper mentions using datasets from the UCI database and evaluating performance with the Rand index against a 'ground truth clustering T' (a sketch of this metric appears below the table). However, it does not explicitly provide training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits). |
| Hardware Specification | No | The paper states 'While we cannot reveal the exact running times and number of machines used in the experiments, we report these quantities in normalized form.' It mentions 'Map Reduce workers' and 'machines for the DHT' but provides no specific hardware details such as GPU or CPU models, memory, or specific cloud instances used for the experiments. |
| Software Dependencies | No | The paper mentions using distributed computing platforms like 'Spark [45] and Hadoop [43] as well as Map Reduce and its extension Flume [17]' and 'Distributed Hash Tables (DHTs) [12, 31]', but it does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | No | The paper discusses the algorithms and their evaluation but does not provide specific experimental setup details such as hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or specific training configurations for the algorithms tested. |
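
For readers who want a concrete picture of the pseudocode referenced in the Pseudocode row: affinity clustering builds its hierarchy by repeating Borůvka-style MST rounds, in which every cluster follows its minimum-weight outgoing edge and the resulting connected components are contracted into new clusters. Below is a minimal single-machine sketch of that round logic. The paper's contribution is a distributed MapReduce/DHT implementation, so this serial version only illustrates the merging rule; the function and variable names (`affinity_round`, `affinity_clustering`, `cluster_of`) are illustrative assumptions, not taken from the authors' code.

```python
# Serial sketch of the Boruvka-style round behind affinity clustering.
# Assumes a weighted undirected graph given as a list of (u, v, w) edges
# over nodes 0..num_nodes-1. Names are illustrative, not the authors'.

def affinity_round(num_nodes, edges, cluster_of):
    """One round: every cluster follows its minimum-weight outgoing edge,
    then the chosen edges' components are contracted into new clusters."""
    # Find the cheapest edge leaving each cluster.
    best = {}  # cluster id -> (weight, cluster_u, cluster_v)
    for u, v, w in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            continue  # intra-cluster edge: not an outgoing edge
        for c in (cu, cv):
            if c not in best or w < best[c][0]:
                best[c] = (w, cu, cv)
    if not best:
        return cluster_of, False  # one cluster left: hierarchy complete

    # Contract chosen edges with a small union-find over cluster ids.
    parent = {c: c for c in set(cluster_of)}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for _, cu, cv in best.values():
        parent[find(cu)] = find(cv)
    return [find(cluster_of[u]) for u in range(num_nodes)], True

def affinity_clustering(num_nodes, edges):
    """Returns one clustering per round, finest first, coarsest last."""
    cluster_of = list(range(num_nodes))  # start from singleton clusters
    levels = [cluster_of]
    changed = True
    while changed:
        cluster_of, changed = affinity_round(num_nodes, edges, cluster_of)
        if changed:
            levels.append(cluster_of)
    return levels

# Example: two tight pairs {0,1} and {2,3}, weakly linked to each other.
# Round 1 merges each pair; round 2 merges the two pairs.
edges = [(0, 1, 1.0), (2, 3, 1.0), (1, 2, 5.0)]
for level in affinity_clustering(4, edges):
    print(level)
```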
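The Dataset Splits row quotes the paper's use of the Rand index against a ground-truth clustering T. As a reference for what that metric measures, here is the standard textbook definition and a sketch; it is not the authors' evaluation code. For n points, RI = (a + b) / C(n, 2), where a counts point pairs placed in the same cluster by both clusterings and b counts pairs separated by both, so RI is the fraction of pairs on which the two clusterings agree.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Standard Rand index: fraction of point pairs on which two
    clusterings agree (same-same or different-different).
    Assumes at least two points."""
    assert len(labels_a) == len(labels_b)
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)

# Perfect agreement up to label renaming scores 1.0.
print(rand_index([0, 0, 1, 1], [5, 5, 9, 9]))  # -> 1.0
```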