Hierarchical Clustering: $O(1)$-Approximation for Well-Clustered Graphs

Authors: Bogdan-Adrian Manghiuc, He Sun

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To demonstrate the significance of our work, we compare our algorithm against the previous state-of-the-art with a similar approximation guarantee [10] and well-known linkage heuristics on both synthetic and real-world data sets." (Section 5, Experiments:) "We experimentally evaluate the performance of our proposed algorithm, and compare it against the three well-known linkage heuristics for computing hierarchical clustering trees, and different variants of the algorithm proposed in [10], i.e. Linkage++, on both synthetic and real-world data sets."
Researcher Affiliation | Academia | Bogdan-Adrian Manghiuc, School of Informatics, The University of Edinburgh (b.a.manghiuc@sms.ed.ac.uk); He Sun, School of Informatics, The University of Edinburgh (h.sun@ed.ac.uk)
Pseudocode | Yes | Algorithm 1: HCwithDegrees(G{V}) and Algorithm 2: PruneMerge(G, k)
Open Source Code | No | The paper does not provide an explicit statement about, or a link to, open-source code for its methodology.
Open Datasets | Yes | "Real-world data sets. To evaluate the performance of our algorithm on real-world data sets, we follow the sequence of recent work on hierarchical clustering [2, 10, 19, 24], all of which are based on the following 5 data sets from the Scikit-learn library [22] as well as the UCI ML repository [1]: Iris, Wine, Cancer, Boston and Newsgroup." [1] UCI ML Repository, https://archive.ics.uci.edu/ml/index.php. Accessed: 2021-05-22.
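The paper turns each of these feature data sets into a similarity graph via a Gaussian kernel (see the Experiment Setup row). A minimal NumPy sketch of that construction, using a random feature matrix as a stand-in for, e.g., Iris, and the median pairwise distance as the σ-value — an assumption on our part, since the paper defers its exact heuristic to [21]:

```python
import numpy as np

# Stand-in features: 150 points in 4 dimensions (the shape of Iris).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))

# Pairwise squared Euclidean distances between all points.
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

# sigma from the median pairwise distance -- a common heuristic, used
# here as an assumption; the paper's exact choice follows [21].
sigma = np.sqrt(np.median(D[D > 0]))

# Gaussian-kernel similarity matrix: entries in (0, 1], ones on the diagonal.
S = np.exp(-D / (2 * sigma**2))
```

The resulting dense matrix S serves as the weighted adjacency matrix of the similarity graph; in practice one often sparsifies it (e.g., by k-nearest neighbours) before clustering.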
Dataset Splits | No | The paper does not explicitly provide details about training, validation, or test dataset splits, such as percentages, counts, or specific predefined splits.
Hardware Specification | Yes | "All algorithms were implemented in Python 3.8 and the experiments were performed using an Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz processor, with 16 GB RAM."
Software Dependencies | Yes | "All algorithms were implemented in Python 3.8"
Experiment Setup | Yes | "We first set the number of clusters as k = 3, and the number of vertices in each cluster $\{P_i\}_{i=1}^{3}$ as 1,000. We assume that any pair of vertices within each cluster is connected by an edge with probability p, and any pair of vertices from different clusters is connected by an edge with probability q. We fix the value q = 0.002, and consider different values of p ∈ [0.04, 0.2]." "for each data set we construct the similarity graph based on the Gaussian kernel, in which the σ-value is chosen according to the standard heuristic [21]."
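The synthetic setup described above is a planted-partition (stochastic block model) graph. A minimal pure-Python sketch of such a sampler under the stated parameters — the function name sbm_graph is our own, not from the paper:

```python
import random

def sbm_graph(sizes, p, q, seed=0):
    """Sample a planted-partition graph: two vertices in the same cluster
    are adjacent with probability p, vertices in different clusters with
    probability q."""
    rng = random.Random(seed)
    # Cluster label of every vertex, e.g. [0]*1000 + [1]*1000 + [2]*1000.
    labels = [c for c, n in enumerate(sizes) for _ in range(n)]
    edges = []
    for u in range(len(labels)):
        for v in range(u + 1, len(labels)):
            if rng.random() < (p if labels[u] == labels[v] else q):
                edges.append((u, v))
    return labels, edges

# The paper's setting: k = 3 clusters of 1,000 vertices each, q = 0.002,
# and p ranging over [0.04, 0.2]; here we take the low end, p = 0.04.
labels, edges = sbm_graph([1000, 1000, 1000], p=0.04, q=0.002)
```

With p = 0.04 this yields roughly 60,000 intra-cluster and 6,000 inter-cluster edges in expectation, so the planted clusters are dense relative to the cut between them — the "well-clustered" regime the paper's guarantee targets.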