Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Efficient Clustering Based On A Unified View Of $K$-means And Ratio-cut

Authors: Shenfei Pei, Feiping Nie, Rong Wang, Xuelong Li

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on 12 real-world benchmark and 8 facial datasets validate the advantages of the proposed algorithm compared to the state-of-the-art clustering algorithms. In particular, over 15x and 7x speed-up can be obtained with respect to k-means on the synthetic dataset of 1 million samples and the benchmark dataset (CelebA) of 200k samples, respectively.
Researcher Affiliation	Academia	Shenfei Pei: School of Computer Science and Center for OPTIMAL, Northwestern Polytechnical University. Feiping Nie: School of Computer Science and Center for OPTIMAL, Northwestern Polytechnical University. Rong Wang: School of Cybersecurity and Center for OPTIMAL, Northwestern Polytechnical University. Xuelong Li: School of Computer Science and Center for OPTIMAL, Northwestern Polytechnical University.
Pseudocode	Yes	Algorithm 1: An efficient program for solving problem (21).
Open Source Code	Yes	In particular, over 15x and 7x speed-up can be obtained with respect to k-means on the synthetic dataset of 1 million samples and the benchmark dataset (CelebA) of 200k samples, respectively [GitHub].
Open Datasets	Yes	WebFace [50] and CelebA [23] are two large-scale public datasets available for face recognition and verification problems. CALFW [54] and CPLFW [53] are two variants of LFW aiming at cross-age and cross-pose face recognition, respectively. CACD [5], Adience [15], and FERET [35] are constructed for cross-age face retrieval, age and gender recognition, and facial recognition system evaluation.
Dataset Splits	No	The paper does not explicitly provide details about validation dataset splits (e.g., percentages or sample counts).
Hardware Specification	Yes	Both k-means and our code run on the Arch machine with 3.20 GHz i7-8700 CPU, 32 GB main memory.
Software Dependencies	No	The paper mentions software like 'scikit-learn', 'C++', 'Dlib', and 'EFANNA', but it does not specify exact version numbers for any of these software dependencies.
Experiment Setup	Yes	The number of nearest neighbors k is fixed at 20 for 6 synthetic and 12 middle-scale real world datasets. The k-nearest neighbors graphs are generated by EFANNA [14] with k = 100 for all facial datasets. Every method takes 50 runs. The average results are reported.
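
The protocol in the Experiment Setup row (a k-nearest-neighbor graph with k = 20, 50 runs per method, averaged results) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: it uses scikit-learn's exact `kneighbors_graph` in place of EFANNA, runs only the k-means baseline rather than the proposed algorithm, scores with NMI (the table does not say which metrics the paper averages), and uses hypothetical toy data from `make_blobs`. The helper names `build_knn_graph` and `average_score` are made up for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score
from sklearn.neighbors import kneighbors_graph

N_NEIGHBORS = 20  # k reported for the synthetic and middle-scale datasets
N_RUNS = 50       # "Every method takes 50 runs"


def build_knn_graph(X, k=N_NEIGHBORS):
    """Sparse k-NN connectivity graph (exact search, a stand-in for EFANNA)."""
    return kneighbors_graph(X, n_neighbors=k, mode="connectivity", include_self=False)


def average_score(X, y_true, n_clusters, n_runs=N_RUNS):
    """Run the k-means baseline n_runs times with different seeds; return mean NMI."""
    scores = []
    for seed in range(n_runs):
        model = KMeans(n_clusters=n_clusters, n_init=1, random_state=seed)
        labels = model.fit_predict(X)
        scores.append(normalized_mutual_info_score(y_true, labels))
    return float(np.mean(scores))


# Hypothetical toy data; the paper's datasets are not used here.
X, y = make_blobs(n_samples=300, centers=3, random_state=0)
graph = build_knn_graph(X)          # each row has exactly N_NEIGHBORS nonzeros
mean_nmi = average_score(X, y, n_clusters=3)
print(f"k-NN graph: {graph.shape}, mean NMI over {N_RUNS} runs: {mean_nmi:.3f}")
```

Seeding each run separately keeps the 50 runs independent yet repeatable, which is one way to make an averaged result like the one reported easier to reproduce.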