Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Simultaneous Clustering and Ensemble

Authors: Zhiqiang Tao, Hongfu Liu, Yun Fu

AAAI 2017

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments conducted on 16 real-world datasets demonstrate the effectiveness of the proposed SCE over the traditional clustering and state-of-the-art ensemble clustering methods.
Researcher Affiliation	Academia	Zhiqiang Tao (1), Hongfu Liu (1), Yun Fu (1, 2). (1) Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA; (2) College of Computer and Information Science, Northeastern University, Boston, MA 02115, USA
Pseudocode	Yes	Algorithm 1. Simultaneous Clustering and Ensemble. Input: X, data points {x1, x2, …, xn}; Π, basic partitions {π1, π2, …, πr}; K, the number of clusters. Output: final partition π. 1: Build the normalized Laplacian matrix L with X; 2: Calculate the co-association matrix S based on Π; 3: Set H as the smallest K eigenvectors of αL − S for LSCE, or the K largest ones of L S for NSCE; 4: Run K-means on H to obtain the final partition π.
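The four steps of Algorithm 1 can be sketched for the LSCE variant in NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the dense Gaussian-kernel graph and its bandwidth `sigma`, reading the garbled Step 3 operator as αL − S, and the deterministic farthest-point k-means standing in for MATLAB's `kmeans` are all assumptions, and the helper names (`normalized_laplacian`, `co_association`, `kmeans`, `lsce`) are hypothetical. The NSCE branch is omitted because its matrix expression cannot be recovered from the text.

```python
import numpy as np


def normalized_laplacian(X, sigma=1.0):
    """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.

    Assumption: W is a dense Gaussian-kernel affinity with bandwidth sigma;
    the paper's exact graph construction is not stated here.
    """
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def co_association(partitions):
    """S[i, j] = fraction of basic partitions that place points i and j together."""
    P = np.asarray(partitions)             # shape (r, n): one label vector per basic partition
    same = P[:, :, None] == P[:, None, :]  # shape (r, n, n): pairwise agreement
    return same.mean(axis=0)


def kmeans(H, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialization
    (a stand-in for MATLAB's kmeans, not a reproduction of it)."""
    centers = [H[0]]
    for _ in range(1, k):
        dist = np.min([((H - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(H[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((H[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        new_centers = np.array([H[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                                for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def lsce(X, partitions, K, alpha=1.0):
    L = normalized_laplacian(X)                 # Step 1
    S = co_association(partitions)              # Step 2
    _, eigvecs = np.linalg.eigh(alpha * L - S)  # Step 3: eigh returns eigenvalues in ascending order
    H = eigvecs[:, :K]                          # eigenvectors of the K smallest eigenvalues
    return kmeans(H, K)                         # Step 4


# Toy usage: two well-separated blobs and two hand-made basic partitions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(10, 2)), rng.normal(10.0, 0.1, size=(10, 2))])
coarse = np.repeat([0, 1], 10)     # agrees with the two blobs
fine = np.repeat([0, 1, 2, 3], 5)  # splits each blob in half
labels = lsce(X, [coarse, fine], K=2)
print(labels)
```

Using `np.linalg.eigh` rather than a general eigensolver relies on αL − S being symmetric, which holds because both L and S are symmetric here.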
Open Source Code	No	The paper does not provide any specific links or explicit statements about making its source code publicly available.
Open Datasets	Yes	The testbed in our experiment mainly consists of 11 benchmark datasets selected from CLUTO, which is a dataset repository for document clustering (Zhao and Karypis 2002). In addition, five widely used datasets from other sources, including OFFICE dataset (amazon and webcam), ImageNet, pendigits, and USPS (Cai, Wang, and He 2009), are also employed to evaluate the performance of our method on image clustering. Thus, we totally use 16 real-world datasets with types of text and image in this paper. More details are shown in Table 1.
Dataset Splits	No	The paper does not explicitly provide specific training/test/validation dataset splits (e.g., percentages or sample counts) for reproducibility, nor does it describe cross-validation. It mentions generating basic partitions, but this is distinct from standard data splits.
Hardware Specification	No	The paper does not provide specific hardware details (such as CPU/GPU models or types) used for running its experiments. It only mentions '32 GB' in the context of out-of-memory errors for other methods.
Software Dependencies	No	The paper mentions using 'MATLAB kmeans function' but does not provide specific version numbers for MATLAB or any other software dependencies.
Experiment Setup	Yes	We employ RPS to generate Π of r = 100 BPs as default input for all the EC methods. Each BP is obtained by performing MATLAB kmeans function with a randomly selected cluster number from [K, √n]. In addition, we use α = 1 in our LSCE model as the default setting. We test each method 50 times and report the average result.
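The basic-partition generation described in the setup (random parameter selection with r = 100) can be sketched as follows. This is a hedged illustration, not the authors' script: each cluster count is assumed to be drawn uniformly from [K, ⌊√n⌋], the range typically used for RPS in related ensemble-clustering papers; a plain NumPy Lloyd's k-means with random initial centers stands in for MATLAB's `kmeans`; and the function names `kmeans` and `random_parameter_selection` are hypothetical.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Lloyd's k-means with centers initialized from k random points
    (a stand-in for MATLAB's kmeans, not a reproduction of it)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = np.argmin(d2, axis=1)
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                                for c in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def random_parameter_selection(X, K, r=100, seed=0):
    """Generate r basic partitions; each uses a cluster count drawn from [K, floor(sqrt(n))]."""
    rng = np.random.default_rng(seed)
    n = len(X)
    k_max = max(K, int(np.floor(np.sqrt(n))))
    partitions = []
    for _ in range(r):
        k = int(rng.integers(K, k_max + 1))  # integers() excludes the upper bound, hence + 1
        partitions.append(kmeans(X, k, rng))
    return np.array(partitions)              # shape (r, n): the ensemble input Π


# Toy usage: 100 random 2-D points, K = 3 target clusters.
X = np.random.default_rng(1).normal(size=(100, 2))
Pi = random_parameter_selection(X, K=3, r=100)
print(Pi.shape)  # (100, 100): r = 100 basic partitions over n = 100 points
```

Passing a single `Generator` through every call keeps the whole ensemble reproducible from one seed, which also makes a repeated-run protocol such as the 50 runs above easy to script by varying `seed`.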