Simultaneous Clustering and Ensemble
Authors: Zhiqiang Tao, Hongfu Liu, Yun Fu
AAAI 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on 16 real-world datasets demonstrate the effectiveness of the proposed SCE over the traditional clustering and state-of-the-art ensemble clustering methods. |
| Researcher Affiliation | Academia | Zhiqiang Tao¹, Hongfu Liu¹, Yun Fu¹˒² (¹Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 02115, USA; ²College of Computer and Information Science, Northeastern University, Boston, MA 02115, USA) |
| Pseudocode | Yes | Algorithm 1: Simultaneous Clustering and Ensemble. Input: X, data points {x₁, x₂, …, xₙ}; Π, basic partitions {π₁, π₂, …, πᵣ}; K, the number of clusters. Output: final partition π. 1: Build the normalized Laplacian matrix L with X; 2: Calculate the co-association matrix S based on Π; 3: Set H as the smallest K eigenvectors of αL − S for LSCE, or the K largest ones of L S for NSCE; 4: Run K-means on H to obtain the final partition π. |
| Open Source Code | No | The paper does not provide any specific links or explicit statements about making its source code publicly available. |
| Open Datasets | Yes | The testbed in our experiment mainly consists of 11 benchmark datasets selected from CLUTO, which is a dataset repository for document clustering (Zhao and Karypis 2002). In addition, five widely used datasets from other sources, including the OFFICE dataset (amazon and webcam), ImageNet, pendigits, and USPS (Cai, Wang, and He 2009), are also employed to evaluate the performance of our method on image clustering. Thus, we use 16 real-world text and image datasets in total in this paper. More details are shown in Table 1. |
| Dataset Splits | No | The paper does not explicitly provide specific training/test/validation dataset splits (e.g., percentages or sample counts) for reproducibility, nor does it describe cross-validation. It mentions generating basic partitions, but this is distinct from standard data splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (such as CPU/GPU models or types) used for running its experiments. It only mentions '32 GB' in the context of out-of-memory errors for other methods. |
| Software Dependencies | No | The paper mentions using the MATLAB kmeans function but does not provide specific version numbers for MATLAB or any other software dependencies. |
| Experiment Setup | Yes | We employ RPS to generate Π of r = 100 BPs as default input for all the EC methods. Each BP is obtained by running the MATLAB kmeans function with a cluster number randomly selected from [K, √n]. In addition, we use α = 1 in our LSCE model as the default setting. We test each method 50 times and report the average result. |
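The experiment setup, together with step 2 of Algorithm 1, can be sketched in Python with NumPy. This is a minimal, non-authoritative illustration that assumes RPS simply draws each basic partition's cluster number uniformly from [K, √n]. It also assumes the co-association entry S[a, b] is the fraction of basic partitions that place points a and b in the same cluster. A plain k-means with k-means++ seeding stands in for MATLAB's kmeans. The names `kmeans_labels`, `random_parameter_selection`, and `co_association` are hypothetical and not taken from the paper.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=100):
    """Lloyd's k-means with k-means++ seeding (hypothetical stand-in for MATLAB's kmeans)."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest already-chosen center.
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(axis=1)
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(len(X), 1.0 / len(X))
        centers.append(X[rng.choice(len(X), p=probs)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        # An empty cluster keeps its old center instead of becoming NaN.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)


def random_parameter_selection(X, K, r=100, seed=0):
    """Generate r basic partitions, each with a cluster number drawn from [K, sqrt(n)]."""
    rng = np.random.default_rng(seed)
    k_max = max(K, int(np.sqrt(X.shape[0])))
    return [kmeans_labels(X, int(rng.integers(K, k_max + 1)), rng) for _ in range(r)]


def co_association(partitions):
    """S[a, b] = fraction of basic partitions that put points a and b in one cluster."""
    n = len(partitions[0])
    S = np.zeros((n, n))
    for labels in partitions:
        S += (labels[:, None] == labels[None, :]).astype(float)
    return S / len(partitions)
```

Calling `co_association(random_parameter_selection(X, K, r=100))` mirrors the stated default of 100 BPs. Because S is a dense n × n matrix, its memory grows quadratically with the number of data points.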
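The LSCE branch of Algorithm 1 (steps 1, 3, and 4) can be sketched in the same way, taking a precomputed co-association matrix S as input. This sketch rests on several assumptions. The LSCE matrix is taken to be αL − S. L is built from a fully connected Gaussian affinity with a hypothetical bandwidth `sigma`, because the summary does not say how the graph is constructed. A simple k-means with deterministic farthest-first seeding replaces MATLAB's kmeans. The NSCE variant is not covered, and the names `normalized_laplacian`, `lsce_embedding`, `kmeans_labels`, and `lsce` are hypothetical.

```python
import numpy as np


def normalized_laplacian(X, sigma=1.0):
    """L = I - D^{-1/2} W D^{-1/2}; the Gaussian affinity W is an assumed graph choice."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]


def lsce_embedding(L, S, K, alpha=1.0):
    """H = eigenvectors of (alpha * L - S) belonging to the K smallest eigenvalues."""
    M = alpha * L - S
    M = (M + M.T) / 2.0  # guard against round-off asymmetry before eigh
    _, vecs = np.linalg.eigh(M)  # eigh returns eigenvalues in ascending order
    return vecs[:, :K]


def kmeans_labels(H, K, n_iter=100):
    """Lloyd's k-means on the rows of H with deterministic farthest-first seeding."""
    idx = [0]
    for _ in range(1, K):
        d = ((H[:, None, :] - H[idx][None]) ** 2).sum(-1).min(axis=1)
        idx.append(int(d.argmax()))
    C = H[idx].astype(float)
    for _ in range(n_iter):
        labels = ((H[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)
        new_C = np.array([H[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(K)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return ((H[:, None, :] - C[None]) ** 2).sum(-1).argmin(axis=1)


def lsce(X, S, K, alpha=1.0, sigma=1.0):
    """Steps 1, 3, 4: Laplacian from X, spectral embedding of alpha*L - S, then k-means."""
    L = normalized_laplacian(X, sigma)
    return kmeans_labels(lsce_embedding(L, S, K, alpha), K)
```

The default `alpha=1.0` matches the α = 1 setting reported in the experiment setup. Because `np.linalg.eigh` sorts eigenvalues in ascending order, the first K columns are the "smallest K eigenvectors" that step 3 asks for.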