Semi-supervised Clustering via Pairwise Constrained Optimal Graph

Authors: Feiping Nie, Han Zhang, Rong Wang, Xuelong Li

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted extensive experiments to validate the advantages of the proposed PCOG and the proposed key pairwise constraints selection strategy.
Researcher Affiliation | Academia | (1) School of Computer Science and Center for OPTical IMagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, P. R. China; (2) School of Cybersecurity, Northwestern Polytechnical University, Xi'an, 710072, P. R. China
Pseudocode | Yes | Algorithm 1: Algorithm to solve problem (4)
Open Source Code | No | The paper does not provide any specific link to source code or an explicit statement about its availability.
Open Datasets | Yes | The real world datasets include four UCI datasets [Dua and Graff, 2017] (Dermatology, Control, Monk1 and Glass) and five image datasets (ORL [Samaria and Harter, 1994], COIL20 [Nene et al., 1996], UMIST [Graham and Allinson, 1998], USPS [Hull, 1994] and YALE [Minear and Park, 2004]), as described in Table 1.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment.
Experiment Setup | Yes | We fix that a quarter of the total pairwise constraints are CL constraints, and the rest are ML constraints. All of these algorithms use the same key pairwise constraint sets consisting of 80 cannot-link constraints and 240 must-link constraints, except for the unsupervised CLR. We first seek γ in a large range of {10^-3, 10^-2, 10^-1, 10^0, 10^1, 10^2, 10^3}, and we find that our algorithm works well in a small range [0.1, 1]. As a result, we further search γ from 0.1 to 1 with the interval of 0.2... we can obtain the well performance in the range of [0.3, 0.7], and thus we search γ from 0.3 to 0.7 to obtain the best results in other experiments. In terms of λ, as we said, a large enough λ ensures that S possesses c connected components exactly. However, how large λ should be is difficult to seek. Thus, we adopt a widely used manner to determine λ heuristically [Nie et al., 2014]. Specifically, we first initialize λ with a small value like 0.1, and update it according to the number of eigenvalue zero of L_S in the iterations.
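
The experiment setup quoted above can be illustrated with a minimal Python sketch. It is not the authors' code, since the paper releases none. All function names are hypothetical. The doubling/halving factor for λ, the zero-eigenvalue tolerance, and the 0.2 step for the final γ range are assumptions; the summary only states that λ starts small (for example 0.1) and is updated according to the number of zero eigenvalues of L_S.

```python
import numpy as np


def split_constraints(total):
    """Split a constraint budget: one quarter cannot-link (CL), the rest must-link (ML)."""
    n_cl = total // 4
    return n_cl, total - n_cl


def gamma_grids():
    """Return the coarse, intermediate, and final gamma search grids."""
    coarse = [10.0 ** k for k in range(-3, 4)]             # 10^-3 ... 10^3
    medium = [round(0.1 + 0.2 * i, 1) for i in range(5)]   # 0.1 to 1 in steps of 0.2
    # Assumption: the final range [0.3, 0.7] reuses the same 0.2 step.
    final = [g for g in medium if 0.3 <= g <= 0.7]
    return coarse, medium, final


def count_zero_eigenvalues(S, tol=1e-10):
    """Count numerically zero eigenvalues of the Laplacian L_S of similarity matrix S."""
    W = (S + S.T) / 2.0                    # symmetrize the similarity matrix
    L = np.diag(W.sum(axis=1)) - W         # unnormalized graph Laplacian
    eigvals = np.linalg.eigvalsh(L)        # real eigenvalues in ascending order
    return int(np.sum(eigvals < tol))      # tol is an assumed threshold


def update_lambda(lam, n_zero, c, factor=2.0):
    """Heuristic lambda update; the factor of 2 is an assumption, not from the paper."""
    if n_zero < c:
        return lam * factor   # too few connected components: increase lambda
    if n_zero > c:
        return lam / factor   # too many connected components: decrease lambda
    return lam                # exactly c components: keep lambda


if __name__ == "__main__":
    print(split_constraints(320))
    # Toy graph with two disconnected pairs of nodes.
    S = np.array([[0, 1, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    n_zero = count_zero_eigenvalues(S)
    print(n_zero, update_lambda(0.1, n_zero, c=3))
```

With a budget of 320 constraints, `split_constraints` reproduces the 80 cannot-link and 240 must-link constraints reported in the table. The eigenvalue count works because, for a graph with non-negative weights, the multiplicity of the zero eigenvalue of the Laplacian equals the number of connected components.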