Graph-based Semi-supervised Local Clustering with Few Labeled Nodes

Authors: Zhaiming Shen, Ming-Jun Lai, Sheng Li

IJCAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on various datasets demonstrate the effectiveness of our approach. Extensive experiments are conducted on various benchmark datasets to show that our approach outperforms its counterparts [Lai and Mckenzie, 2020; Lai and Shen, 2023]. Results also show that our approach is more favorable than many other state-of-the-art semi-supervised clustering algorithms.
Researcher Affiliation | Academia | Zhaiming Shen¹, Ming-Jun Lai¹ and Sheng Li². ¹University of Georgia, Athens, GA, USA; ²University of Virginia, Charlottesville, VA, USA. {zhaiming.shen, mjlai}@uga.edu, shengli@virginia.edu
Pseudocode | Yes | Algorithm 1: Compressive Sensing of Local Cluster Extraction (CS-LCE)
Open Source Code | Yes | We make the supplement and code available at: https://github.com/zzzzms/LocalClustering.
Open Datasets | Yes | We use simulated stochastic block model, simulated geometric data with three particular shapes, network data on political blogs [Adamic and Glance, 2005], OptDigits¹, AT&T Database of Faces², MNIST³, and USPS⁴ as our benchmark datasets. (Footnotes provide URLs for OptDigits, AT&T Database of Faces, MNIST, and USPS; the political blogs data is cited as [Adamic and Glance, 2005].)
Dataset Splits | No | The paper mentions 'label ratios' for seeds (e.g., 10% in Table 3) but does not provide specific training, validation, or test dataset splits (e.g., '70% training, 15% validation, 15% test') or references to standard splits that define these partitions.
Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory, or processing units) used for the experiments were provided.
Software Dependencies | No | No specific software dependencies or versions (e.g., library names with version numbers) were mentioned for reproducibility.
Experiment Setup | Yes | Parameters: estimated size n̂₁ ≈ |C₁|, random walk threshold parameter ϵ ∈ (0, 1), random walk depth t ∈ ℤ₊, sparsity parameter γ ∈ [0.1, 0.5], rejection parameter R ∈ [0.1, 0.9]. 'For convenience, let us fix γ = 0.4 for the rest of the discussion.' Also: 'we fix k = 3 and vary n among 600, 1200, 1800, 2400, 3000. We choose p = 5 log n/n, q = log n/n.' Seed choices are stated as 'with five labeled vertices as seeds' and 'we randomly select 10 seeds for each cluster'.
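
The stochastic block model setting quoted in the Experiment Setup row (k = 3, p = 5 log n/n, q = log n/n, five labeled seeds) can be illustrated with a short sketch. This is not the authors' code. It assumes equal-size clusters, a natural logarithm, and seeds drawn uniformly from the target cluster, none of which the excerpt states; the function names `sbm_edge_probs`, `sample_sbm`, and `sample_seeds` are hypothetical.

```python
import math
import random


def sbm_edge_probs(n):
    """Return (p, q) = (5 log n / n, log n / n). Natural log is an assumption."""
    return 5 * math.log(n) / n, math.log(n) / n


def sample_sbm(n, k, p, q, rng):
    """Sample an undirected stochastic block model graph as adjacency sets.

    Assumption: k equal-size clusters, so vertex i gets label i // (n // k).
    Vertices in the same cluster are joined with probability p, others with q.
    """
    if n % k != 0:
        raise ValueError("this sketch assumes n is divisible by k")
    size = n // k
    labels = [i // size for i in range(n)]
    neighbors = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors, labels


def sample_seeds(labels, cluster, num_seeds, rng):
    """Draw distinct labeled seed vertices uniformly from one cluster (assumed scheme)."""
    members = [v for v, c in enumerate(labels) if c == cluster]
    return rng.sample(members, num_seeds)


# Smallest setting from the quoted setup: n = 600, k = 3, five seeds.
rng = random.Random(0)
n, k = 600, 3
p, q = sbm_edge_probs(n)
neighbors, labels = sample_sbm(n, k, p, q, rng)
seeds = sample_seeds(labels, cluster=0, num_seeds=5, rng=rng)
```

The other graph sizes in the quoted setup (1200, 1800, 2400, 3000) can be generated by changing `n`. The pairwise loop is quadratic in n, so a vectorized sampler may be preferable at the larger sizes.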