Query-Efficient Correlation Clustering with Noisy Oracle

Authors: Yuko Kuroki, Atsushi Miyauchi, Francesco Bonchi, Wei Chen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically validate our theoretical findings by demonstrating that KC-FC and KC-FB outperform baseline methods in terms of the sample complexity and cost of clustering, respectively."
Researcher Affiliation | Collaboration | Yuko Kuroki (CENTAI Institute, Turin, Italy; yuko.kuroki@centai.eu); Atsushi Miyauchi (CENTAI Institute, Turin, Italy; atsushi.miyauchi@centai.eu); Francesco Bonchi (CENTAI Institute, Turin, Italy, and Eurecat, Barcelona, Spain; bonchi@centai.eu); Wei Chen (Microsoft Research, Beijing, China; weic@microsoft.com)
Pseudocode | Yes | Algorithm 1: Kwik Cluster with Fixed Confidence (KC-FC); Algorithm 2: Threshold Bandits for identifying High Similarity pairs with ϵ ∈ (0, 0.5) (TB-HS); Algorithm 3: Kwik Cluster with Fixed Budget (KC-FB); Algorithm 4: Kwik Cluster(V, s); Algorithm 5: KC-FC variant with sequential use of TB-HS; Algorithm 6: Uniform sampling in the FC setting (Uniform-FC); Algorithm 7: Uniform sampling in the FB setting (Uniform-FB). A runnable sketch of the pivot procedure with a noisy oracle follows the table.
Open Source Code | Yes | "The code was written in Python 3, which is available online."
Open Datasets | Yes | "Datasets. We use publicly available real-world graphs presented in Table 1."
Dataset Splits | No | The paper describes how instances were generated (e.g., varying LB_min, embedding with node2vec) but does not specify explicit training, validation, or test dataset splits (e.g., 80/10/10 percentages or specific sample counts) or cross-validation setup.
Hardware Specification | Yes | "The experiments were performed on a machine with Apple M1 Chip and 16 GB RAM."
Software Dependencies | No | The paper mentions 'Python 3' and 'node2vec' but does not provide specific version numbers for these software components to ensure reproducibility (e.g., 'Python 3.x', 'node2vec vX.Y.Z').
Experiment Setup | Yes | "In both KC-FC and Uniform-FC, we set ϵ = 1/n, allowing each element to make only 1/n mistakes, and δ = 0.01, following a standard choice in PE-MAB. For each graph, we vary the lower bound on ∆_min, which we denote by LB_min, in {0.10, 0.15, 0.20, ..., 0.50}. For large instances with n ≥ 1,000, we fix T = n^2.2 for scalability." A sketch of these parameter choices also follows the table.
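
The algorithms listed in the Pseudocode row build on KwikCluster, the classic random-pivot procedure for correlation clustering, with each pairwise similarity resolved by a noisy oracle. Below is a minimal sketch, not the authors' implementation: kwik_cluster is the standard pivot loop, and is_similar resolves a pair by repeated oracle queries with a Hoeffding-style fixed-confidence stopping rule, loosely in the spirit of the TB-HS subroutine. The function names, the 0/1 oracle interface, and the concrete confidence radius are illustrative assumptions.

    import math
    import random

    def is_similar(u, v, noisy_oracle, delta=0.01, max_queries=10_000):
        # Decide whether pair (u, v) has similarity above 1/2 by querying
        # a noisy 0/1 oracle until a Hoeffding confidence interval around
        # the empirical mean excludes 1/2 (a fixed-confidence stopping
        # rule in the spirit of TB-HS; not the authors' exact rule).
        total = 0
        for t in range(1, max_queries + 1):
            total += noisy_oracle(u, v)  # 1 = "similar", 0 = "dissimilar"
            mean = total / t
            radius = math.sqrt(math.log(4 * t * t / delta) / (2 * t))
            if mean - radius > 0.5:
                return True
            if mean + radius < 0.5:
                return False
        return total / max_queries > 0.5  # budget exhausted: best guess

    def kwik_cluster(nodes, similar):
        # Classic KwikCluster pivot loop: pick a uniformly random pivot,
        # group it with every remaining node deemed similar to it, repeat.
        remaining = list(nodes)
        clusters = []
        while remaining:
            pivot = random.choice(remaining)
            cluster = [pivot] + [v for v in remaining
                                 if v != pivot and similar(pivot, v)]
            clusters.append(cluster)
            remaining = [v for v in remaining if v not in cluster]
        return clusters

A toy usage, with a hypothetical oracle that answers correctly with probability 0.7 (gap 0.2 from 1/2):

    # Ground-truth clusters {0, 1, 2} and {3, 4}.
    truth = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}

    def noisy_oracle(u, v):
        correct = int(truth[u] == truth[v])
        return correct if random.random() < 0.7 else 1 - correct

    print(kwik_cluster(truth, lambda u, v: is_similar(u, v, noisy_oracle)))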
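
For concreteness, here is a short sketch of the parameter choices quoted in the Experiment Setup row. The setting ϵ = 1/n follows the reconstructed reading of the garbled source text, so treat it as an assumption; δ = 0.01 and the LB_min grid are quoted from the paper, while the function and variable names are illustrative, not taken from the authors' code.

    def experiment_params(n):
        # Parameter choices described in the experiment setup (sketch).
        # eps = 1/n is the reconstructed reading of the garbled text;
        # delta = 0.01 and the LB_min grid are quoted from the paper.
        eps = 1.0 / n                                   # per-element error tolerance
        delta = 0.01                                    # fixed-confidence parameter
        lb_min_grid = [round(0.10 + 0.05 * k, 2) for k in range(9)]  # 0.10 .. 0.50
        budget = int(n ** 2.2) if n >= 1_000 else None  # fixed budget T for large n
        return eps, delta, lb_min_grid, budget

    print(experiment_params(1_000))
    # -> (0.001, 0.01, [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5], 3981071)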