Query-Efficient Correlation Clustering with Noisy Oracle
Authors: Yuko Kuroki, Atsushi Miyauchi, Francesco Bonchi, Wei Chen
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our theoretical findings by demonstrating that KC-FC and KC-FB outperform baseline methods in terms of the sample complexity and cost of clustering, respectively. |
| Researcher Affiliation | Collaboration | Yuko Kuroki CENTAI Institute Turin, Italy yuko.kuroki@centai.eu Atsushi Miyauchi CENTAI Institute Turin, Italy atsushi.miyauchi@centai.eu Francesco Bonchi CENTAI Institute, Turin, Italy Eurecat, Barcelona, Spain bonchi@centai.eu Wei Chen Microsoft Research Beijing, China weic@microsoft.com |
| Pseudocode | Yes | Algorithm 1 Kwik Cluster with Fixed Confidence (KC-FC), Algorithm 2 Threshold Bandits for indentifying High Similarity pairs with ϵ (0, 0.5) (TB-HS), Algorithm 3 Kwik Cluster with Fixed Budget (KC-FB), Algorithm 4 Kwik Cluster(V, s), Algorithm 5 KC-FC variant with sequential use of TB-HS, Algorithm 6 Uniform sampling in the FC setting (Uniform-FC), Algorithm 7 Uniform sampling in the FB setting (Uniform-FB) |
| Open Source Code | Yes | The code was written in Python 3, which is available online.3 |
| Open Datasets | Yes | Datasets. We use publicly-available realworld graphs presented in Table 1. |
| Dataset Splits | No | The paper describes how instances were generated (e.g., varying LB_min, embedding with node2vec) but does not specify explicit training, validation, or test dataset splits (e.g., 80/10/10 percentages or specific sample counts) or cross-validation setup. |
| Hardware Specification | Yes | The experiments were performed on a machine with Apple M1 Chip and 16 GB RAM. |
| Software Dependencies | No | The paper mentions 'Python 3' and 'node2vec' but does not provide specific version numbers for these software components to ensure reproducibility (e.g., 'Python 3.x', 'node2vec vX.Y.Z'). |
| Experiment Setup | Yes | In both KC-FC and Uniform-FC, we set ϵ = n allowing each element to make only 1/ n mistakes, and δ = 0.01 following a standard choice in PE-MAB. For each graph, we vary the lower bound on min, which we denote by LB min, in {0.10, 0.15, 0.20, . . . , 0.50}. For large instances with n 1,000, we fix T = n2.2 for scalability. |