Query-Efficient Correlation Clustering with Noisy Oracle

Authors: Yuko Kuroki, Atsushi Miyauchi, Francesco Bonchi, Wei Chen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically validate our theoretical findings by demonstrating that KC-FC and KC-FB outperform baseline methods in terms of the sample complexity and cost of clustering, respectively."
Researcher Affiliation | Collaboration | Yuko Kuroki (CENTAI Institute, Turin, Italy; yuko.kuroki@centai.eu); Atsushi Miyauchi (CENTAI Institute, Turin, Italy; atsushi.miyauchi@centai.eu); Francesco Bonchi (CENTAI Institute, Turin, Italy, and Eurecat, Barcelona, Spain; bonchi@centai.eu); Wei Chen (Microsoft Research, Beijing, China; weic@microsoft.com)
Pseudocode | Yes | Algorithm 1: Kwik Cluster with Fixed Confidence (KC-FC); Algorithm 2: Threshold Bandits for identifying High Similarity pairs with ϵ ∈ (0, 0.5) (TB-HS); Algorithm 3: Kwik Cluster with Fixed Budget (KC-FB); Algorithm 4: Kwik Cluster(V, s); Algorithm 5: KC-FC variant with sequential use of TB-HS; Algorithm 6: Uniform sampling in the FC setting (Uniform-FC); Algorithm 7: Uniform sampling in the FB setting (Uniform-FB). A runnable sketch of the pivot procedure with a noisy oracle follows the table.
Open Source Code | Yes | "The code was written in Python 3, which is available online."
Open Datasets | Yes | "Datasets. We use publicly available real-world graphs presented in Table 1."
Dataset Splits | No | The paper describes how instances were generated (e.g., varying LB_min, embedding with node2vec) but does not specify explicit training, validation, or test dataset splits (e.g., 80/10/10 percentages or specific sample counts) or cross-validation setup.
Hardware Specification | Yes | "The experiments were performed on a machine with Apple M1 Chip and 16 GB RAM."
Software Dependencies | No | The paper mentions 'Python 3' and 'node2vec' but does not provide specific version numbers for these software components to ensure reproducibility (e.g., 'Python 3.x', 'node2vec vX.Y.Z').
Experiment Setup | Yes | "In both KC-FC and Uniform-FC, we set ϵ = 1/n, allowing each element to make only 1/n mistakes, and δ = 0.01, following a standard choice in PE-MAB. For each graph, we vary the lower bound on ∆_min, which we denote by LB_min, in {0.10, 0.15, 0.20, ..., 0.50}. For large instances with n ≥ 1,000, we fix T = n^2.2 for scalability." A sketch of these parameter choices also follows the table.
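
The algorithms listed in the Pseudocode row build on KwikCluster, the classic random-pivot procedure for correlation clustering, with each pairwise similarity resolved by a noisy oracle. Below is a minimal sketch, not the authors' implementation: kwik_cluster is the standard pivot loop, and is_similar resolves a pair by repeated oracle queries with a Hoeffding-style fixed-confidence stopping rule, loosely in the spirit of the TB-HS subroutine. The function names, the 0/1 oracle interface, and the concrete confidence radius are illustrative assumptions.

    import math
    import random

    def is_similar(u, v, noisy_oracle, delta=0.01, max_queries=10_000):
        # Decide whether pair (u, v) has similarity above 1/2 by querying
        # a noisy 0/1 oracle until a Hoeffding confidence interval around
        # the empirical mean excludes 1/2 (a fixed-confidence stopping
        # rule in the spirit of TB-HS; not the authors' exact rule).
        total = 0
        for t in range(1, max_queries + 1):
            total += noisy_oracle(u, v)  # 1 = "similar", 0 = "dissimilar"
            mean = total / t
            radius = math.sqrt(math.log(4 * t * t / delta) / (2 * t))
            if mean - radius > 0.5:
                return True
            if mean + radius < 0.5:
                return False
        return total / max_queries > 0.5  # budget exhausted: best guess

    def kwik_cluster(nodes, similar):
        # Classic KwikCluster pivot loop: pick a uniformly random pivot,
        # group it with every remaining node deemed similar to it, repeat.
        remaining = list(nodes)
        clusters = []
        while remaining:
            pivot = random.choice(remaining)
            cluster = [pivot] + [v for v in remaining
                                 if v != pivot and similar(pivot, v)]
            clusters.append(cluster)
            remaining = [v for v in remaining if v not in cluster]
        return clusters

A toy usage, with a hypothetical oracle that answers correctly with probability 0.7 (gap 0.2 from 1/2):

    # Ground-truth clusters {0, 1, 2} and {3, 4}.
    truth = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}

    def noisy_oracle(u, v):
        correct = int(truth[u] == truth[v])
        return correct if random.random() < 0.7 else 1 - correct

    print(kwik_cluster(truth, lambda u, v: is_similar(u, v, noisy_oracle)))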
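
For concreteness, here is a short sketch of the parameter choices quoted in the Experiment Setup row. The setting ϵ = 1/n follows the reconstructed reading of the garbled source text, so treat it as an assumption; δ = 0.01 and the LB_min grid are quoted from the paper, while the function and variable names are illustrative, not taken from the authors' code.

    def experiment_params(n):
        # Parameter choices described in the experiment setup (sketch).
        # eps = 1/n is the reconstructed reading of the garbled text;
        # delta = 0.01 and the LB_min grid are quoted from the paper.
        eps = 1.0 / n                                   # per-element error tolerance
        delta = 0.01                                    # fixed-confidence parameter
        lb_min_grid = [round(0.10 + 0.05 * k, 2) for k in range(9)]  # 0.10 .. 0.50
        budget = int(n ** 2.2) if n >= 1_000 else None  # fixed budget T for large n
        return eps, delta, lb_min_grid, budget

    print(experiment_params(1_000))
    # -> (0.001, 0.01, [0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5], 3981071)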