Robust Online Correlation Clustering

Authors: Silvio Lattanzi, Benjamin Moseley, Sergei Vassilvitskii, Yuyan Wang, Rudy Zhou

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide a novel online algorithm for Correlation Clustering, which is robust to noise and can be applied in real-time streaming settings. We provide theoretical guarantees for the performance of our algorithm and complement our theoretical analysis with an extensive experimental evaluation on synthetic and real-world datasets.
Researcher Affiliation | Collaboration | Silvio Lattanzi and Sergei Vassilvitskii (Google Research); Benjamin Moseley, Yuyan Wang, and Rudy Zhou (Carnegie Mellon University).
Pseudocode | Yes | Algorithm 1: ROCC(G). (An illustrative stand-in sketch appears after the table.)
Open Source Code | No | No explicit statement or link providing access to the source code for the described methodology was found.
Open Datasets | Yes | For real-world datasets, we use three popular benchmarks that are widely used in the correlation clustering literature: Blogs, Cora, and Slashdot. These datasets are publicly available from the Stanford Network Analysis Project (SNAP).
Dataset Splits | Yes | For BlogCatalog, we randomly sample 1000 nodes from the original network and form a graph with 20380 edges and 39 labels. For Cora, we consider the largest connected component of the graph, which consists of 2485 nodes, 5069 edges and 7 labels. For Slashdot, we consider the largest connected component, consisting of 77360 nodes, 458999 edges and 20 categories. For synthetic datasets, we generate graphs of varying densities and sizes, ranging from 100 to 1000 nodes. (A preprocessing sketch follows the table.)
Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory, cloud instance types) were mentioned.
Software Dependencies | No | No specific software dependencies with version numbers were mentioned.
Experiment Setup | Yes | The parameters (c1, c2) for our algorithm are set to (0.5, 0.5) for the theoretical analysis and kept fixed throughout all experiments. For the synthetic datasets, we vary the number of nodes from 100 to 1000. We average the results over 50 runs. (An illustrative experiment harness follows the table.)
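
The Dataset Splits row describes component extraction and node sampling rather than train/test splits. Below is a minimal preprocessing sketch, assuming the benchmarks are loaded as undirected networkx graphs; the function names and the seed parameter are illustrative and are not taken from the paper.

# Preprocessing sketch (assumptions: undirected networkx graphs, illustrative names).
import random

import networkx as nx


def largest_connected_component(G: nx.Graph) -> nx.Graph:
    """Keep only the largest connected component, as reported for Cora and Slashdot."""
    nodes = max(nx.connected_components(G), key=len)
    return G.subgraph(nodes).copy()


def sample_induced_subgraph(G: nx.Graph, k: int, seed: int = 0) -> nx.Graph:
    """Induce a subgraph on k uniformly sampled nodes, as reported for BlogCatalog (k = 1000)."""
    rng = random.Random(seed)
    sampled = rng.sample(list(G.nodes()), k)
    return G.subgraph(sampled).copy()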
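
The paper's pseudocode (Algorithm 1: ROCC(G)) is only named in this summary, so the sketch below implements the classic randomized Pivot algorithm for correlation clustering as a hypothetical stand-in baseline, together with the standard disagreement objective ("+" edges cut across clusters plus absent edges kept inside a cluster). It is not the authors' ROCC algorithm.

# Stand-in baseline, not the paper's ROCC: randomized Pivot on a graph whose
# edges are the "+" (similar) pairs; absent edges are treated as "-" pairs.
import random

import networkx as nx


def pivot_clustering(G: nx.Graph, seed: int = 0) -> list[set]:
    """Repeatedly pick a random unclustered pivot and cluster it with its unclustered neighbors."""
    rng = random.Random(seed)
    order = list(G.nodes())
    rng.shuffle(order)
    unclustered = set(G.nodes())
    clusters = []
    for pivot in order:
        if pivot not in unclustered:
            continue
        cluster = {pivot} | (set(G.neighbors(pivot)) & unclustered)
        unclustered -= cluster
        clusters.append(cluster)
    return clusters


def disagreements(G: nx.Graph, clusters: list[set]) -> int:
    """Correlation-clustering cost: "+" edges split across clusters plus absent edges placed inside one cluster."""
    label = {v: i for i, cluster in enumerate(clusters) for v in cluster}
    cost = sum(1 for u, v in G.edges() if label[u] != label[v])
    for cluster in clusters:
        members = list(cluster)
        for i in range(len(members)):
            for j in range(i + 1, len(members)):
                if not G.has_edge(members[i], members[j]):
                    cost += 1
    return cost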
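
The Experiment Setup row reports synthetic graphs with 100 to 1000 nodes and results averaged over 50 runs. The harness below follows that protocol under clearly labeled assumptions: it reuses pivot_clustering and disagreements from the sketch above as a stand-in for ROCC, and it generates Erdos-Renyi graphs with edge probability 0.1, since the paper's synthetic generator and the role of the (c1, c2) parameters are not specified in this summary.

# Illustrative harness only; reuses pivot_clustering and disagreements from the
# previous sketch. The G(n, p) generator with p = 0.1 is an assumption.
import statistics

import networkx as nx

results = {}
for n in range(100, 1001, 100):      # node counts from 100 to 1000
    costs = []
    for run in range(50):            # 50 runs per size, then averaged
        G = nx.gnp_random_graph(n, p=0.1, seed=run)
        clusters = pivot_clustering(G, seed=run)
        costs.append(disagreements(G, clusters))
    results[n] = statistics.mean(costs)
print(results)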