Robust Online Correlation Clustering
Authors: Silvio Lattanzi, Benjamin Moseley, Sergei Vassilvitskii, Yuyan Wang, Rudy Zhou
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental and Theoretical | We provide a novel online algorithm for Correlation Clustering, which is robust to noise and can be applied in real-time streaming settings. We provide theoretical guarantees for the performance of our algorithm and complement our theoretical analysis with an extensive experimental evaluation on synthetic and real-world datasets. |
| Researcher Affiliation | Collaboration | Silvio Lattanzi and Sergei Vassilvitskii (Google Research); Benjamin Moseley, Yuyan Wang, and Rudy Zhou (Carnegie Mellon University) |
| Pseudocode | Yes | Algorithm 1: ROCC(G) |
| Open Source Code | No | No statement or link providing access to source code for the described methodology was found. |
| Open Datasets | Yes | For real-world datasets, we use three popular benchmarks that are widely used in the literature of correlation clustering: Blogs, Cora, and Slashdot. These datasets are publicly available from the Stanford Network Analysis Project (SNAP). |
| Dataset Splits | Yes | For BlogCatalog, we randomly sample 1000 nodes from the original network and form a graph with 20380 edges and 39 labels. For Cora, we consider the largest connected component of the graph, which consists of 2485 nodes, 5069 edges and 7 labels. For Slashdot, we consider the largest connected component consisting of 77360 nodes, 458999 edges and 20 categories. For synthetic datasets, we generate graphs of varying densities and sizes, ranging from 100 to 1000 nodes. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory, cloud instance types) were mentioned. |
| Software Dependencies | No | No specific software dependencies with version numbers were mentioned. |
| Experiment Setup | Yes | The parameters (c1, c2) for our algorithm are set to (0.5, 0.5) for the theoretical analysis and kept fixed throughout all experiments. For the synthetic datasets, we vary the number of nodes from 100 to 1000. We average the results over 50 runs. |
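The table above records that the paper presents pseudocode (Algorithm 1: ROCC(G)) and an experiment protocol in which synthetic graphs of 100 to 1000 nodes are clustered and results are averaged over 50 runs. As an illustration only, the sketch below pairs a classic randomized Pivot routine for correlation clustering (a well-known stand-in, NOT the paper's ROCC algorithm, whose details and parameters (c1, c2) are not reproduced here) with a hypothetical harness that mirrors the reported 50-run averaging on random synthetic graphs; all function names are assumptions.

```python
import random

def pivot_clustering(nodes, positive_edges, rng):
    """Randomized Pivot for correlation clustering on a complete signed
    graph: '+' pairs are listed explicitly, all other pairs are '-'.
    Illustrative stand-in only -- NOT the paper's ROCC algorithm."""
    adj = {u: set() for u in nodes}
    for u, v in positive_edges:
        adj[u].add(v)
        adj[v].add(u)
    unclustered = set(nodes)
    clusters = []
    order = list(nodes)
    rng.shuffle(order)  # pick pivots in random order
    for pivot in order:
        if pivot not in unclustered:
            continue
        # cluster = pivot plus its still-unclustered '+' neighbors
        cluster = {pivot} | (adj[pivot] & unclustered)
        unclustered -= cluster
        clusters.append(cluster)
    return clusters

def disagreements(clusters, positive_edges, nodes):
    """Correlation-clustering objective: '+' edges cut between clusters
    plus '-' pairs placed inside the same cluster."""
    label = {u: i for i, c in enumerate(clusters) for u in c}
    pos = set(map(frozenset, positive_edges))
    cost = sum(1 for u, v in positive_edges if label[u] != label[v])
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if label[u] == label[v] and frozenset((u, v)) not in pos:
                cost += 1
    return cost

def run_synthetic(n, p=0.1, runs=50, seed=0):
    """Hypothetical harness: average the objective over `runs` random
    G(n, p) '+' graphs, mirroring the reported 50-run averaging; `n`
    would range over 100..1000 in the described experiments."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        nodes = list(range(n))
        edges = [(u, v) for i, u in enumerate(nodes)
                 for v in nodes[i + 1:] if rng.random() < p]
        total += disagreements(pivot_clustering(nodes, edges, rng),
                               edges, nodes)
    return total / runs
```

The quadratic `disagreements` scan is fine at these graph sizes; an online variant would additionally have to commit each arriving node to a cluster irrevocably, which is the setting the paper's guarantees address.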