Correlation Clustering in Data Streams
Authors: KookJin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, Anthony Wirth
ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we address the problem of correlation clustering in the dynamic data stream model. ... We present polynomial-time, O(n polylog n)-space approximation algorithms for natural problems that arise. We first develop data structures based on linear sketches... We then combine these data structures with convex programming and sampling techniques... Our work presents spaceefficient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling. ... For max-agree we provide the following single-pass streaming algorithms... (i) a polynomial-time (1 ϵ)-approximation for bounded weights (Theorem 7), and (ii) a 0.766(1 ϵ) approximation for arbitrary weights in O(nϵ 10) time (Theorem 16). |
| Researcher Affiliation | Collaboration | Kook Jin Ahn1 KOOKJIN@CIS.UPENN.EDU University of Pennsylvania, Philadelphia, PA 19104, USA. Graham Cormode G.CORMODE@WARWICK.AC.UK University of Warwick, Coventry CV4 7AL, UK. Sudipto Guha SUDIPTO@CIS.UPENN.EDU University of Pennsylvania, Philadelphia, PA 19104, USA. Andrew Mc Gregor MCGREGOR@CS.UMASS.EDU University of Massachusetts, Amherst, MA 01003, USA. Anthony Wirth AWIRTH@UNIMELB.EDU.AU Department of Computing and Information Systems, The University of Melbourne, Vic 3010, Australia. 1Currently at Google, Inc. Email:kookjin@google.com |
| Pseudocode | Yes | Algorithm 1 Oracle for LP2 Algorithm 2 Oracle for SDP. |
| Open Source Code | No | The paper does not provide any statements about releasing source code or links to repositories for the methodology described. |
| Open Datasets | No | The paper is theoretical and focuses on algorithm design and approximation bounds. It does not conduct experiments on specific datasets and therefore does not mention publicly available datasets. |
| Dataset Splits | No | The paper is theoretical and focuses on algorithm design and approximation bounds. It does not involve empirical experiments with data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and focuses on algorithm design and approximation bounds. It does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithm design and approximation bounds. It does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and approximation bounds. It does not describe any experimental setup details such as hyperparameters or training configurations. |