Correlation Clustering in Data Streams

Authors: KookJin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, Anthony Wirth

ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical In this paper, we address the problem of correlation clustering in the dynamic data stream model. ... We present polynomial-time, O(n polylog n)-space approximation algorithms for natural problems that arise. We first develop data structures based on linear sketches... We then combine these data structures with convex programming and sampling techniques... Our work presents spaceefficient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling. ... For max-agree we provide the following single-pass streaming algorithms... (i) a polynomial-time (1 ϵ)-approximation for bounded weights (Theorem 7), and (ii) a 0.766(1 ϵ) approximation for arbitrary weights in O(nϵ 10) time (Theorem 16).
Researcher Affiliation Collaboration Kook Jin Ahn1 KOOKJIN@CIS.UPENN.EDU University of Pennsylvania, Philadelphia, PA 19104, USA. Graham Cormode G.CORMODE@WARWICK.AC.UK University of Warwick, Coventry CV4 7AL, UK. Sudipto Guha SUDIPTO@CIS.UPENN.EDU University of Pennsylvania, Philadelphia, PA 19104, USA. Andrew Mc Gregor MCGREGOR@CS.UMASS.EDU University of Massachusetts, Amherst, MA 01003, USA. Anthony Wirth AWIRTH@UNIMELB.EDU.AU Department of Computing and Information Systems, The University of Melbourne, Vic 3010, Australia. 1Currently at Google, Inc. Email:kookjin@google.com
Pseudocode Yes Algorithm 1 Oracle for LP2 Algorithm 2 Oracle for SDP.
Open Source Code No The paper does not provide any statements about releasing source code or links to repositories for the methodology described.
Open Datasets No The paper is theoretical and focuses on algorithm design and approximation bounds. It does not conduct experiments on specific datasets and therefore does not mention publicly available datasets.
Dataset Splits No The paper is theoretical and focuses on algorithm design and approximation bounds. It does not involve empirical experiments with data splits for training, validation, or testing.
Hardware Specification No The paper is theoretical and focuses on algorithm design and approximation bounds. It does not describe any specific hardware used for experiments.
Software Dependencies No The paper is theoretical and focuses on algorithm design and approximation bounds. It does not list any specific software dependencies with version numbers.
Experiment Setup No The paper is theoretical and focuses on algorithm design and approximation bounds. It does not describe any experimental setup details such as hyperparameters or training configurations.