reproducibilityindex.ai

Correlation Clustering in Data Streams

Authors: KookJin Ahn, Graham Cormode, Sudipto Guha, Andrew McGregor, Anthony Wirth

ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper, we address the problem of correlation clustering in the dynamic data stream model. ... We present polynomial-time, O(n polylog n)-space approximation algorithms for natural problems that arise. We ﬁrst develop data structures based on linear sketches... We then combine these data structures with convex programming and sampling techniques... Our work presents spaceefﬁcient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling. ... For max-agree we provide the following single-pass streaming algorithms... (i) a polynomial-time (1 ϵ)-approximation for bounded weights (Theorem 7), and (ii) a 0.766(1 ϵ) approximation for arbitrary weights in O(nϵ 10) time (Theorem 16).
Researcher Affiliation	Collaboration	Kook Jin Ahn1 KOOKJIN@CIS.UPENN.EDU University of Pennsylvania, Philadelphia, PA 19104, USA. Graham Cormode G.CORMODE@WARWICK.AC.UK University of Warwick, Coventry CV4 7AL, UK. Sudipto Guha SUDIPTO@CIS.UPENN.EDU University of Pennsylvania, Philadelphia, PA 19104, USA. Andrew Mc Gregor MCGREGOR@CS.UMASS.EDU University of Massachusetts, Amherst, MA 01003, USA. Anthony Wirth AWIRTH@UNIMELB.EDU.AU Department of Computing and Information Systems, The University of Melbourne, Vic 3010, Australia. 1Currently at Google, Inc. Email:kookjin@google.com
Pseudocode	Yes	Algorithm 1 Oracle for LP2 Algorithm 2 Oracle for SDP.
Open Source Code	No	The paper does not provide any statements about releasing source code or links to repositories for the methodology described.
Open Datasets	No	The paper is theoretical and focuses on algorithm design and approximation bounds. It does not conduct experiments on specific datasets and therefore does not mention publicly available datasets.
Dataset Splits	No	The paper is theoretical and focuses on algorithm design and approximation bounds. It does not involve empirical experiments with data splits for training, validation, or testing.
Hardware Specification	No	The paper is theoretical and focuses on algorithm design and approximation bounds. It does not describe any specific hardware used for experiments.
Software Dependencies	No	The paper is theoretical and focuses on algorithm design and approximation bounds. It does not list any specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and focuses on algorithm design and approximation bounds. It does not describe any experimental setup details such as hyperparameters or training configurations.