reproducibilityindex.ai

Streaming, Memory Limited Algorithms for Community Detection

Authors: Se-Young Yun, marc lelarge, Alexandre Proutiere

NeurIPS 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	In this paper, we consider sparse networks consisting of a ﬁnite number of nonoverlapping communities, i.e. disjoint clusters, so that there is higher density within clusters than across clusters. Both the intraand inter-cluster edge densities vanish when the size of the graph grows large, making the cluster reconstruction problem nosier and hence difﬁcult to solve. We are interested in scenarios where the network size is very large, so that the adjacency matrix of the graph is hard to manipulate and store. The data stream model in which columns of the adjacency matrix are revealed sequentially constitutes a natural framework in this setting. For this model, we develop two novel clustering algorithms that extract the clusters asymptotically accurately. The ﬁrst algorithm is ofﬂine, as it needs to store and keep the assignments of nodes to clusters, and requires a memory that scales linearly with the network size. The second algorithm is online, as it may classify a node when the corresponding column is revealed and then discard this information. This algorithm requires a memory growing sub-linearly with the network size. To construct these efﬁcient streaming memory-limited clustering algorithms, we ﬁrst address the problem of clustering with partial information, where only a small proportion of the columns of the adjacency matrix is observed and develop, for this setting, a new spectral algorithm which is of independent interest.
Researcher Affiliation	Collaboration	Se-Young. Yun MSR-Inria 23 Avenue d Italie, Paris 75013 seyoung.yun@inria.fr Marc Lelarge Inria & ENS 23 Avenue d Italie, Paris 75013 marc.lelarge@ens.fr Alexandre Proutiere KTH, EE School / ACL Osquldasv. 10, Stockholm 100-44, Sweden alepro@kth.se
Pseudocode	Yes	Algorithm 1 Spectral method with indirect edges; Algorithm 2 Greedy selections; Algorithm 3 Streaming ofﬂine; Algorithm 4 Streaming online
Open Source Code	No	The paper does not provide any explicit statements about releasing code or links to a code repository for the described methodology.
Open Datasets	No	The paper focuses on theoretical analysis of the Stochastic Block Model (SBM) and does not describe experiments on a publicly available dataset. It mentions SBM as a 'model and benchmark to assess the performance of clustering algorithms' and describes how the graph is generated randomly, indicating a theoretical rather than empirical setup.
Dataset Splits	No	The paper is theoretical and defines a synthetic model (SBM) for graph generation. It does not discuss empirical training, validation, or test splits of any data, as it does not conduct experiments on empirical data.
Hardware Specification	No	The paper does not mention any specific hardware used for experiments. It focuses on theoretical algorithms and their memory requirements.
Software Dependencies	No	The paper describes algorithms and theoretical concepts but does not mention any specific software or library dependencies with version numbers.
Experiment Setup	No	The paper does not provide specific experimental setup details, such as hyperparameter values or training configurations, as it focuses on theoretical algorithm design and analysis.