Optimal Sampling and Clustering in the Stochastic Block Model

Authors: Se-Young Yun, Alexandre Proutiere

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We show both analytically and numerically that adaptive edge sampling yields important improvements over random sampling (traditionally used in the SBM analysis). For example, we prove that adaptive sampling significantly enlarges the region of the SBM parameters where asymptotically exact cluster recovery is feasible. Figure 1 (right) the cluster recovery rate using ASP (Adaptive Sampling Partition), the proposed optimal joint sampling and clustering algorithm and the optimal clustering algorithm with random sampling presented in [16].
Researcher Affiliation Academia Se-Young Yun KAIST Daejeon, South Korea yunseyoung@kaist.ac.kr Alexandre Proutière KTH Stockholm, Sweden alepro@kth.se
Pseudocode Yes Algorithm 1 Adaptive Spectral Partition(T, δ, K)
Open Source Code No The paper does not provide any explicit statement about open-sourcing the code for the described methodology, nor does it include a link to a code repository.
Open Datasets No The paper describes a theoretical model (Stochastic Block Model) and simulates data based on its parameters for numerical experiments, rather than using a pre-existing, publicly available dataset. Thus, there is no dataset to provide access information for.
Dataset Splits No The paper describes numerical experiments based on the Stochastic Block Model with a sampling budget 'T'. It does not refer to standard training, validation, or test dataset splits in the context of fixed datasets.
Hardware Specification No The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU/CPU models, memory, cloud resources).
Software Dependencies No The paper mentions 'Figure done using matlab boxplot function' but does not specify the version of Matlab or any other software dependencies with version numbers.
Experiment Setup Yes This experiment concerns the above binary symmetric SBM with 20000 nodes and parameters p11 = p22 = 0.5, and p12 = p21 = 0.1. As soon as the sampling budget exceeds 450000, ASP exactly recovers the clusters... The proposed algorithm consists in three main steps: (i) first, we use a small fraction of the observation budget and spectral methods to obtain initial cluster estimates; (ii) the latter are then used to derive precise estimators of the SBM parameters...; (iii) finally, ˆx and our initial cluster estimates dictate the way to sample edges with the remaining budget, and based on these observations, the cluster estimates are improved.