The Geometric Block Model

Authors: Sainyam Galhotra, Arya Mazumdar, Soumyabrata Pal, Barna Saha

AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We simulate our results on both real and synthetic datasets to show superior performance of both the new model as well as our algorithm. We experimentally validate the GBM on various real-world datasets. We show that many practical community structures exhibit properties of the GBM. We also compare these features with the corresponding notions in SBM to show how GBM better models data in many practical situations. We propose a simple motif-based efficient algorithm for community detection on the GBM. We rigorously show that this algorithm is optimal up to a constant fraction (to be properly defined later) even in the regime of sparse graphs (average degree log n). The motif-counting algorithms are extensively tested on both synthetic and real-world datasets. They exhibit very good performance in three real datasets, compared to the spectral-clustering algorithm (see Section 5).
Researcher Affiliation Academia Sainyam Galhotra, Arya Mazumdar, Soumyabrata Pal, Barna Saha College of Information and Computer Sciences University of Massachusetts Amherst Amherst, MA 01003 {sainyam,arya,spal,barna}@cs.umass.edu
Pseudocode Yes Algorithm 1: Graph recovery from GBM; Algorithm 2: process
Open Source Code No The paper does not provide a statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets Yes The next dataset that we use in our experiments is the Amazon product metadata on SNAP (https://snap.stanford.edu/data/amazon-meta.html), that has 548552 products and each product is one of the following types {Books, Music CD s, DVD s, Videos}. Political Blogs. (Adamic and Glance 2005) It contains a list of political blogs from 2004 US Election classified as liberal or conservative, and links between the blogs. DBLP. (Yang and Leskovec 2015) The DBLP dataset is a collaboration network where the ground truth communities are defined by the research community. Live Journal. (Leskovec, Adamic, and Huberman 2007) The Live Journal dataset is a free online blogging social network of around 4 million users.
Dataset Splits No The paper describes the datasets used and their sizes (e.g., 'DBLP... size 4500 and 7500', 'Live Journal... sizes 930 and 1400'), but it does not specify explicit training, validation, or test dataset splits with percentages or sample counts.
Hardware Specification No The paper does not specify any hardware details such as GPU/CPU models, processor types, or memory used for running the experiments.
Software Dependencies No The paper mentions 'spectral clustering algorithm using normalized cuts' and provides a URL for scikit-learn, but it does not specify version numbers for any software dependencies.
Experiment Setup Yes Using a somewhat large threshold T1 we sample a subgraph S such that u, v will be in S if there is an edge between u and v, and they have at least T1 common neighbors. We now attempt to recover the subclusters inside this subgraph by following our algorithm with a small threshold T2. Finally, for nodes that are not part of S, say x V \ S, we select each u S that x has an edge with and use a threshold of T3 to decide if u and x should be in the same cluster. The final decision is made by taking a majority vote. We generate synthetic datasets of different sizes according to the GBM with t = 2, k = 2 and for a wide spectrum of values of rs and rd, specifically we focus on the sparse region where rs = a log n / n and rd = b log n / n with variable values of a and b.