Spectral Embedding of Regularized Block Models

Authors: Nathan De Lara, Thomas Bonald

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate these results on both synthetic and real data, showing how regularization improves standard clustering scores. Section 5 presents the experiments and Section 6 concludes the paper.
Researcher Affiliation | Academia | Nathan De Lara & Thomas Bonald, Institut Polytechnique de Paris, Paris, France. {nathan.delara, thomas.bonald}@telecom-paris.fr
Pseudocode | No | The paper contains mathematical formulations and proofs but does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code to reproduce these experiments is available online: https://github.com/nathandelara/Spectral-Embedding-of-Regularized-Block-Models/
Open Datasets | Yes | Stochastic Block Model (SBM): we generate 100 instances of the same stochastic block model (Holland et al., 1983). There are 100 blocks of size 20, with intra-block edge probability set to 0.5 for the first 50 blocks and 0.05 for the other blocks. The inter-block edge probability is set to 0.001; other sets of parameters can be tested using the code available online. The ground-truth cluster of each node corresponds to its block. 20newsgroup (NG): around 18,000 newsgroup posts on 20 topics, defining a weighted bipartite graph between documents and words. The label of each document corresponds to its topic. Wikipedia for Schools (WS) (Haruechaiyasak & Damrongrat, 2008): the graph of hyperlinks between a subset of Wikipedia pages. The label of each page is its category (e.g., countries, mammals, physics).
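To make the SBM generation concrete, below is a minimal NumPy sketch of the generator described in this row. The variable names and seed are ours; the authors' actual generator is the one in the linked repository.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_blocks, block_size = 100, 20
n = n_blocks * block_size  # 2000 nodes in total

# Intra-block edge probability: 0.5 for the first 50 blocks, 0.05 for the rest.
p_intra = np.where(np.arange(n_blocks) < 50, 0.5, 0.05)
p_inter = 0.001

# Ground-truth cluster of each node = its block.
labels_true = np.repeat(np.arange(n_blocks), block_size)

# Matrix of edge probabilities, then one Bernoulli draw per unordered pair.
probs = np.full((n, n), p_inter)
for b in range(n_blocks):
    idx = slice(b * block_size, (b + 1) * block_size)
    probs[idx, idx] = p_intra[b]

upper = np.triu(rng.random((n, n)) < probs, k=1)  # keep the strict upper triangle
adjacency = (upper | upper.T).astype(int)         # symmetric, no self-loops
```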
Dataset Splits | No | The paper describes K-Means clustering on the datasets and evaluates performance with several metrics, but it does not specify explicit train/validation/test splits for model training or hyperparameter tuning.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | We use the K-Means algorithm to cluster the nodes in the embedding space. The parameter K is set to the ground-truth number of clusters (other experiments with different values of K are reported in the Appendix). We use the Scikit-learn (Pedregosa et al., 2011) implementation of K-Means and the metrics, when available. The spectral embedding and the modularity are computed with the Scikit-network package; see the documentation for more details.
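A minimal sketch of this clustering step with the cited dependencies, assuming the 20-dimensional embedding is already computed (random placeholders stand in for it here):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Placeholder inputs: in the experiments, `embedding` is the 20-dimensional
# spectral embedding and `labels_true` the ground-truth blocks/topics/categories.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(2000, 20))
labels_true = np.repeat(np.arange(100), 20)

# K is set to the ground-truth number of clusters, as described above.
kmeans = KMeans(n_clusters=len(np.unique(labels_true)), n_init=10, random_state=0)
labels_pred = kmeans.fit_predict(embedding)
print("adjusted Rand index:", adjusted_rand_score(labels_true, labels_pred))
```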
Experiment Setup | Yes | All graphs are embedded in dimension 20, with different regularization parameters. To compare the impact of this parameter across datasets, we use a relative regularization parameter (w/n^2)α, where w = 1_n^T A 1_n is the total weight of the graph. We use the K-Means algorithm to cluster the nodes in the embedding space. The parameter K is set to the ground-truth number of clusters (other experiments with different values of K are reported in the Appendix).
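Reading this setup literally, the relative parameter α corresponds to adding the constant (w/n^2)α to every entry of the adjacency matrix before embedding. The sketch below is our dense reconstruction of one standard spectral variant (leading eigenvectors of the normalized adjacency), not the authors' Scikit-network implementation:

```python
import numpy as np
from scipy.linalg import eigh

def regularized_spectral_embedding(adjacency, dim=20, alpha=0.1):
    """Embed a graph after adding (alpha * w / n^2) to every adjacency entry.

    Dense reconstruction of the setup quoted above, using the leading
    eigenvectors of the normalized adjacency; the released code relies on
    Scikit-network's sparse solvers instead.
    """
    n = adjacency.shape[0]
    w = adjacency.sum()                       # total weight w = 1_n^T A 1_n
    a_reg = adjacency + alpha * w / n**2      # uniform additive regularization
    degrees = a_reg.sum(axis=1)               # strictly positive for alpha > 0
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    norm_adj = d_inv_sqrt[:, None] * a_reg * d_inv_sqrt[None, :]
    eigvals, eigvecs = eigh(norm_adj)         # eigenvalues in ascending order
    order = np.argsort(-eigvals)[1:dim + 1]   # skip the trivial top eigenvector
    return d_inv_sqrt[:, None] * eigvecs[:, order]
```

For instance, `regularized_spectral_embedding(adjacency.astype(float), dim=20, alpha=1.0)` can be applied to the SBM adjacency generated above, sweeping `alpha` to mimic the regularization comparison.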