Spectral Embedding of Regularized Block Models
Authors: Nathan De Lara, Thomas Bonald
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate these results on both synthetic and real data, showing how regularization improves standard clustering scores. Section 5 presents the experiments and Section 6 concludes the paper. |
| Researcher Affiliation | Academia | Nathan De Lara & Thomas Bonald Institut Polytechnique de Paris Paris, France {nathan.delara, thomas.bonald}@telecom-paris.fr |
| Pseudocode | No | The paper contains mathematical formulations and proofs but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code to reproduce these experiments is available online: https://github.com/nathandelara/Spectral-Embedding-of-Regularized-Block-Models/ |
| Open Datasets | Yes | Stochastic Block-Model (SBM): We generate 100 instances of the same stochastic block model (Holland et al., 1983). There are 100 blocks of size 20, with intra-block edge probability set to 0.5 for the first 50 blocks and 0.05 for the other blocks. The inter-block edge probability is set to 0.001. Other sets of parameters can be tested using the code available online. The ground-truth cluster of each node corresponds to its block. 20newsgroup (NG): This dataset consists of around 18000 newsgroup posts on 20 topics. This defines a weighted bipartite graph between documents and words. The label of each document corresponds to the topic. Wikipedia for Schools (WS) (Haruechaiyasak & Damrongrat, 2008): This is the graph of hyperlinks between a subset of Wikipedia pages. The label of each page is its category (e.g., countries, mammals, physics). |
| Dataset Splits | No | The paper describes the use of K-Means for clustering on the datasets and evaluates performance using various metrics. However, it does not specify explicit train/validation/test splits of the datasets themselves for model training or hyperparameter tuning in a traditional sense. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | We use the K-Means algorithm to cluster the nodes in the embedding space. The parameter K is set to the ground-truth number of clusters (other experiments with different values of K are reported in the Appendix). We use the Scikit-learn (Pedregosa et al., 2011) implementation of KMeans and its metrics, when available. The spectral embedding and the modularity are computed with the Scikit-network package; see its documentation for more details. |
| Experiment Setup | Yes | All graphs are embedded in dimension 20, with different regularization parameters. To compare the impact of this parameter across different datasets, we use a relative regularization parameter α, the absolute parameter being (w/n²)α, where w = 1_n^T A 1_n is the total weight of the graph. We use the K-Means algorithm to cluster the nodes in the embedding space. The parameter K is set to the ground-truth number of clusters (other experiments with different values of K are reported in the Appendix). |
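The SBM instances described above (100 blocks of size 20, intra-block probability 0.5 for the first 50 blocks and 0.05 for the rest, inter-block probability 0.001) can be sampled with a few lines of numpy. This is a hedged sketch, not the authors' generator; the function name and defaults are our own, and only the paper's stated probabilities are taken from the source.

```python
import numpy as np

def sbm_adjacency(n_blocks=100, block_size=20, p_intra_dense=0.5,
                  p_intra_sparse=0.05, p_inter=0.001, seed=0):
    """Sample a symmetric adjacency matrix from the stochastic block model
    described in the paper: intra-block probability 0.5 for the first half
    of the blocks, 0.05 for the second half, inter-block probability 0.001.
    (Function name and interface are illustrative, not from the paper.)"""
    rng = np.random.default_rng(seed)
    n = n_blocks * block_size
    labels = np.repeat(np.arange(n_blocks), block_size)
    # Block-level edge-probability matrix: p_inter off-diagonal,
    # block-dependent probability on the diagonal.
    p_intra = np.where(np.arange(n_blocks) < n_blocks // 2,
                       p_intra_dense, p_intra_sparse)
    probs = np.full((n_blocks, n_blocks), p_inter)
    np.fill_diagonal(probs, p_intra)
    # Sample the upper triangle node-by-node, then symmetrize (no self-loops).
    draws = rng.random((n, n)) < probs[labels][:, labels]
    upper = np.triu(draws, k=1)
    adjacency = (upper | upper.T).astype(int)
    return adjacency, labels
```

The ground-truth cluster of each node is its block label, as in the paper.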
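The embedding step can likewise be sketched in plain numpy rather than Scikit-network. Assumptions to note: the paper's regularization adds a constant α to every entry of the adjacency matrix (A + α·1·1ᵀ, with α = (w/n²) times the relative parameter), and we take the leading eigenvectors of the symmetrically normalized regularized adjacency as one common variant of spectral embedding; the exact normalization used by Scikit-network may differ.

```python
import numpy as np

def regularized_spectral_embedding(adjacency, dim=20, alpha_rel=1.0):
    """Minimal sketch of a regularized spectral embedding.

    alpha_rel is the relative regularization parameter from the paper;
    the absolute parameter is alpha_rel * w / n**2, where w is the total
    weight of the graph. The embedding variant (leading eigenvectors of
    D^{-1/2} A_reg D^{-1/2}) is an assumption, not taken from the paper.
    """
    n = adjacency.shape[0]
    w = adjacency.sum()                     # total weight w = 1^T A 1
    alpha = alpha_rel * w / n ** 2          # absolute regularization
    a_reg = adjacency + alpha               # dense form of A + alpha * 1 1^T
    degrees = a_reg.sum(axis=1)             # regularized degrees (all > 0)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    # Symmetric normalization D^{-1/2} A_reg D^{-1/2}.
    norm_adj = d_inv_sqrt[:, None] * a_reg * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(norm_adj)
    # eigh returns eigenvalues in ascending order; keep the `dim` leading ones.
    return eigvecs[:, -dim:][:, ::-1]
```

In the paper's pipeline, the resulting 20-dimensional embedding is then clustered with Scikit-learn's KMeans, with K set to the ground-truth number of clusters.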