Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
# Graph Clustering with Graph Neural Networks
**Authors:** Anton Tsitsulin, John Palowitch, Bryan Perozzi, Emmanuel Müller
JMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | An empirical study of performance on synthetic graphs, illustrating the problems with existing work and how DMoN allows for improved model performance in those regimes. Thorough experimental evaluation on real-world data, showing that many pooling methods poorly reflect hierarchical structures and are unable to exploit either graph structure or node attributes, let alone leverage their joint information. |
| Researcher Affiliation | Collaboration | Anton Tsitsulin (Google Research, New York, NY, USA), John Palowitch (Google Research, San Francisco, CA, USA), Bryan Perozzi (Google Research, New York, NY, USA), Emmanuel Müller (Technical University of Dortmund, Germany) |
| Pseudocode | No | The paper describes methods using text and mathematical formulations but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | We use open-source graph simulation tools, publicly-available datasets, and we release the implementation of DMoN at github.com/google-research/google-research/tree/master/graph_embedding/dmon |
| Open Datasets | Yes | Datasets. We use 11 real-world datasets for assessing model quality. Cora, Citeseer, and Pubmed (Sen et al., 2008) are citation networks... Amazon PC and Amazon Photo (Shchur et al., 2018)... Coauthor CS, Coauthor Phys, Coauthor Med, Coauthor Chem, and Coauthor Eng (Shchur et al., 2018; Shchur and Günnemann, 2019)... OGB-arXiv (Hu et al., 2020) is a paper co-citation dataset based on arXiv papers indexed by the Microsoft Academic graph. |
| Dataset Splits | No | The paper uses various datasets for evaluation but does not explicitly state training, testing, or validation splits for these datasets. It focuses on presenting overall results without specifying data partitioning methodology. |
| Hardware Specification | No | "All models were implemented in TensorFlow 2 and trained on CPUs." This statement is too general: it does not specify CPU models or other hardware details. |
| Software Dependencies | Yes | All models were implemented in TensorFlow 2 and trained on CPUs. |
| Experiment Setup | Yes | Parameter settings. We run both synthetic and real-world experiments 10 times and average results across runs. All models were implemented in TensorFlow 2 and trained on CPUs. We fix the architecture for all GNNs (including DMoN) to have one hidden layer with 512 neurons for real-world data experiments, and 64 neurons for experiments with smaller synthetic graphs. We set the maximum number of clusters to 16 for all datasets and methods... We now discuss method-specific hyperparameters for methods that have specific settings: k-means(DeepWalk, features): We keep the learning parameters as per Perozzi et al. (2014b): number of walks γ = 80, walk length t = 80, and window size w = 10. AGC(graph, features): We set k = 1 to mimic a single-layer graph convolution that we set for all GNN-based methods. DAEGC(graph, features): We set the clustering loss coefficient γ = 10 as per the original paper. SDCN(graph, features): We set the clustering loss coefficient α = 0.1 and GNN loss coefficient β = 0.01 as per the original paper. NOCD(graph, features): We set the dropout to 0.5 uniformly across datasets and set the batch size to 2000 as per the original paper. DMoN(graph, features): We set the dropout to 0.5 uniformly across datasets. |
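
The setup row above concerns DMoN (Deep Modularity Networks), whose training objective is a soft relaxation of graph modularity: cluster assignments produced by the GNN are scored against the modularity matrix B = A − ddᵀ/2m. As a rough illustration of that quantity (a minimal NumPy sketch, not the authors' TensorFlow 2 implementation; the function name and toy graph are our own), one can compute the soft modularity of an assignment matrix directly:

```python
import numpy as np

def soft_modularity(adjacency, assignments):
    """Soft modularity Q = Tr(C^T B C) / 2m, with B = A - d d^T / 2m.

    adjacency: (n, n) symmetric adjacency matrix A.
    assignments: (n, k) soft cluster-assignment matrix C (rows sum to 1).
    """
    degrees = adjacency.sum(axis=1, keepdims=True)   # column vector d
    two_m = degrees.sum()                            # 2m = total degree
    modularity_matrix = adjacency - degrees @ degrees.T / two_m
    return np.trace(assignments.T @ modularity_matrix @ assignments) / two_m

# Toy example: two disjoint triangles, perfectly split into 2 clusters.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
C = np.zeros((6, 2))
C[:3, 0] = 1.0   # nodes 0-2 in cluster 0
C[3:, 1] = 1.0   # nodes 3-5 in cluster 1
print(soft_modularity(A, C))  # → 0.5, the maximum for two equal cliques
```

In the paper's setting, C would be the softmax output of the single hidden-layer GNN described above (512 or 64 neurons, at most 16 clusters), and the loss maximizes this quantity with an added collapse regularizer; the sketch only shows the modularity term.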