On Evaluation Metrics for Graph Generative Models

Authors: Rylee Thompson, Boris Knyazev, Elahe Ghalebi, Jungtaek Kim, Graham W. Taylor

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We design experiments to thoroughly test and objectively score metrics on their ability to measure the diversity and fidelity of generated graphs, as well as their sample and computational efficiency. To this end, we study existing GGM metrics and neural-network-based metrics emerging from generative models of images that use embeddings extracted from a task-specific network. Motivated by the power of Graph Neural Networks (GNNs) to extract meaningful graph representations without any training, we introduce several metrics based on the features extracted by an untrained random GNN. (See the untrained-GIN feature-extraction sketch below the table.)
Researcher Affiliation | Collaboration | Rylee Thompson (1,2), Boris Knyazev (1,2,3), Elahe Ghalebi (2), Jungtaek Kim (4), Graham W. Taylor (1,2); (1) University of Guelph, (2) Vector Institute, (3) Samsung SAIT AI Lab, Montreal, (4) POSTECH
Pseudocode | No | The paper includes mathematical equations and diagrams, but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is released at: https://github.com/uoguelph-mlrg/GGM-metrics.
Open Datasets | Yes | We experiment using six diverse graph datasets to test each metric's ability to evaluate GGMs across graph domains (Table 1). In particular, we include common GGM datasets such as Lobster, Grid, Proteins, Community, and Ego (You et al., 2018; Liao et al., 2019; Dai et al., 2020). In addition, we utilize the molecular dataset ZINC (Irwin et al., 2012) strictly to demonstrate the ability of each metric to detect changes in node and edge feature distributions (Section 4.3).
Dataset Splits | Yes | We train GRAN (Liao et al., 2019) and GraphRNN (You et al., 2018) with 80% of the graphs randomly selected for training. We use the implementations in the official GitHub repositories and train using the recommended hyperparameters. We then generate n graphs from each model, where n is the size of the dataset, and use the remaining 20% of graphs as S_r. ... We use the same optimizer across all experiments and select the model with the lowest validation loss. (See the evaluation-protocol sketch below the table.)
Hardware Specification | Yes | All experiments in this section were conducted on an Intel Platinum 8160F Skylake @ 2.1 GHz with 4 CPU cores.
Software Dependencies | No | The paper mentions using specific software tools like the “GraKeL Python library” and “open-source code” from cited works, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Thus, in our experiments we consider GIN models (Equations 3 and 4) with L ∈ {2, 3, ..., 7} and d ∈ {5, 10, ..., 40}. We randomly select 20 architectures inside these ranges to test in our experiments using both randomly initialized and pretrained GINs. ... For Precision, Recall, Density, and Coverage, we set k = 5 for all experiments (Naeem et al., 2020). For the MMD RBF metric, we compute MMD as MMD(S_g, S_r) = max{MMD(S_g, S_r; σ) | σ ∈ Σ}, where the values of Σ are multiplied by the mean pairwise distance of S_g and S_r. ... Before this scaling factor is applied, we use Σ = {0.01, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0}. For all baseline metrics from You et al. (2018), we use the hyperparameters chosen in their open-source code. (See the MMD-RBF sketch below the table.)
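
The Research Type row describes metrics computed on graph embeddings from an untrained, randomly initialized GNN. Below is a minimal sketch of that idea using PyTorch Geometric's GINConv; the class name RandomGIN, the per-layer sum-pooling readout, and the default widths are illustrative assumptions rather than the authors' implementation (their code is linked above).

```python
# Minimal sketch (assumption): graph embeddings from an untrained,
# randomly initialized GIN, as described for the random-GNN metrics.
# Names and readout choice are illustrative, not taken from the authors' repo.
import torch
import torch.nn as nn
from torch_geometric.nn import GINConv, global_add_pool

class RandomGIN(nn.Module):
    def __init__(self, in_dim, hidden_dim=35, num_layers=3):
        super().__init__()
        self.convs = nn.ModuleList()
        dims = [in_dim] + [hidden_dim] * num_layers
        for i in range(num_layers):
            mlp = nn.Sequential(
                nn.Linear(dims[i], hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
            )
            self.convs.append(GINConv(mlp))

    @torch.no_grad()
    def forward(self, x, edge_index, batch):
        # Sum-pool node features after every layer and concatenate,
        # so the graph embedding mixes several receptive-field sizes.
        hs = []
        for conv in self.convs:
            x = conv(x, edge_index)
            hs.append(global_add_pool(x, batch))
        return torch.cat(hs, dim=1)

# Usage: weights stay at their random initialization (no training).
# model = RandomGIN(in_dim=1, hidden_dim=35, num_layers=3).eval()
# emb = model(data.x, data.edge_index, data.batch)  # [num_graphs, 3 * 35]
```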
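
The Dataset Splits row describes the evaluation protocol: train on a random 80% of each dataset, generate n graphs (n = dataset size) from the trained model, and hold out the remaining 20% as the reference set S_r. A minimal sketch of that protocol follows; the generative_model.fit/sample interface is hypothetical.

```python
# Minimal sketch (assumption) of the evaluation protocol quoted above:
# 80/20 random split, sample n = |dataset| graphs from the trained model,
# keep the held-out 20% as S_r. `fit` and `sample` are hypothetical calls.
import random

def make_eval_sets(graphs, generative_model, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(graphs)))
    rng.shuffle(idx)
    split = int(0.8 * len(graphs))
    train = [graphs[i] for i in idx[:split]]
    s_r = [graphs[i] for i in idx[split:]]        # reference set S_r
    generative_model.fit(train)                   # hypothetical training call
    s_g = generative_model.sample(n=len(graphs))  # generated set S_g
    return s_g, s_r
```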
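
The Experiment Setup row quotes the MMD RBF rule MMD(S_g, S_r) = max{MMD(S_g, S_r; σ) | σ ∈ Σ}, with Σ scaled by the mean pairwise distance of S_g and S_r. The NumPy sketch below implements one reading of that rule (scaling each candidate bandwidth by the mean Euclidean distance between the two embedding sets) with a biased MMD² estimator; it is an assumption-laden illustration, not the authors' code.

```python
# Minimal sketch (assumption): MMD with an RBF kernel, maximized over a set of
# bandwidths Sigma that are scaled by the mean pairwise distance, as quoted above.
import numpy as np

SIGMAS = [0.01, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0]

def _pairwise_sq_dists(a, b):
    # ||a_i - b_j||^2 for all pairs, shape [len(a), len(b)].
    d = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.maximum(d, 0.0)

def mmd_rbf(s_g, s_r, sigmas=SIGMAS):
    """s_g, s_r: [n, d] arrays of graph embeddings (e.g., from a random GIN)."""
    d_gg = _pairwise_sq_dists(s_g, s_g)
    d_rr = _pairwise_sq_dists(s_r, s_r)
    d_gr = _pairwise_sq_dists(s_g, s_r)
    # Scale candidate bandwidths by the mean pairwise distance between the sets
    # (one reading of "mean pairwise distance of Sg and Sr").
    scale = np.sqrt(d_gr).mean()
    best = -np.inf
    for sigma in sigmas:
        gamma = 1.0 / (2.0 * (sigma * scale) ** 2)
        k_gg = np.exp(-gamma * d_gg).mean()
        k_rr = np.exp(-gamma * d_rr).mean()
        k_gr = np.exp(-gamma * d_gr).mean()
        best = max(best, k_gg + k_rr - 2.0 * k_gr)  # biased MMD^2 estimate
    return best
```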