On Evaluation Metrics for Graph Generative Models

Authors: Rylee Thompson, Boris Knyazev, Elahe Ghalebi, Jungtaek Kim, Graham W. Taylor

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We design experiments to thoroughly test and objectively score metrics on their ability to measure the diversity and fidelity of generated graphs, as well as their sample and computational efficiency. To this end, we study existing GGM metrics and neural-network-based metrics emerging from generative models of images that use embeddings extracted from a task-specific network. Motivated by the power of Graph Neural Networks (GNNs) to extract meaningful graph representations without any training, we introduce several metrics based on the features extracted by an untrained random GNN. (See the untrained-GIN feature-extraction sketch below the table.)
Researcher Affiliation | Collaboration | Rylee Thompson (1,2), Boris Knyazev (1,2,3), Elahe Ghalebi (2), Jungtaek Kim (4), Graham W. Taylor (1,2); (1) University of Guelph, (2) Vector Institute, (3) Samsung SAIT AI Lab, Montreal, (4) POSTECH
Pseudocode | No | The paper includes mathematical equations and diagrams, but no structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is released at: https://github.com/uoguelph-mlrg/GGM-metrics.
Open Datasets | Yes | We experiment using six diverse graph datasets to test each metric's ability to evaluate GGMs across graph domains (Table 1). In particular, we include common GGM datasets such as Lobster, Grid, Proteins, Community, and Ego (You et al., 2018; Liao et al., 2019; Dai et al., 2020). In addition, we utilize the molecular dataset ZINC (Irwin et al., 2012) strictly to demonstrate the ability of each metric to detect changes in node and edge feature distributions (Section 4.3).
Dataset Splits | Yes | We train GRAN (Liao et al., 2019) and GraphRNN (You et al., 2018) with 80% of the graphs randomly selected for training. We use the implementations in the official GitHub repositories and train using the recommended hyperparameters. We then generate n graphs from each model, where n is the size of the dataset, and use the remaining 20% of graphs as S_r. ... We use the same optimizer across all experiments and select the model with the lowest validation loss. (See the evaluation-protocol sketch below the table.)
Hardware Specification | Yes | All experiments in this section were conducted on an Intel Platinum 8160F Skylake @ 2.1 GHz with 4 CPU cores.
Software Dependencies | No | The paper mentions using specific software tools like the “GraKeL Python library” and “open-source code” from cited works, but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Thus, in our experiments we consider GIN models (Equations 3 and 4) with L ∈ {2, 3, ..., 7} and d ∈ {5, 10, ..., 40}. We randomly select 20 architectures inside these ranges to test in our experiments using both randomly initialized and pretrained GINs. ... For Precision, Recall, Density, and Coverage, we set k = 5 for all experiments (Naeem et al., 2020). For the MMD RBF metric, we compute MMD as MMD(S_g, S_r) = max{MMD(S_g, S_r; σ) | σ ∈ Σ}, where the values of Σ are multiplied by the mean pairwise distance of S_g and S_r. ... Before this scaling factor is applied, we use Σ = {0.01, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0}. For all baseline metrics from You et al. (2018), we use the hyperparameters chosen in their open-source code. (See the MMD-RBF sketch below the table.)
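
The Research Type row describes metrics computed on graph embeddings from an untrained, randomly initialized GNN. Below is a minimal sketch of that idea using PyTorch Geometric's GINConv; the class name RandomGIN, the per-layer sum-pooling readout, and the default widths are illustrative assumptions rather than the authors' implementation (their code is linked above).

```python
# Minimal sketch (assumption): graph embeddings from an untrained,
# randomly initialized GIN, as described for the random-GNN metrics.
# Names and readout choice are illustrative, not taken from the authors' repo.
import torch
import torch.nn as nn
from torch_geometric.nn import GINConv, global_add_pool

class RandomGIN(nn.Module):
    def __init__(self, in_dim, hidden_dim=35, num_layers=3):
        super().__init__()
        self.convs = nn.ModuleList()
        dims = [in_dim] + [hidden_dim] * num_layers
        for i in range(num_layers):
            mlp = nn.Sequential(
                nn.Linear(dims[i], hidden_dim),
                nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim),
            )
            self.convs.append(GINConv(mlp))

    @torch.no_grad()
    def forward(self, x, edge_index, batch):
        # Sum-pool node features after every layer and concatenate,
        # so the graph embedding mixes several receptive-field sizes.
        hs = []
        for conv in self.convs:
            x = conv(x, edge_index)
            hs.append(global_add_pool(x, batch))
        return torch.cat(hs, dim=1)

# Usage: weights stay at their random initialization (no training).
# model = RandomGIN(in_dim=1, hidden_dim=35, num_layers=3).eval()
# emb = model(data.x, data.edge_index, data.batch)  # [num_graphs, 3 * 35]
```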
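
The Dataset Splits row describes the evaluation protocol: train on a random 80% of each dataset, generate n graphs (n = dataset size) from the trained model, and hold out the remaining 20% as the reference set S_r. A minimal sketch of that protocol follows; the generative_model.fit/sample interface is hypothetical.

```python
# Minimal sketch (assumption) of the evaluation protocol quoted above:
# 80/20 random split, sample n = |dataset| graphs from the trained model,
# keep the held-out 20% as S_r. `fit` and `sample` are hypothetical calls.
import random

def make_eval_sets(graphs, generative_model, seed=0):
    rng = random.Random(seed)
    idx = list(range(len(graphs)))
    rng.shuffle(idx)
    split = int(0.8 * len(graphs))
    train = [graphs[i] for i in idx[:split]]
    s_r = [graphs[i] for i in idx[split:]]        # reference set S_r
    generative_model.fit(train)                   # hypothetical training call
    s_g = generative_model.sample(n=len(graphs))  # generated set S_g
    return s_g, s_r
```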
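
The Experiment Setup row quotes the MMD RBF rule MMD(S_g, S_r) = max{MMD(S_g, S_r; σ) | σ ∈ Σ}, with Σ scaled by the mean pairwise distance of S_g and S_r. The NumPy sketch below implements one reading of that rule (scaling each candidate bandwidth by the mean Euclidean distance between the two embedding sets) with a biased MMD² estimator; it is an assumption-laden illustration, not the authors' code.

```python
# Minimal sketch (assumption): MMD with an RBF kernel, maximized over a set of
# bandwidths Sigma that are scaled by the mean pairwise distance, as quoted above.
import numpy as np

SIGMAS = [0.01, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0]

def _pairwise_sq_dists(a, b):
    # ||a_i - b_j||^2 for all pairs, shape [len(a), len(b)].
    d = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.maximum(d, 0.0)

def mmd_rbf(s_g, s_r, sigmas=SIGMAS):
    """s_g, s_r: [n, d] arrays of graph embeddings (e.g., from a random GIN)."""
    d_gg = _pairwise_sq_dists(s_g, s_g)
    d_rr = _pairwise_sq_dists(s_r, s_r)
    d_gr = _pairwise_sq_dists(s_g, s_r)
    # Scale candidate bandwidths by the mean pairwise distance between the sets
    # (one reading of "mean pairwise distance of Sg and Sr").
    scale = np.sqrt(d_gr).mean()
    best = -np.inf
    for sigma in sigmas:
        gamma = 1.0 / (2.0 * (sigma * scale) ** 2)
        k_gg = np.exp(-gamma * d_gg).mean()
        k_rr = np.exp(-gamma * d_rr).mean()
        k_gr = np.exp(-gamma * d_gr).mean()
        best = max(best, k_gg + k_rr - 2.0 * k_gr)  # biased MMD^2 estimate
    return best
```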