Evaluation Metrics for Graph Generative Models: Problems, Pitfalls, and Practical Solutions

Authors: Leslie O'Bray, Max Horn, Bastian Rieck, Karsten Borgwardt

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform a systematic evaluation of MMD in the context of graph generative model comparison, highlighting some of the challenges and pitfalls researchers may inadvertently encounter. After conducting a thorough analysis of the behaviour of MMD on synthetically generated perturbed graphs as well as on recently proposed graph generative models, we are able to provide a suitable procedure to mitigate these challenges and pitfalls. (A minimal MMD sketch appears after the table.)
Researcher Affiliation | Academia | (1) Department of Biosystems Science and Engineering, ETH Zürich, Switzerland; (2) SIB Swiss Institute of Bioinformatics, Switzerland; (3) Institute of AI for Health, Helmholtz Munich, Germany; (4) Technical University of Munich, Germany
Pseudocode | No | The paper describes experimental procedures and methods in paragraph text and mathematical equations, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | We have provided the code for our experiments in order to make our work fully reproducible. Details can be found in Appendix A.2-A.4, which includes a link to our GitHub repository. During the review process, our code was made available as Supplementary Material in order to preserve anonymity. Our code is available at https://www.github.com/BorgwardtLab/ggme under a BSD 3-Clause license.
Open Datasets | No | We ran the models using the author-provided implementations to generate new graphs on the Community, Barabási-Albert, Erdős-Rényi, and Watts-Strogatz graph datasets, and then calculated the MMD distance between the generated graphs and the test graphs, using the different (i) kernels that they used (EMD, TV, RBF), (ii) descriptor functions (degree histogram, clustering coefficient histogram, and Laplacian spectrum), and (iii) parameter ranges (σ, λ ∈ {10^-5, ..., 10^5}). The paper mentions common graph types used as datasets (e.g., Barabási-Albert, Erdős-Rényi), but does not provide specific links, DOIs, or formal citations for the particular instances of these datasets used in their experiments. (A sketch regenerating these graph families appears after the table.)
Dataset Splits | No | The paper refers to 'test graphs' and 'training graphs' but does not explicitly mention a 'validation' set or provide specific details on the dataset splitting methodology (e.g., percentages or sample counts).
Hardware Specification | Yes | All the jobs were run on our internal cluster, comprising 64 physical cores (Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz) with 8 GeForce GTX 1080 GPUs.
Software Dependencies | No | The paper mentions using 'the official implementations of GraphRNN, GRAN and Graph Score Matching' and provides a link to their own code. However, it does not specify versions for Python or for any specific libraries/frameworks such as PyTorch or TensorFlow, which would be necessary for full reproducibility of the software environment.
Experiment Setup | Yes | We ran the models using the author-provided implementations to generate new graphs on the Community, Barabási-Albert, Erdős-Rényi, and Watts-Strogatz graph datasets, and then calculated the MMD distance between the generated graphs and the test graphs, using the different (i) kernels that they used (EMD, TV, RBF), (ii) descriptor functions (degree histogram, clustering coefficient histogram, and Laplacian spectrum), and (iii) parameter ranges (σ, λ ∈ {10^-5, ..., 10^5}).
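
For readers unfamiliar with the metric: MMD (maximum mean discrepancy) compares two distributions through a kernel k via MMD^2(P, Q) = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]. Below is a minimal sketch of the standard (biased) MMD^2 estimator with an RBF kernel applied to per-graph descriptor vectors; the function names are illustrative and not taken from the paper's repository.

```python
import numpy as np

def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel between two descriptor vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def mmd_squared(X, Y, sigma=1.0):
    """Biased (V-statistic) MMD^2 estimate between two samples.

    X and Y are sequences of descriptor vectors, one per graph.
    """
    k_xx = np.mean([rbf_kernel(a, b, sigma) for a in X for b in X])
    k_yy = np.mean([rbf_kernel(a, b, sigma) for a in Y for b in Y])
    k_xy = np.mean([rbf_kernel(a, b, sigma) for a in X for b in Y])
    return k_xx + k_yy - 2 * k_xy
```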
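The graph families named in the table are standard random-graph models available in networkx. The sketch below shows how such synthetic datasets are typically regenerated; all parameter values (graph size, edge probabilities, etc.) are illustrative placeholders, since the paper's exact dataset parameters are not quoted in this summary.

```python
import networkx as nx

def sample_graphs(kind, n_graphs=100, n_nodes=64, seed=0):
    """Sample a small synthetic dataset from one of the graph families.

    Parameter choices below are illustrative, not the paper's settings.
    """
    graphs = []
    for i in range(n_graphs):
        if kind == "community":
            # Two-block planted partition as a stand-in for the Community dataset.
            g = nx.planted_partition_graph(2, n_nodes // 2, 0.3, 0.05, seed=seed + i)
        elif kind == "barabasi_albert":
            g = nx.barabasi_albert_graph(n_nodes, m=4, seed=seed + i)
        elif kind == "erdos_renyi":
            g = nx.erdos_renyi_graph(n_nodes, p=0.1, seed=seed + i)
        elif kind == "watts_strogatz":
            g = nx.watts_strogatz_graph(n_nodes, k=4, p=0.1, seed=seed + i)
        else:
            raise ValueError(f"unknown graph family: {kind}")
        graphs.append(g)
    return graphs
```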
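Putting the pieces together, the evaluation loop quoted in the Experiment Setup row reduces to: compute one descriptor histogram per graph, then sweep the kernel bandwidth over the stated grid. The sketch below uses the degree histogram (one of the three descriptor functions mentioned) and reuses mmd_squared() and sample_graphs() from the sketches above; it is a hedged illustration, not the paper's evaluation code.

```python
import numpy as np

def degree_histogram(graph, max_degree=32):
    """Normalized degree histogram as a fixed-length descriptor vector."""
    hist = np.zeros(max_degree + 1)
    for _, d in graph.degree():
        hist[min(d, max_degree)] += 1
    return hist / hist.sum()

# Sweep the RBF bandwidth over the quoted grid {10^-5, ..., 10^5}.
reference = [degree_histogram(g) for g in sample_graphs("erdos_renyi", seed=0)]
generated = [degree_histogram(g) for g in sample_graphs("erdos_renyi", seed=1000)]
for sigma in [10.0 ** e for e in range(-5, 6)]:
    print(f"sigma={sigma:g}: MMD^2 = {mmd_squared(reference, generated, sigma):.6f}")
```

The sweep makes the paper's central pitfall concrete: the resulting MMD values, and hence any model ranking derived from them, can change substantially with the choice of sigma, which is why the authors argue the kernel and its parameters must be chosen and reported carefully.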