On the Role of Edge Dependency in Graph Generative Models
Authors: Sudhanshu Chanpuriya, Cameron N Musco, Konstantinos Sotiropoulos, Charalampos Tsourakakis
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation, conducted on real-world datasets, focuses on assessing the output quality and overlap of our proposed models in comparison to other popular models. |
| Researcher Affiliation | Collaboration | 1University of Illinois Urbana-Champaign, Urbana, USA 2University of Massachusetts Amherst, Amherst, USA 3Meta, Menlo Park, United States 4Boston University, Boston, USA. |
| Pseudocode | Yes | Algorithm 1 Sampling Gp based on max cliques |
| Open Source Code | No | Our implementation of the methods we introduce is written in Python and uses the NumPy (Harris et al., 2020) and SciPy (Virtanen et al., 2020) packages. Additionally, to calculate the various graph metrics, we use the following packages: MACE (MAximal Clique Enumerator) (Takeaki, 2012) and Pivoter (Jain & Seshadhri, 2020). We use the following implementation: https://github.com/kiarashza/graphvae-mm. We use the publicly available implementation by the authors of this method: https://github.com/tkipf/gae. The paper describes using other open-source code but does not provide access to the code for their own introduced models (MCEI, MCNI, MCAD). |
| Open Datasets | Yes | We use the following eight publicly available datasets (together with one synthetic dataset) that we describe next. Table 2 also provides a summary of them. CITESEER (Sen et al., 2008), CORA (Sen et al., 2008), PPI (Stark et al., 2010), POLBLOGS (Adamic & Glance, 2005), WEB-EDU (Gleich et al., 2004), LES MISERABLES (Knuth, 1993), WIKI-ELECT (Leskovec et al., 2010b), FACEBOOK-EGO (Leskovec & Mcauley, 2012), RING OF CLIQUES (synthetic dataset) |
| Dataset Splits | No | for the other methods, we use early stopping or vary the representation dimensionality. While early stopping is mentioned, there are no specific details on how datasets are split into training, validation, and test sets (e.g., percentages, methodology, or specific files). |
| Hardware Specification | No | No specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | Our implementation of the methods we introduce is written in Python and uses the NumPy (Harris et al., 2020) and SciPy (Virtanen et al., 2020) packages. Additionally, to calculate the various graph metrics, we use the following packages: MACE (MAximal Clique Enumerator) (Takeaki, 2012) and Pivoter (Jain & Seshadhri, 2020). While software is mentioned, specific version numbers for Python, NumPy, SciPy, MACE, or Pivoter are not provided. |
| Experiment Setup | Yes | MCEI, MCNI, MCAD: We range, respectively, the probability p of including an edge of a clique, a node of a clique, or a clique in the sampled graph. We typically use evenly spaced numbers for p over the interval between 0 and 1. For graphs with a larger number of maximal cliques (PolBlogs and PPI), we additionally take the square of these values to decrease the probability. CELL: We set the dimension of the embeddings at 32 for the larger datasets and 8 for the smaller ones. [...] we stop training at every iteration and sample 10 graphs every time the overlap between the generated graphs and the input graph exceeds some value (we set 10 equally spaced thresholds between 0.05 and 0.75). GraphVAE: We use 512 dimensions for the graph-level embedding that precedes the MLP and 32 dimensions for the hidden layers of the graph autoencoder. VGAE: [...] we increase the dimensions of the hidden layers from 4 to 1024 and train for 5,000 epochs. |
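The setup for MCEI, MCNI, and MCAD above can be sketched in code. The following is a hedged, minimal illustration in Python with NumPy (the packages the paper reports using): `probability_grid` builds the evenly spaced values of p over [0, 1], optionally augmented with their squares for graphs with many maximal cliques, and `sample_mcei` shows one plausible reading of MCEI-style sampling, in which each edge of each maximal clique is included independently with probability p. The function names, signatures, and the clique representation are assumptions for illustration, not the authors' released code.

```python
import numpy as np

def probability_grid(num_points=10, square_for_large=False):
    """Evenly spaced p values over [0, 1]; optionally add their squares,
    as described for graphs with many maximal cliques (e.g. PolBlogs, PPI)."""
    p = np.linspace(0.0, 1.0, num_points)
    if square_for_large:
        # Squaring shifts the sweep toward smaller inclusion probabilities.
        p = np.unique(np.concatenate([p, p ** 2]))
    return p

def sample_mcei(max_cliques, p, rng=None):
    """Hypothetical MCEI-style sampler: include each edge of each maximal
    clique independently with probability p. `max_cliques` is a list of
    node lists (e.g. as enumerated by a tool such as MACE)."""
    rng = np.random.default_rng(rng)
    edges = set()
    for clique in max_cliques:
        for i in range(len(clique)):
            for j in range(i + 1, len(clique)):
                u, v = sorted((clique[i], clique[j]))
                if rng.random() < p:
                    edges.add((u, v))
    return edges
```

For example, `sample_mcei([[0, 1, 2]], 1.0)` returns all three edges of the triangle, while p = 0 returns the empty set; intermediate p values interpolate between the empty graph and the union of the cliques.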