Contextual Graph Markov Model: A Deep and Generative Approach to Graph Processing

Authors: Davide Bacciu, Federico Errica, Alessio Micheli

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The representational capability of our model, referred to as Contextual Graph Markov Model (CGMM), is tested on popular benchmarks for tree classification and on biochemical datasets where compounds are naturally represented as undirected graphs. The section provides an empirical assessment of CGMM's ability in extracting meaningful structural patterns from the data. Since our model is deeply rooted in hidden Markov models for trees, we first test it on tree-structured data classification, confronting its performance with that of state-of-the-art probabilistic models and kernels for trees. Then we extend the analysis to more general structures, testing CGMM on graph classification tasks.
Researcher Affiliation | Academia | Department of Computer Science, University of Pisa.
Pseudocode | No | The paper describes the training and inference procedures using textual descriptions and mathematical equations, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/diningphil/CGMM
Open Datasets | Yes | The INEX2005 and INEX2006 datasets (Denoyer & Gallinari, 2007) are intensively studied benchmarks in this context. To this end, we consider a set of standard graph classification benchmarks from the biochemical domain, namely MUTAG (Debnath et al., 1991), CPDB (Helma et al., 2004), and AIDS (Smola & Vishwanathan, 2003).
Dataset Splits | Yes | Train and test splits are defined by the benchmark and comprise about 50% of the data each. Model selection decisions have been taken using a hold-out validation set of 20% of the training data. One assesses performance using a single cross-validation (CV) on the standard 10-fold splits provided for the dataset... The second approach... uses a nested CV, where a 5-fold CV is applied to each training fold of the outer 10-fold: in this case, model selection decisions are taken based on the internal 5-fold CV performance.
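The nested evaluation quoted above (an outer 10-fold CV for performance estimation, with a 5-fold CV inside each outer training fold for model selection) can be reproduced generically. Below is a minimal Python sketch using scikit-learn on synthetic stand-in features; the variable names, the RBF SVM stand-in, and the random data are illustrative assumptions, not the authors' code or the actual CGMM embeddings.

    import numpy as np
    from sklearn.model_selection import StratifiedKFold, GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    # Synthetic stand-in for graph-level feature vectors (e.g., CGMM
    # fingerprints); 188 samples roughly matches the size of MUTAG.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(188, 64))
    y = rng.integers(0, 2, size=188)

    # Inner 5-fold CV picks hyperparameters on each outer training fold;
    # the outer 10-fold CV then estimates generalization performance.
    param_grid = {"C": [5, 50, 100], "gamma": [5, 50, 100]}
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

    clf = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=inner)
    scores = cross_val_score(clf, X, y, cv=outer)
    print(f"nested 10x5 CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

Wrapping the GridSearchCV estimator inside cross_val_score is what keeps the inner selection from ever seeing the outer test fold.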
Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments (e.g., GPU/CPU models, memory specifications).
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments.
Experiment Setup | Yes | SVM hyperparameters C_svm and γ_svm have been selected from a limited set in {5, 50, 100}, as their choice had little impact on the model performance. The hidden states size C has been chosen in {20, 40}. A pooling strategy has been used with pool size set to 10.
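For concreteness, the reported search space is small enough to enumerate exhaustively. The sketch below builds that grid in Python; the dictionary keys and variable names are hypothetical, chosen only to mirror the quoted hyperparameters (C_svm, γ_svm, hidden-state size C, fixed pool size), and do not reflect the authors' code.

    from itertools import product

    # Grid as reported: SVM C and gamma in {5, 50, 100}, CGMM hidden-state
    # size C in {20, 40}, and pooling with a fixed pool size of 10.
    svm_C_values = [5, 50, 100]
    svm_gamma_values = [5, 50, 100]
    hidden_state_sizes = [20, 40]
    POOL_SIZE = 10

    configs = [
        {"svm_C": c, "svm_gamma": g, "hidden_states": h, "pool_size": POOL_SIZE}
        for c, g, h in product(svm_C_values, svm_gamma_values, hidden_state_sizes)
    ]
    print(len(configs), "configurations")  # 3 * 3 * 2 = 18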