Learning Convolutional Neural Networks for Graphs

Authors: Mathias Niepert, Mohamed Ahmed, Konstantin Kutzkov

ICML 2016

Reproducibility assessment. Each variable below lists the assessed result followed by the supporting LLM response.
Research Type: Experimental
  LLM response: Using established benchmark data sets, we demonstrate that the learned feature representations are competitive with state of the art graph kernels and that their computation is highly efficient. We conduct three types of experiments: a runtime analysis, a qualitative analysis of the learned features, and a comparison to graph kernels on benchmark data sets.
Researcher Affiliation: Industry
  LLM response: Mathias Niepert (MATHIAS.NIEPERT@NECLAB.EU), Mohamed Ahmed (MOHAMED.AHMED@NECLAB.EU), Konstantin Kutzkov (KONSTANTIN.KUTZKOV@NECLAB.EU), NEC Labs Europe, Heidelberg, Germany.
Pseudocode: Yes
  LLM response: The paper provides Algorithm 1 (SELNODESEQ: Select Node Sequence), Algorithm 2 (NEIGHASSEMB: Neighborhood Assembly), Algorithm 3 (RECEPTIVEFIELD: Create Receptive Field), and Algorithm 4 (NORMALIZEGRAPH: Graph Normalization).
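To make the pseudocode entry concrete, the following is a rough sketch of the neighborhood-assembly step (Algorithm 2): grow a node's neighborhood level by level until at least k nodes are collected. The dict-of-lists adjacency format and the level-wise set expansion are illustrative assumptions, not the paper's implementation.

```python
def neighborhood_assembly(adj, v, k):
    """Sketch of NEIGHASSEMB: expand v's neighborhood one hop at a
    time until at least k nodes are collected or no new nodes are
    reachable. `adj` maps each node to its neighbor list (a
    hypothetical input format chosen for illustration)."""
    nodes, frontier = {v}, {v}
    while len(nodes) < k and frontier:
        # Next breadth-first level: all unseen neighbors of the frontier.
        frontier = {w for u in frontier for w in adj[u]} - nodes
        nodes |= frontier
    return nodes

# Toy 5-node graph with edges 0-1, 0-2, 1-3, 2-3, 3-4.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(neighborhood_assembly(adj, 0, 4))  # {0, 1, 2, 3}
```

Note that, as in the paper, the result can exceed k nodes (a whole level is added at once); the subsequent normalization step is what trims or pads the field to exactly k nodes.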
Open Source Code: No
  LLM response: All of the above was implemented with the THEANO (Bergstra et al., 2010) wrapper KERAS (Chollet, 2015). We also applied a logistic regression (PSLR) classifier on the patches for k = 10. The paper provides links to third-party tools (Keras, Graph-tool) but not to its own source-code implementation of the proposed methodology.
Open Datasets: Yes
  LLM response: We use 6 standard benchmark data sets to compare run-time and classification accuracy with state of the art graph kernels: MUTAG, PTC, NCI1, NCI109, PROTEIN, and D&D. Moreover, we ran experiments with the same set-up on larger social graph data sets (up to 12000 graphs each, with an average of 400 nodes), and compared PATCHY-SAN with previously reported results for the graphlet count (GK) and the deep graphlet count kernel (DGK) (Yanardag & Vishwanathan, 2015).
Dataset Splits: Yes
  LLM response: We performed 10-fold cross-validation with LIBSVM (Chang & Lin, 2011), using 9 folds for training and 1 for testing, and repeated the experiments 10 times.
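The split protocol quoted above can be sketched in a few lines of plain Python. The seed and the choice of 188 graphs (the size of MUTAG) are illustrative; the paper does not publish its fold assignments.

```python
import random

def ten_fold_splits(n, seed=0):
    """Sketch of the evaluation protocol: shuffle the n graph indices,
    cut them into 10 folds, and yield (train, test) index pairs using
    9 folds for training and 1 fold for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::10] for i in range(10)]
    for i in range(10):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

splits = list(ten_fold_splits(188))  # e.g. the 188 MUTAG graphs
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 10 169 19
```

Repeating the whole procedure 10 times, as the paper does, just means calling the generator with 10 different seeds and averaging the resulting accuracies.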
Hardware Specification: Yes
  LLM response: All experiments were run on commodity hardware with 64G RAM and a single 2.8 GHz CPU.
Software Dependencies: No
  LLM response: All of the above was implemented with the THEANO (Bergstra et al., 2010) wrapper KERAS (Chollet, 2015). We performed 10-fold cross-validation with LIBSVM (Chang & Lin, 2011). The paper mentions software names but does not provide specific version numbers for replication.
Experiment Setup: Yes
  LLM response: For PATCHY-SAN (referred to as PSCN), we used 1-dimensional WL normalization, a width w equal to the average number of nodes (see Table 1), and receptive field sizes of k = 5 and k = 10. For the experiments we only used node attributes. In addition, we ran experiments for k = 10 where we combined receptive fields for nodes and edges using a merge layer (k = 10E). To make a fair comparison, we used a single network architecture with two convolutional layers, one dense hidden layer, and a softmax layer for all experiments. The first convolutional layer had 16 output channels (feature maps). The second convolutional layer had 8 output channels, a stride of s = 1, and a field size of 10. The convolutional layers have rectified linear units. The dense layer has 128 rectified linear units with a dropout rate of 0.5. Dropout and the relatively small number of neurons are needed to avoid overfitting on the smaller data sets. The only hyperparameters we optimized were the number of epochs and the batch size for the mini-batch gradient descent algorithm RMSPROP.
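The layer arithmetic of the described setup can be sketched with a minimal NumPy forward pass over one graph's receptive fields. Everything here is illustrative: the random weights, the input of w = 18 receptive fields of k = 10 nodes with 3 attribute channels, and the assumption that the first layer strides by the field size k (so each receptive field yields one value per output channel) are not taken from released code.

```python
import numpy as np

def conv1d_relu(x, n_out, field, stride, rng):
    """Valid 1-D convolution with ReLU. x has shape (length, channels);
    weights are random because this only illustrates the shapes."""
    length, n_in = x.shape
    w = rng.standard_normal((n_out, field, n_in)) * 0.1
    n_pos = (length - field) // stride + 1
    out = np.zeros((n_pos, n_out))
    for i in range(n_pos):
        patch = x[i * stride : i * stride + field]            # (field, n_in)
        out[i] = np.maximum(0.0, np.einsum('ofc,fc->o', w, patch))
    return out

rng = np.random.default_rng(0)
k, w_fields, attrs = 10, 18, 3          # field size, fields per graph, node attributes
x = rng.standard_normal((w_fields * k, attrs))  # w receptive fields of k nodes each
h1 = conv1d_relu(x, 16, field=k, stride=k, rng=rng)   # 16 channels, one output per field
h2 = conv1d_relu(h1, 8, field=10, stride=1, rng=rng)  # 8 channels, stride s = 1, field 10
print(h1.shape, h2.shape)  # (18, 16) (9, 8)
```

The paper's full model would flatten h2 into the 128-unit dense layer with dropout 0.5 and a softmax output; those layers are omitted here since the sketch is only meant to show how the stated field sizes and strides determine the intermediate shapes.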