What graph neural networks cannot learn: depth vs width
Authors: Andreas Loukas
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 presents some empirical evidence of the theory. This section aims to empirically test the connection between the capacity dw of a GNNmp, the number of nodes n of its input, and its ability to solve a given task. |
| Researcher Affiliation | Academia | Andreas Loukas, École Polytechnique Fédérale Lausanne, andreas.loukas@epfl.ch |
| Pseudocode | Yes | Computational model 1: Message passing graph neural network (GNNmp) |
| Open Source Code | No | The paper does not contain any explicit statement about open-sourcing the code or providing a link to a code repository. |
| Open Datasets | No | I generated five distributions over graphs with n ∈ {8, 16, 24, 32, 44} nodes and an average diameter of (4, 6, 8, 9, 11), respectively (this was achieved by setting p ∈ {6, 8, 10, 12, 14}; see Figure 3a). For each such distribution, I generated a training and test set consisting respectively of 1000 and 200 examples. |
| Dataset Splits | No | For each such distribution, I generated a training and test set consisting respectively of 1000 and 200 examples. Both sets were exactly balanced, i.e., any example graph from the training and test set had exactly 50% chance of containing a 4-cycle. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or other computing resources used for experiments. |
| Software Dependencies | No | The GNNmp chosen was that proposed by Xu et al. (2018); with the addition of residual connections, this network outperformed all others that I experimented with. The paper mentions using the Adam optimizer and learning rate decay, but does not specify version numbers for any software or libraries. |
| Experiment Setup | Yes | To this end, I performed grid search over the hyperparameters w ∈ {2, 10, 20} and d ∈ {5, 10, 20, 15}. To reduce the dependence on the initial conditions and training length, for each hyperparameter combination, I trained 4 networks independently (using Adam and learning rate decay) for 4000 epochs. |
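
The "Pseudocode" entry refers to the paper's Computational model 1 (GNNmp). As a rough, non-authoritative illustration of that model, the sketch below shows a single message-passing layer in PyTorch with sum aggregation; the class name, the dense-adjacency interface, and the ReLU update are assumptions, not the paper's specification.

```python
# A minimal, non-authoritative sketch of one GNNmp message-passing round.
# The class name, dense-adjacency interface and ReLU update are assumptions.
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        # Each node updates its state from its own state concatenated with
        # the sum of its neighbours' states (sum aggregation, as in GIN).
        self.update = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (n, in_dim) node states;  adj: (n, n) dense adjacency matrix
        messages = adj @ x                              # aggregate neighbours
        return self.update(torch.cat([x, messages], dim=-1))
```

In the paper's terminology, stacking d such layers with w-dimensional node states gives a network of depth d and width w, whose capacity is measured by the product dw.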
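
The "Open Datasets" and "Dataset Splits" entries describe generated rather than published data: balanced sets of graphs labelled by whether they contain a 4-cycle. The sketch below shows one way such a set could be built with networkx; the G(n, p) sampling (p here is an edge probability, not the paper's parameter p), the has_4_cycle helper, and the rejection-style balancing loop are illustrative assumptions and do not reproduce the paper's graph construction.

```python
# A hedged sketch of building a balanced 4-cycle detection dataset.
# The paper's exact graph construction is not reproduced; G(n, p) random
# graphs are used purely for illustration.
import random
import networkx as nx

def has_4_cycle(g: nx.Graph) -> bool:
    # A graph contains a 4-cycle iff some pair of nodes shares
    # at least two common neighbours.
    nodes = list(g)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if len(set(g[u]) & set(g[v])) >= 2:
                return True
    return False

def balanced_dataset(n: int, p: float, size: int, seed: int = 0):
    """Return `size` (graph, label) pairs, exactly half containing a 4-cycle."""
    rng = random.Random(seed)
    pos, neg = [], []
    while len(pos) < size // 2 or len(neg) < size // 2:
        g = nx.gnp_random_graph(n, p, seed=rng.randrange(10**9))
        bucket = pos if has_4_cycle(g) else neg
        if len(bucket) < size // 2:
            bucket.append(g)
    return [(g, 1) for g in pos] + [(g, 0) for g in neg]

# Example: a 1000-graph training set and a 200-graph test set on 8-node graphs.
train = balanced_dataset(n=8, p=0.3, size=1000, seed=0)
test = balanced_dataset(n=8, p=0.3, size=200, seed=1)
```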
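
The "Experiment Setup" entry reports a grid search over width w and depth d, four independent runs per setting, and training with Adam plus learning-rate decay for 4000 epochs. The skeleton below mirrors that loop structure only; the placeholder model, the learning rate of 1e-3, and the StepLR schedule are assumptions, as the paper excerpt does not specify them.

```python
# A hedged skeleton of the reported sweep: grid search over width w and
# depth d, 4 runs per setting, Adam with learning-rate decay, 4000 epochs.
# The placeholder model, lr=1e-3 and the StepLR schedule are assumptions.
import itertools
import torch
import torch.nn as nn

WIDTHS = (2, 10, 20)
DEPTHS = (5, 10, 15, 20)
RUNS_PER_SETTING = 4
EPOCHS = 4000

def make_model(width: int, depth: int) -> nn.Module:
    # Stand-in stack of equal-width layers; the paper instead uses a
    # GIN-style GNNmp (Xu et al., 2018) with residual connections.
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), nn.ReLU()]
    return nn.Sequential(*layers)

for width, depth in itertools.product(WIDTHS, DEPTHS):
    for run in range(RUNS_PER_SETTING):
        model = make_model(width, depth)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # assumed lr
        sched = torch.optim.lr_scheduler.StepLR(opt, step_size=1000, gamma=0.5)
        for epoch in range(EPOCHS):
            # one pass over the training set would go here:
            #   loss.backward(); opt.step(); opt.zero_grad()
            sched.step()  # learning-rate decay
```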