Invariant and Equivariant Graph Networks
Authors: Haggai Maron, Heli Ben-Hamu, Nadav Shamir, Yaron Lipman
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experimental part of the paper we concentrated on possibly the most popular instantiation of graph learning, namely that of a single node set and edge-value data, e.g., with adjacency matrices. We created simple networks by composing our invariant or equivariant linear layers in standard ways and tested the networks in learning invariant and equivariant graph functions: (i) We compared identical networks with our basis and the basis of Hartford et al. (2018) and showed we can learn graph functions like trace, diagonal, and maximal singular vector. The basis in Hartford et al. (2018), tailored to the multi-set setting, cannot learn these functions, demonstrating that it is not maximal in the graph-learning (i.e., multi-set with repetitions) scenario. We also demonstrate that our representation allows extrapolation: learning on graphs of one size and testing on graphs of another size; (ii) We also tested our networks on a collection of graph learning datasets, achieving results comparable to the state-of-the-art on 3 social network datasets. |
| Researcher Affiliation | Academia | Haggai Maron, Heli Ben-Hamu, Nadav Shamir & Yaron Lipman, Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel |
| Pseudocode | No | The paper describes implementation details and mathematical formulas, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing its source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use 8 different real world datasets from the benchmark of Yanardag & Vishwanathan (2015): five of these datasets originate from bioinformatics while the other three come from social networks. |
| Dataset Splits | Yes | We follow the evaluation protocol including the 10-fold splits of Zhang et al. (2018). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper states 'We implemented our method in Tensorflow (Abadi et al., 2016)' but does not provide specific version numbers for TensorFlow or any other software dependencies. |
| Experiment Setup | Yes | We implemented our method in Tensorflow (Abadi et al., 2016). The equivariant linear basis was implemented efficiently using basic row/column/diagonal summation operators; see appendix A for details. The networks we used are compositions of 1-4 equivariant linear layers with ReLU activations between them for the equivariant function setting. For the invariant function setting we further added a max over the invariant basis and 1-3 fully-connected layers with ReLU activations. We created accordingly 4 datasets with 10K train and 1K test examples of 40×40 matrices; for tasks (i), (ii), (iv) we used i.i.d. random matrices with uniform distribution in [0, 10] and mean-squared error (MSE) as loss; for task (iii) we used random matrices with uniform distribution of singular values in [0, 0.5] and a spectral gap of 0.5; due to the sign ambiguity in this task we used a cosine loss of the form $l(x, y) = 1 - \langle x/\lVert x\rVert,\, y/\lVert y\rVert \rangle^2$. We trained networks with 1, 2, and 3 hidden layers with 8 feature channels each and a single fully-connected layer. Both our models and Hartford et al. (2018) use the same architecture but with different bases for the linear layers. Table 1 logs the best mean-squared error of each method over a set of hyper-parameters. We add the MSE for the trivial mean predictor. We follow the evaluation protocol, including the 10-fold splits, of Zhang et al. (2018). For each dataset we selected learning and decay rates on one random fold. In all experiments we used a fixed simple architecture of 3 layers with (16, 32, 256) features accordingly. The last equivariant layer is followed by an invariant max layer according to the invariant basis. We then add two fully-connected hidden layers with (512, 256) features. |
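
The "Experiment Setup" row above notes that the equivariant linear basis was implemented using basic row/column/diagonal summation operators. The sketch below illustrates a handful of such summation-based, permutation-equivariant operations on a single-channel n×n input in NumPy. It is an illustrative subset of the 15 basis elements the paper derives for order-2 equivariant maps, not the authors' TensorFlow implementation; all function and variable names here are our own.

```python
import numpy as np

def equivariant_basis_ops(A):
    """A few permutation-equivariant linear operations on an n x n matrix A,
    each built from row/column/diagonal summations. Replacing A by P A P^T
    permutes every output in the same way. Illustrative subset only, not the
    full 15-element basis derived in the paper."""
    n = A.shape[0]
    diag_vec = np.diag(A)                         # (n,) diagonal entries
    row_sums = A.sum(axis=1, keepdims=True)       # (n, 1)
    col_sums = A.sum(axis=0, keepdims=True)       # (1, n)
    return [
        A,                                        # identity
        A.T,                                      # transpose
        np.diag(diag_vec),                        # keep only the diagonal
        np.tile(diag_vec[:, None], (1, n)),       # diagonal broadcast to rows
        np.tile(row_sums, (1, n)),                # row sums broadcast to rows
        np.tile(col_sums, (n, 1)),                # column sums broadcast to columns
        np.full((n, n), A.sum()),                 # total sum broadcast everywhere
    ]

def equivariant_layer(A, weights, bias=0.0):
    """Single-channel equivariant layer: a learned linear combination of the
    basis operations plus a constant bias (itself an equivariant term)."""
    ops = equivariant_basis_ops(A)
    assert len(weights) == len(ops)
    return sum(w * op for w, op in zip(weights, ops)) + bias
```

Multi-channel versions of such layers, stacked with ReLU activations as described in the setup row, give the equivariant networks; the invariant networks additionally apply a max over the invariant basis followed by fully-connected layers.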
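
The same row describes the synthetic tasks (10K train / 1K test examples of 40×40 matrices) and the sign-invariant cosine loss used for the maximal-singular-vector task. Below is a minimal sketch of data generation for the uniform-matrix tasks and of that loss, under the distributions stated above; the helper names are ours, and this is not the authors' code.

```python
import numpy as np

def make_uniform_matrices(num_examples, n=40, low=0.0, high=10.0, seed=0):
    """I.i.d. uniform matrices in [0, 10] as used for tasks (i), (ii), (iv),
    with example targets for the trace (invariant) and diagonal (equivariant)
    tasks."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(low, high, size=(num_examples, n, n))
    y_trace = X.trace(axis1=1, axis2=2)           # invariant target per matrix
    y_diag = X.diagonal(axis1=1, axis2=2)         # equivariant target per matrix
    return X, y_trace, y_diag

def cosine_loss(x, y, eps=1e-12):
    """l(x, y) = 1 - <x/||x||, y/||y||>^2: invariant to the sign of either
    vector, as needed for the maximal-singular-vector task (iii)."""
    x = x / (np.linalg.norm(x) + eps)
    y = y / (np.linalg.norm(y) + eps)
    return 1.0 - float(np.dot(x, y)) ** 2
```

Note that for task (iii) the paper instead draws matrices with singular values uniform in [0, 0.5] and a spectral gap; this sketch does not reproduce that sampler.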