Expectation-Complete Graph Representations with Homomorphisms
Authors: Pascal Welke, Maximilian Thiessen, Fabian Jogl, Thomas Gärtner
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation shows competitive results on several benchmark graph learning tasks. We analyze the performance of our expectation-complete embedding, which can be computed in expected polynomial time. The details of the pattern sampling process are described in Appendix C. We evaluate our proposed embeddings in two contexts. We investigate how graph embeddings from message passing graph neural networks (GNNs) perform when augmented with our embeddings. To complement these results, we investigate the empirical expressive power of our embeddings on synthetic benchmark datasets. |
| Researcher Affiliation | Academia | Machine Learning and Artificial Intelligence Lab, University of Bonn, Germany; Research Unit Machine Learning, TU Wien, Austria; Center for Artificial Intelligence and Machine Learning, TU Wien, Austria. |
| Pseudocode | Yes | Algorithm 1: Sampling algorithm for a pattern set. (A hedged sketch of such a sampler is given after this table.) |
| Open Source Code | Yes | The code to sample patterns and to compute representations, as well as for the GNN experiments, is available. Our source code for the pattern sampling and homomorphism counting is available on GitHub. Pattern sampling and representations: github.com/pwelke/homcount |
| Open Datasets | Yes | We evaluate on the commonly used molecule datasets ZINC, ogbg-molhiv and ogbg-moltox21 (Hu et al., 2020). Table 2 shows averaged accuracies of an SVM classifier trained on our feature sets on the datasets CSL (Murphy et al., 2019) and PAULUS25 (Hoang & Maehara, 2020). (A hedged data-loading sketch follows the table.) |
| Dataset Splits | Yes | We use the provided train/validate/test splits. For ZINC we use the same setup as Bodnar et al. (2021): we use a batch size of 128 and an initial learning rate of 10^-3, which we reduce by half every 20 epochs without an improvement of the validation performance. We stop training after either 500 epochs or after the learning rate is smaller than 10^-5. To finetune on ZINC, we restart the training procedure with an initial learning rate of 5·10^-4. For datasets based on OGB, we train for 100 epochs with a batch size of 32 and a fixed learning rate of 10^-3, which corresponds to the initial learning rate on ZINC. To finetune, we train for 100 additional epochs with a learning rate of 5·10^-4. (A hedged sketch of this schedule follows the table.) |
| Hardware Specification | Yes | We implement our models in PyTorch and PyTorch Geometric and train on a single NVIDIA GeForce RTX 3080 GPU. |
| Software Dependencies | No | The paper mentions software used, such as 'PyTorch', 'PyTorch Geometric', and 'the C++ code of Curticapean et al. (2017)', but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We use a batch size of 128 and an initial learning rate of 10^-3, which we reduce by half every 20 epochs without an improvement of the validation performance. We stop training after either 500 epochs or after the learning rate is smaller than 10^-5. For datasets based on OGB, we train for 100 epochs with a batch size of 32 and a fixed learning rate of 10^-3. Table 4 shows the values of the hyperparameters used for each of the ten datasets (embedding dimension 300, number of GNN layers 5, number of MLP layers 2, dropout rate 0 or 0.5, pooling operation mean). (A hedged model sketch using these values follows the table.) |
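
Pattern sampling (Pseudocode row): the paper's Algorithm 1 and Appendix C define the actual sampling distribution; the snippet below is only a minimal sketch, assuming pattern sizes are drawn from a truncated geometric distribution and each pattern is an Erdős–Rényi graph. The function names, `networkx` usage, and distribution parameters are illustrative, not the authors' implementation (their code lives at github.com/pwelke/homcount).

```python
import random
import networkx as nx

def sample_pattern(max_size, p_geom=0.5, edge_prob=0.5, rng=None):
    """Sample one random pattern graph (hypothetical sketch).

    Size is drawn from a geometric distribution truncated at max_size,
    and the pattern itself is an Erdos-Renyi graph of that size.
    """
    rng = rng or random.Random()
    size = 1
    while size < max_size and rng.random() > p_geom:
        size += 1
    return nx.gnp_random_graph(size, edge_prob, seed=rng.randint(0, 2**31 - 1))

def sample_pattern_set(k, max_size):
    """Sample k i.i.d. patterns; one homomorphism count per pattern
    then yields a k-dimensional graph representation."""
    rng = random.Random(0)
    return [sample_pattern(max_size, rng=rng) for _ in range(k)]
```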
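
Dataset access (Open Datasets row): a minimal loading sketch, assuming PyTorch Geometric and the `ogb` package with their standard dataset classes and provided splits. The `data/` root paths are placeholders, and PAULUS25 is not shipped with PyTorch Geometric, so it is omitted here.

```python
from torch_geometric.datasets import ZINC, GNNBenchmarkDataset
from ogb.graphproppred import PygGraphPropPredDataset

# ZINC (12k subset) with the provided train/val/test splits.
zinc_train = ZINC(root="data/ZINC", subset=True, split="train")
zinc_val   = ZINC(root="data/ZINC", subset=True, split="val")
zinc_test  = ZINC(root="data/ZINC", subset=True, split="test")

# OGB molecule datasets come with their standard splits.
molhiv = PygGraphPropPredDataset(name="ogbg-molhiv", root="data/ogb")
split_idx = molhiv.get_idx_split()  # dict with 'train', 'valid', 'test' indices

# CSL synthetic benchmark (Murphy et al., 2019); only a train split is provided.
csl = GNNBenchmarkDataset(root="data/CSL", name="CSL")
```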
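
Training schedule (Dataset Splits row): a self-contained sketch of the described ZINC schedule (initial learning rate 10^-3, halved after 20 epochs without validation improvement, stop at 500 epochs or once the learning rate falls below 10^-5). The model and data here are dummies; only the scheduler and stopping logic reflect the quoted setup.

```python
import torch

# Dummy data and model; only the schedule mirrors the described ZINC setup.
x_val, y_val = torch.randn(128, 300), torch.randn(128, 1)
model = torch.nn.Linear(300, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=20)

for epoch in range(500):                               # hard cap of 500 epochs
    model.train()
    optimizer.zero_grad()
    loss = torch.nn.functional.l1_loss(model(x_val), y_val)  # dummy training step
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = torch.nn.functional.l1_loss(model(x_val), y_val).item()
    scheduler.step(val_loss)        # halves LR after 20 epochs without improvement

    if optimizer.param_groups[0]["lr"] < 1e-5:          # second stopping criterion
        break
```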
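
Model configuration (Experiment Setup row): a hedged sketch of a GNN assembled from the Table 4 hyperparameters (embedding dimension 300, 5 GNN layers, 2-layer MLP head, dropout 0 or 0.5, mean pooling), using PyTorch Geometric's generic `GIN` model. The paper's actual architectures, and how the homomorphism-count features are combined with the GNN embeddings, may differ in detail.

```python
import torch
from torch_geometric.nn import GIN, global_mean_pool

class GraphClassifier(torch.nn.Module):
    """Illustrative model built from the Table 4 hyperparameters."""

    def __init__(self, in_dim: int, num_classes: int, dropout: float = 0.5):
        super().__init__()
        # 5 message-passing layers with embedding dimension 300.
        self.gnn = GIN(in_channels=in_dim, hidden_channels=300,
                       num_layers=5, dropout=dropout)
        # 2-layer MLP readout head.
        self.head = torch.nn.Sequential(
            torch.nn.Linear(300, 300), torch.nn.ReLU(),
            torch.nn.Linear(300, num_classes))

    def forward(self, x, edge_index, batch):
        h = self.gnn(x, edge_index)        # per-node embeddings
        h = global_mean_pool(h, batch)     # mean pooling per graph
        return self.head(h)
```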