Learnable Graph Convolutional Attention Networks

Authors: Adrián Javaloy, Pablo Sánchez-Martín, Amit Levi, Isabel Valera

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results demonstrate that L-CAT is able to efficiently combine different GNN layers along the network, outperforming competing methods in a wide range of datasets, and resulting in a more robust model that reduces the need for cross-validation.
Researcher Affiliation | Collaboration | 1 Department of Computer Science of Saarland University, Saarbrücken, Germany; 2 Max Planck Institute for Intelligent Systems, Tübingen, Germany; 3 Huawei Noah's Ark Lab, Montreal, Canada; 4 Max Planck Institute for Software Systems, Saarbrücken, Germany
Pseudocode | No | The paper describes algorithms and models using mathematical equations and textual descriptions but does not include explicit pseudocode blocks or figures labeled 'Algorithm'.
Open Source Code | Yes | The code to reproduce the experiments can be found at https://github.com/psanch21/LCAT.
Open Datasets | Yes | All required datasets are freely available.
Dataset Splits | Yes | We split the datasets into 70% training, 15% validation, and 15% test. (A minimal split sketch appears after this table.)
Hardware Specification | Yes | We used CPU cores to run one set of experiments: for each trial, 2 CPU cores and up to 16 GB of memory, run in parallel on a shared cluster with approximately 10,000 CPU cores. For the other set of experiments, we had at our disposal 16 Tesla V100-SXM GPUs with 160 CPU cores, shared with the rest of the department.
Software Dependencies | No | The paper mentions using 'PyTorch Geometric (Fey & Lenssen, 2019)' and 'DGL (Wang et al., 2019a)' implementations, as well as the 'Adam optimizer (Kingma & Ba, 2015)', but it does not specify exact version numbers for these software libraries or tools.
Experiment Setup | Yes | To ensure the best results, we cross-validate all optimization-related hyperparameters for each model using GraphGym (You et al., 2020). All models use four GNN layers with a hidden size of 32, and thus have an equal number of parameters. For evaluation, we take the best-validation configuration during training and report test-set performance. For further details, refer to App. D. We cross-validate the number of message-passing layers in the network (2, 3, 4), as well as the learning rate ([0.01, 0.005]). ... We use residual connections between the GNN layers, 4 heads in the attention models, and the Parametric ReLU (PReLU) (He et al., 2015) as the nonlinear activation function. We do not use batch normalization (Ioffe & Szegedy, 2015) nor dropout (Srivastava et al., 2014). We use the Adam optimizer (Kingma & Ba, 2015) with β = (0.9, 0.999), and an exponential learning-rate scheduler with γ = 0.998. We train all the models for 2500 epochs. Importantly, we do not use weight decay, since this would bias the solution towards λ1 = 0 and λ2 = 1. We parametrize λ1 and λ2 as free parameters in log-space that pass through a sigmoid function, i.e., sigmoid(10x), so that they are constrained to the unit interval and are learned quickly. (A sketch of this optimization setup appears after this table.)
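
The 70/15/15 protocol in the Dataset Splits row can be reproduced with a simple random split. Below is a minimal PyTorch sketch assuming node-level boolean masks and a fixed seed; the function name, seeding scheme, and the example `data` object are illustrative assumptions, not the authors' actual splitting code.

```python
import torch

def random_split_masks(num_nodes: int, train_frac: float = 0.70,
                       val_frac: float = 0.15, seed: int = 0):
    """Randomly assign nodes to 70/15/15 train/val/test boolean masks."""
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_nodes, generator=gen)
    n_train = int(train_frac * num_nodes)
    n_val = int(val_frac * num_nodes)

    train_mask = torch.zeros(num_nodes, dtype=torch.bool)
    val_mask = torch.zeros(num_nodes, dtype=torch.bool)
    test_mask = torch.zeros(num_nodes, dtype=torch.bool)
    train_mask[perm[:n_train]] = True
    val_mask[perm[n_train:n_train + n_val]] = True
    test_mask[perm[n_train + n_val:]] = True  # remaining ~15% goes to test
    return train_mask, val_mask, test_mask

# Example usage with a PyTorch Geometric Data object (hypothetical `data`):
# data.train_mask, data.val_mask, data.test_mask = random_split_masks(data.num_nodes)
```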
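
The Experiment Setup row pins down most of the optimization recipe: Adam with β = (0.9, 0.999), an exponential learning-rate scheduler with γ = 0.998, 2500 epochs, no weight decay, and λ1, λ2 parametrized as free parameters squashed by sigmoid(10x). A minimal PyTorch sketch of that recipe follows; `LambdaGate`, the placeholder model, and the omitted loss computation are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LambdaGate(nn.Module):
    """Learnable coefficients lambda_1, lambda_2 constrained to (0, 1).

    Each lambda is an unconstrained free parameter x mapped through
    sigmoid(10 * x); the factor 10 lets the coefficients move quickly,
    as described in the setup above.
    """
    def __init__(self):
        super().__init__()
        self.raw = nn.Parameter(torch.zeros(2))  # free parameters for lambda_1, lambda_2

    def forward(self):
        lam = torch.sigmoid(10.0 * self.raw)
        return lam[0], lam[1]

# Placeholder network standing in for the 4-layer GNN with hidden size 32,
# residual connections, 4 attention heads, and PReLU activations.
model = nn.Sequential(nn.Linear(32, 32), nn.PReLU(), nn.Linear(32, 32))
gate = LambdaGate()

# Adam with the reported betas and *no* weight decay: decay would bias the
# solution towards lambda_1 = 0 and lambda_2 = 1.
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(gate.parameters()),
    lr=0.01, betas=(0.9, 0.999), weight_decay=0.0,
)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.998)

for epoch in range(2500):
    optimizer.zero_grad()
    # loss = task_loss(model, gate, batch)  # task-specific; omitted in this sketch
    # loss.backward()
    optimizer.step()
    scheduler.step()
```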