GRANOLA: Adaptive Normalization for Graph Neural Networks
Authors: Moshe Eliasof, Beatrice Bevilacqua, Carola-Bibiane Schönlieb, Haggai Maron
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide theoretical results that support our design choices as well as an extensive empirical evaluation demonstrating the superior performance of GRANOLA over existing normalization techniques. |
| Researcher Affiliation | Collaboration | Moshe Eliasof, University of Cambridge, me532@cam.ac.uk; Beatrice Bevilacqua, Purdue University, bbevilac@purdue.edu; Carola-Bibiane Schönlieb, University of Cambridge, cbs31@cam.ac.uk; Haggai Maron, Technion & NVIDIA Research, hmaron@nvidia.com |
| Pseudocode | Yes | Algorithm 1 GRANOLA Layer |
| Open Source Code | Yes | Our code is available at https://github.com/MosheEliasof/GRANOLA. |
| Open Datasets | Yes | We experiment with the ZINC-12K molecular dataset [57, 28, 21]... We test our GRANOLA on the OGB collection [31]... We experimented with popular datasets from the TUD [42] repository. |
| Dataset Splits | Yes | We consider the dataset splits proposed in Dwivedi et al. [21]... We consider the scaffold splits proposed in Hu et al. [31]... For all the experiments with datasets from the TUDatasets repository, we followed the evaluation procedure proposed in Xu et al. [63], consisting of 10-fold cross validation and metric at the best averaged validation accuracy across the folds. |
| Hardware Specification | Yes | We ran our experiments on NVIDIA RTX3090 and RTX4090 GPUs, both having 24GB of memory. ... Specifically, we report the average time per batch measured on a Nvidia RTX-2080 GPU. |
| Software Dependencies | No | The paper states 'We implemented GRANOLA using Pytorch [50] (BSD-style license) and Pytorch-Geometric [24] (MIT license)' but does not provide specific version numbers for these software components, which is required for reproducibility. |
| Experiment Setup | Yes | For all models, we used a batch size tuned in {32, 64, 128}. To optimize the model we use the Adam optimizer with initial learning rate of 0.001, which is decayed by 0.5 every 300 epochs. The maximum number of epochs is set to 500. ... The downstream network is composed of a number of layers in {4, 6}, with an embedding dimension tuned in {32, 64}. |
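
The Experiment Setup and Open Datasets rows above pin down the training configuration fairly precisely (Adam, initial learning rate 0.001, decay by 0.5 every 300 epochs, at most 500 epochs, batch size tuned in {32, 64, 128}, 4-6 layers with embedding dimension 32 or 64, evaluated on ZINC-12K among others). The following is a minimal sketch of that configuration in PyTorch and PyTorch Geometric, not the authors' code: the plain GCN backbone and linear readout are stand-in assumptions, and GRANOLA's adaptive normalization layers are not reproduced here; only the quoted hyperparameters are taken from the paper.

```python
# Minimal sketch of the quoted training configuration (not the authors' code).
# The GCN backbone and linear readout are hypothetical stand-ins; the optimizer,
# schedule, epoch budget, and batch-size grid follow the quoted Experiment Setup.
import torch
from torch_geometric.datasets import ZINC
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GCN, global_add_pool

train_dataset = ZINC(root="data/ZINC", subset=True, split="train")    # ZINC-12K subset
loader = DataLoader(train_dataset, batch_size=128, shuffle=True)      # tuned in {32, 64, 128}

backbone = GCN(in_channels=1, hidden_channels=64, num_layers=6)       # layers in {4, 6}, dim in {32, 64}
readout = torch.nn.Linear(64, 1)                                      # graph-level regression head

params = list(backbone.parameters()) + list(readout.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)                          # initial learning rate 0.001
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=300, gamma=0.5)  # x0.5 every 300 epochs

for epoch in range(500):                                               # maximum of 500 epochs
    backbone.train()
    for batch in loader:
        optimizer.zero_grad()
        h = backbone(batch.x.float(), batch.edge_index)                # node embeddings
        pred = readout(global_add_pool(h, batch.batch)).squeeze(-1)    # pool to graph-level prediction
        loss = torch.nn.functional.l1_loss(pred, batch.y)              # MAE, the standard ZINC metric
        loss.backward()
        optimizer.step()
    scheduler.step()
```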