Dynamic Rescaling for Training GNNs

Authors: Nimrah Mustafa, Rebekka Burkholz

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We primarily study the effect of training GAT in a balanced state based on the relative gradients criterion (see Eq. (4)) by dynamic rescaling on five real-world heterophilic benchmark datasets [32]. We explore our conceptual ideas empirically and find promising directions to utilize dynamic rescaling for more practical benefits, by training in balance or by controlling the order of learning among network layers.
Researcher Affiliation | Academia | Nimrah Mustafa, CISPA, 66123 Saarbrücken, Germany, nimrah.mustafa@cispa.de; Rebekka Burkholz, CISPA, 66123 Saarbrücken, Germany, burkholz@cispa.de
Pseudocode | No | The paper describes a rebalancing procedure and provides equations (Eq. (6) and (7)) but does not present it in a structured pseudocode or algorithm block. (A hedged illustrative sketch of such a rebalancing step is given after this table.)
Open Source Code | Yes | Our experimental code is available at https://github.com/RelationalML/Dynamic_Rescaling_GAT.
Open Datasets | Yes | We primarily study the effect of training GAT in a balanced state based on the relative gradients criterion (see Eq. (4)) by dynamic rescaling on five real-world heterophilic benchmark datasets [32].
Dataset Splits | Yes | Given the input graph G with a .75/.25/.25 train/validation/test split, we train an L = k layer GAT network with the same architecture as M_k but initialized with a looks-linear orthogonal structure, which ensures that the network must learn the non-linear transformations of the target network.
Hardware Specification | Yes | Experiments were run on an NVIDIA RTX A6000 GPU with 50GB RAM.
Software Dependencies | No | The paper mentions 'Adam optimizer' and 'looks-linear orthogonal structure' but does not specify version numbers for any software dependencies or libraries used.
Experiment Setup | Yes | All experiments use the Adam optimizer and networks are randomly initialized with looks-linear orthogonal structure [36, 1] unless specified otherwise. [...] A maximum of 10 iterations for the rebalancing procedure outlined in Eq. (6) and (7) was used. [...] The best learning rate from {0.01, 0.001, 0.005}. (A hedged configuration sketch follows the table.)
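
The rebalancing procedure itself is defined by Eq. (6) and (7) of the paper and is not reproduced here. Purely as an illustration of the general idea, the following hypothetical PyTorch sketch rebalances a bias-free stack of linear layers with positively homogeneous activations by pairwise rescaling, so that the relative gradient norms ||dL/dW_l|| / ||W_l|| of consecutive layers are equalized while the network function is preserved. The function names (relative_grad_norm, rebalance), the pairwise sweep, and the choice of scaling factor are assumptions for illustration, not the paper's exact procedure, which targets GAT parameters including attention weights.

```python
# Hypothetical sketch only; NOT the paper's Eq. (6)-(7). It assumes a plain
# feed-forward stack of bias-free linear layers with positively homogeneous
# activations (e.g. ReLU/LeakyReLU), where scaling layer l by alpha and
# layer l+1 by 1/alpha leaves the network function unchanged. Gradients are
# assumed to have been populated by a preceding backward pass.
import torch


def relative_grad_norm(layer: torch.nn.Linear) -> torch.Tensor:
    """Relative gradient ||dL/dW|| / ||W|| of one layer's weight matrix."""
    return layer.weight.grad.norm() / layer.weight.norm()


@torch.no_grad()
def rebalance(layers: list[torch.nn.Linear], max_iters: int = 10) -> None:
    """Sweep over consecutive layer pairs and rescale them so their relative
    gradients match. Rescaling W_l by alpha multiplies its relative gradient
    by 1/alpha**2 and that of W_{l+1} (scaled by 1/alpha) by alpha**2, so
    alpha = (r_l / r_{l+1}) ** 0.25 equalizes the pair."""
    for _ in range(max_iters):
        for l in range(len(layers) - 1):
            r_l = relative_grad_norm(layers[l])
            r_next = relative_grad_norm(layers[l + 1])
            alpha = (r_l / r_next) ** 0.25
            layers[l].weight.mul_(alpha)
            layers[l + 1].weight.mul_(1.0 / alpha)
            # Keep the stored gradients consistent with the rescaled weights.
            layers[l].weight.grad.mul_(1.0 / alpha)
            layers[l + 1].weight.grad.mul_(alpha)
```

The cap of 10 sweeps mirrors the "maximum of 10 iterations" reported in the Experiment Setup row; the authors' actual implementation is available in the repository linked above.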
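
For the reported hyperparameter selection, a minimal skeleton consistent with the described setup (Adam optimizer, best learning rate chosen from {0.01, 0.005, 0.001} on the validation split) might look as follows; build_gat, train_and_eval, and data are hypothetical placeholders, not the authors' code.

```python
# Hypothetical learning-rate selection skeleton; build_gat, train_and_eval,
# and data are placeholders, not the authors' implementation.
import torch

LEARNING_RATES = [0.01, 0.005, 0.001]  # grid reported in the paper


def select_learning_rate(build_gat, train_and_eval, data):
    best_lr, best_val_acc = None, float("-inf")
    for lr in LEARNING_RATES:
        model = build_gat()  # looks-linear orthogonal init per the paper
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        val_acc = train_and_eval(model, optimizer, data)
        if val_acc > best_val_acc:
            best_lr, best_val_acc = lr, val_acc
    return best_lr
```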