A²Q: Aggregation-Aware Quantization for Graph Neural Networks

Authors: Zeyu Zhu, Fanrong Li, Zitao Mo, Qinghao Hu, Gang Li, Zejian Liu, Xiaoyao Liang, Jian Cheng

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on eight public node-level and graph-level datasets demonstrate the generality and robustness of our proposed method."
Researcher Affiliation | Collaboration | "(1) Institute of Automation, Chinese Academy of Sciences; (2) School of Future Technology, University of Chinese Academy of Sciences; (3) AiRiA; (4) Shanghai Jiao Tong University"
Pseudocode | Yes | Algorithm 1 (Nearest Neighbor Strategy): 1: Forward pass (X = (x_1, x_2, ..., x_N)^T); 2: Initialize (s, b) before training, with s ∈ R_+^(m×1), b ∈ R_+^(m×1); 3: Calculate q_max = s * (2^(b-1) - 1); 4: Calculate the maximum absolute value in the features of each node: f_i = max_j |x_i^(j)|; 5: Search the index of the quantization parameters for each node: index_i = argmin_k |f_i - q_max^k|; 6: Quantize the i-th node features using (s_index_i, b_index_i); 7: return X_q; 8: end. (A code sketch of this strategy follows the table.)
Open Source Code | Yes | "We provide our code at this URL: https://github.com/weihai-98/A2Q."
Open Datasets | Yes | "Extensive experiments on eight public node-level and graph-level datasets demonstrate the generality and robustness of our proposed method. Compared to the FP32 models, our method can achieve up to an 18.6x (i.e., 1.70-bit) compression ratio with negligible accuracy degradation. Moreover, compared to the state-of-the-art quantization method, our method can achieve up to 11.4% and 9.5% accuracy improvements on the node-level and graph-level tasks, respectively, and up to 2x speedup on a dedicated hardware accelerator."
Dataset Splits | Yes | "For REDDIT-BINARY, we use 10-fold cross-validation. For Cora, CiteSeer, and PubMed, we use the splits used by Yang et al. (2016). We use standard splits for MNIST, CIFAR-10, and ZINC (Dwivedi et al., 2020)."
Hardware Specification | Yes | "All experiments in our paper ran on an RTX 2080Ti GPU under Ubuntu 18.04."
Software Dependencies | Yes | "Our method is implemented using PyTorch Geometric (Fey & Lenssen, 2019). ... The versions of CUDA and PyTorch are 10.2 and 1.8.0, respectively."
Experiment Setup | Yes | "For a fair comparison, we set the quantization bitwidth of W for all GNNs to 4 bits, as in DQ-INT4. ... For all quantized GNNs, we train with the Adam optimizer; the learning rate and learning rate schedule are consistent with their FP32 versions. ... At initialization, the model parameters are set to their FP32 values, the quantization bitwidths for all nodes and weight matrices are initialized to 4 bits, and the step sizes for node features and weights are initialized as s ~ N(0.01, 0.01), except for the graph-level tasks on GAT, where we initialize the step size as s ~ U(0, 1). The batch size is 128 in all graph-level tasks." (An initialization sketch follows the table.)
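
The quoted pseudocode assigns every node to one of m learnable (step size, bitwidth) pairs by matching the node's largest absolute feature to the nearest representable maximum q_max. Below is a minimal PyTorch sketch of that forward pass, assuming symmetric uniform fake quantization over dense (N x F) node features; the function name nn_quantize and the tensor layout are illustrative and not taken from the released A2Q code.

```python
import torch

def nn_quantize(x: torch.Tensor, s: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Quantize node features x (N x F) using m candidate (step size, bitwidth) pairs.

    s: (m,) positive step sizes; b: (m,) positive bitwidths.
    Each node gets the candidate whose representable maximum
    q_max = s * (2^(b-1) - 1) is nearest to the node's largest |feature|.
    """
    q_max = s * (2.0 ** (b - 1.0) - 1.0)                                # (m,) representable maxima
    f = x.abs().max(dim=1).values                                       # (N,) per-node max |feature|
    index = (f.unsqueeze(1) - q_max.unsqueeze(0)).abs().argmin(dim=1)   # (N,) nearest candidate index
    s_i = s[index].unsqueeze(1)                                         # (N, 1) selected step sizes
    levels = (2.0 ** (b[index] - 1.0) - 1.0).unsqueeze(1)               # (N, 1) clipping levels
    # Symmetric uniform fake quantization with the per-node parameters (an assumption;
    # the released code may round/clip differently or keep integer outputs).
    return torch.clamp(torch.round(x / s_i), -levels, levels) * s_i
```

In the paper the step sizes and bitwidths are learned during training; the sketch only covers the nearest-neighbor assignment and quantization step that Algorithm 1 describes.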
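
The experiment setup row also fixes how the learnable quantization parameters start out: every bitwidth begins at 4 bits, and step sizes are drawn from N(0.01, 0.01), or from U(0, 1) for graph-level tasks on GAT. Here is a small sketch of that initialization, assuming one (s, b) pair per quantized group; the function name init_quant_params and the clamp that keeps step sizes positive are assumptions for illustration.

```python
import torch

def init_quant_params(num_groups: int, graph_level_gat: bool = False):
    """Initialize step sizes s and bitwidths b as described in the experiment setup."""
    b = torch.full((num_groups,), 4.0)                    # all bitwidths start at 4 bits
    if graph_level_gat:
        s = torch.rand(num_groups)                        # s ~ U(0, 1) for graph-level GAT
    else:
        s = torch.normal(0.01, 0.01, size=(num_groups,))  # s ~ N(0.01, 0.01)
        s = s.clamp_min(1e-4)                             # keep step sizes positive (assumption)
    return torch.nn.Parameter(s), torch.nn.Parameter(b)
```

Together with the sketch above, s, b = init_quant_params(num_groups=m) would produce the candidate parameters that nn_quantize consumes.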