Automated Data Augmentations for Graph Classification

Authors: Youzhi Luo, Michael Curtis McThrow, Wing Yee Au, Tao Komikado, Kanji Uchino, Koji Maruhashi, Shuiwang Ji

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that GraphAug outperforms previous graph augmentation methods on various graph classification tasks.
Researcher Affiliation | Collaboration | 1. Texas A&M University, TX, USA; 2. Fujitsu Research of America, Inc., CA, USA; 3. Fujitsu Research, Fujitsu Limited, Kanagawa, Japan
Pseudocode | Yes | Algorithm 1: Augmentation Algorithm of GraphAug (a hedged sketch of a sequential augmentation loop of this kind follows the table).
Open Source Code | Yes | The code of GraphAug is available in the DIG (Liu et al., 2021) library.
Open Datasets | Yes | We further demonstrate the advantages of our GraphAug method over previous graph augmentation methods on six widely used datasets from the TUDatasets benchmark (Morris et al., 2020)... We also conduct experiments on the ogbg-molhiv dataset... from the OGB benchmark (Hu et al., 2020). The COLORS and TRIANGLES datasets are synthesized by running the open-sourced data synthesis code of Knyazev et al. (2019) (footnote 1: https://github.com/bknyaz/graph_attention_pool). A dataset-loading sketch follows the table.
Dataset Splits | Yes | We use the 10-fold cross-validation scheme with train/validation/test splitting ratios of 80%/10%/10% on the datasets from the TUDatasets benchmark. A splitting sketch follows the table.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory specifications, or cloud instance types used for the experiments.
Software Dependencies | No | The paper mentions software such as the DIG (Liu et al., 2021) library and the Adam optimizer (Kingma & Ba, 2015) but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | The Adam optimizer (Kingma & Ba, 2015) is used for training all models. For both datasets, we use a reward generation model with 5 layers and a hidden size of 256... The batch size is 32 and the learning rate is 0.0001. For the augmentation model, we use a GIN model with 3 layers and a hidden size of 64 as the GNN encoder; an MLP with 2 layers, a hidden size of 64, and ReLU as the non-linear activation function for MLPC; and an MLP with 2 layers, a hidden size of 128, and ReLU as the non-linear activation function for MLPM, MLPD, and MLPP. The augmentation model is trained for 5 epochs with a batch size of 32 and a learning rate of 0.0001 on both datasets. To stabilize training of the augmentation model, we manually restrict it to modifying only 5% of graph elements at each augmentation step during training. On the COLORS dataset, we use a classification model with 3 layers, a hidden size of 128, and max pooling as the readout layer. On the TRIANGLES dataset, we use a classification model with 3 layers, a hidden size of 64, and sum pooling as the readout layer. On both datasets, we set the training batch size to 32 and the learning rate to 0.001 when training classification models, and all classification models are trained for 100 epochs. A configuration sketch follows the table.
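
The excerpt above references Algorithm 1 but does not reproduce it. Below is a minimal, hypothetical Python sketch of a sequential augmentation loop in the spirit of that algorithm, assuming three primitive operations (node feature masking, node dropping, edge perturbation) applied one per step; GraphAug instead learns which operation to apply and which graph elements to modify. All function names, the graph representation, and the random policy are illustrative assumptions, not taken from the authors' code.

    import copy
    import random

    def mask_node_features(graph, ratio=0.05):
        # Zero out the feature vectors of a random subset of nodes.
        g = copy.deepcopy(graph)
        if not g["x"]:
            return g
        n = max(1, int(len(g["x"]) * ratio))
        for i in random.sample(range(len(g["x"])), n):
            g["x"][i] = [0.0] * len(g["x"][i])
        return g

    def drop_nodes(graph, ratio=0.05):
        # Remove a random subset of nodes and their incident edges, then re-index.
        g = copy.deepcopy(graph)
        if not g["x"]:
            return g
        n = max(1, int(len(g["x"]) * ratio))
        dropped = set(random.sample(range(len(g["x"])), n))
        remap, kept = {}, []
        for i, feat in enumerate(g["x"]):
            if i not in dropped:
                remap[i] = len(kept)
                kept.append(feat)
        g["x"] = kept
        g["edges"] = [(remap[u], remap[v]) for u, v in g["edges"]
                      if u not in dropped and v not in dropped]
        return g

    def perturb_edges(graph, ratio=0.05):
        # Delete a random subset of edges.
        g = copy.deepcopy(graph)
        if not g["edges"]:
            return g
        n = max(1, int(len(g["edges"]) * ratio))
        removed = set(random.sample(range(len(g["edges"])), n))
        g["edges"] = [e for i, e in enumerate(g["edges"]) if i not in removed]
        return g

    OPERATIONS = [mask_node_features, drop_nodes, perturb_edges]

    def augment(graph, num_steps=4):
        # Apply one primitive per step; the choice here is uniformly random,
        # whereas GraphAug learns this choice with an augmentation model.
        for _ in range(num_steps):
            graph = random.choice(OPERATIONS)(graph)
        return graph

    example = {"x": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], "edges": [(0, 1), (1, 2)]}
    augmented = augment(example, num_steps=2)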
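
For the Open Datasets row, the sketch below shows one plausible way to load the public benchmarks mentioned in the excerpt using PyTorch Geometric and the OGB package; it is not the authors' data pipeline. The excerpt does not name all six TUDatasets, so "PROTEINS" is used purely as an illustrative dataset name, and the directory paths are arbitrary.

    from torch_geometric.datasets import TUDataset
    from ogb.graphproppred import PygGraphPropPredDataset

    # A dataset from the TUDatasets benchmark (Morris et al., 2020);
    # "PROTEINS" is an example name, not necessarily one of the paper's six.
    tu_dataset = TUDataset(root="data/TUDataset", name="PROTEINS")
    print(len(tu_dataset), tu_dataset.num_classes)

    # ogbg-molhiv from the OGB benchmark (Hu et al., 2020), with its standard split.
    ogb_dataset = PygGraphPropPredDataset(name="ogbg-molhiv", root="data/OGB")
    split_idx = ogb_dataset.get_idx_split()
    train_set = ogb_dataset[split_idx["train"]]
    valid_set = ogb_dataset[split_idx["valid"]]
    test_set = ogb_dataset[split_idx["test"]]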
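
For the Dataset Splits row, here is a minimal sketch of a 10-fold cross-validation split with 80%/10%/10% train/validation/test ratios. Taking the next fold as the validation set is one common convention; the excerpt does not specify the authors' exact fold assignment, so that detail is an assumption.

    import numpy as np

    def ten_fold_splits(num_graphs, seed=0):
        # Shuffle once and cut into 10 folds: fold i is the test set (10%),
        # fold (i + 1) mod 10 is the validation set (10%), and the remaining
        # eight folds form the training set (80%).
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(num_graphs), 10)
        for i in range(10):
            val = (i + 1) % 10
            test_idx, val_idx = folds[i], folds[val]
            train_idx = np.concatenate([folds[j] for j in range(10) if j not in (i, val)])
            yield train_idx, val_idx, test_idx

    # Example: a dataset with 1000 graphs, one model trained per fold.
    for train_idx, val_idx, test_idx in ten_fold_splits(num_graphs=1000):
        pass  # train on train_idx, tune on val_idx, report on test_idx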
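
For the Experiment Setup row, the quoted hyperparameters for the COLORS and TRIANGLES experiments are collected below as plain Python configuration dictionaries. The dictionary layout and key names are editorial choices for readability, not the authors' configuration format, and the final two lines only illustrate the stated use of the Adam optimizer.

    import torch

    # Reward generation model (both datasets).
    reward_model_cfg = {"num_layers": 5, "hidden_size": 256, "batch_size": 32, "lr": 1e-4}

    # Augmentation model: GIN encoder plus the MLP heads named in the excerpt.
    augmentation_model_cfg = {
        "gnn_encoder": {"type": "GIN", "num_layers": 3, "hidden_size": 64},
        "mlp_c": {"num_layers": 2, "hidden_size": 64, "activation": "ReLU"},
        "mlp_m_d_p": {"num_layers": 2, "hidden_size": 128, "activation": "ReLU"},
        "epochs": 5, "batch_size": 32, "lr": 1e-4,
        "max_modified_elements_per_step": 0.05,
    }

    # Classification models.
    classifier_cfg = {
        "COLORS": {"num_layers": 3, "hidden_size": 128, "readout": "max"},
        "TRIANGLES": {"num_layers": 3, "hidden_size": 64, "readout": "sum"},
        "epochs": 100, "batch_size": 32, "lr": 1e-3,
    }

    # All models are trained with Adam (Kingma & Ba, 2015).
    model = torch.nn.Linear(16, 2)  # placeholder module standing in for the actual GNN
    optimizer = torch.optim.Adam(model.parameters(), lr=classifier_cfg["lr"])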