TAM: Topology-Aware Margin Loss for Class-Imbalanced Node Classification

Authors: Jaeyun Song, Joonhyung Park, Eunho Yang

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our method consistently exhibits superiority over the baselines on various node classification benchmark datasets with representative GNN architectures. We conduct experiments on two well-known node classification benchmark datasets, CiteSeer (homophilous graph) and Wisconsin (heterophilous graph), using the GCN architecture. In Table 1, we report the averaged balanced accuracy (bAcc.) and F1 score with standard error for the baselines and ours on three homogeneously-connected citation networks.
Researcher Affiliation | Collaboration | (1) Graduate School of AI, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea; (2) AITRICS, Seoul, South Korea.
Pseudocode | Yes | Algorithm 1: Topology-Aware Margin
Open Source Code | No | No explicit statement or link providing concrete access to the source code for the methodology described in this paper was found.
Open Datasets | Yes | Datasets: To show the effectiveness of our algorithm on both homophilous and heterophilous graphs, we evaluate our method on homophilous graphs: Cora, CiteSeer, and PubMed (Sen et al., 2008), and heterophilous graphs: Wisconsin, Chameleon, and Squirrel (Rozemberczki et al., 2021).
Dataset Splits | Yes | We utilize the splits used in Yang et al. (2016) for Cora, CiteSeer, and PubMed, and in Pei et al. (2019) for Wisconsin, Chameleon, and Squirrel. We search the best architecture based on the average of validation accuracy and F1 score among the number of layers l ∈ {1, 2, 3} and the hidden dimension d ∈ {64, 128, 256}. (A loading sketch for these datasets and splits appears after the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used for running the experiments were provided.
Software Dependencies | No | All GNNs consist of their own convolutional layers with ReLU activation and dropout (Srivastava et al., 2014) is applied with dropping rate of 0.5 before the last layer. For 1-layer GNNs, we do not adopt dropout and we use multi-head attention with 4 heads for GAT. ... For optimization, we train models for 2000 epochs with Adam optimizer (Kingma & Ba, 2015).
Experiment Setup | Yes | We search the best architecture based on the average of validation accuracy and F1 among the number of layers l ∈ {1, 2, 3} and the hidden dimension d ∈ {64, 128, 256}. For optimization, we train models for 2000 epochs with Adam optimizer (Kingma & Ba, 2015). The initial learning rate is set to 0.01 and the learning rate is halved if the validation loss has not improved for 100 iterations. Weight decay is applied to all learnable parameters as 0.0005 except for the last convolutional layer. For our algorithm, we search the best hyperparameters based on the average of validation accuracy and F1 among the coefficient of the ACM term α ∈ {0.25, 0.5, 1.5, 2.5}, the coefficient of the ADM term β ∈ {0.125, 0.25, 0.5}, and the minimum class-wise temperature ϕ ∈ {0.8, 1.2}. (Sketches of this training and model-selection setup appear after the table.)
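The Open Datasets and Dataset Splits rows above name six benchmark graphs and their public splits. The snippet below is a minimal loading sketch, assuming PyTorch Geometric as the graph library (the paper does not state its toolchain); the root paths are placeholders, and the claim that these dataset classes ship the Yang et al. (2016) and Geom-GCN (Pei et al.) splits reflects the library's documented behavior, not the authors' code.

```python
# Sketch only: loading the six benchmark graphs with PyTorch Geometric.
# Library choice and root paths are assumptions, not stated in the paper.
from torch_geometric.datasets import Planetoid, WebKB, WikipediaNetwork

# Homophilous citation graphs; split='public' gives the Yang et al. (2016) splits.
cora     = Planetoid(root='data/Planetoid', name='Cora', split='public')
citeseer = Planetoid(root='data/Planetoid', name='CiteSeer', split='public')
pubmed   = Planetoid(root='data/Planetoid', name='PubMed', split='public')

# Heterophilous graphs; these classes expose the ten Geom-GCN (Pei et al.)
# splits as stacked boolean masks (e.g., train_mask of shape [num_nodes, 10]).
wisconsin = WebKB(root='data/WebKB', name='Wisconsin')
chameleon = WikipediaNetwork(root='data/Wiki', name='chameleon', geom_gcn_preprocess=True)
squirrel  = WikipediaNetwork(root='data/Wiki', name='squirrel', geom_gcn_preprocess=True)

data = citeseer[0]
print(data)  # node features x, edge_index, labels y, and train/val/test masks
```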
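The Software Dependencies and Experiment Setup rows describe the backbone and optimizer configuration. Below is a minimal sketch under the assumption of a PyTorch / PyTorch Geometric GCN: ReLU between layers, dropout 0.5 before the last layer, Adam with learning rate 0.01, weight decay 5e-4 on everything except the last convolution, and the learning rate halved after 100 checks without validation-loss improvement. The class name and the parameter-group trick are illustrative choices, not the authors' implementation.

```python
# Sketch only: GCN backbone and optimizer settings quoted in the table above.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes, num_layers=2, dropout=0.5):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [num_classes]
        self.convs = torch.nn.ModuleList(
            [GCNConv(dims[i], dims[i + 1]) for i in range(num_layers)])
        # The paper notes dropout is not used for 1-layer GNNs.
        self.dropout = dropout if num_layers > 1 else 0.0

    def forward(self, x, edge_index):
        for conv in self.convs[:-1]:
            x = F.relu(conv(x, edge_index))
        x = F.dropout(x, p=self.dropout, training=self.training)
        return self.convs[-1](x, edge_index)

data = citeseer[0]  # from the loading sketch above
model = GCN(data.num_features, hidden_dim=128, num_classes=int(data.y.max()) + 1)

# Weight decay of 0.0005 on every layer except the last convolution.
optimizer = torch.optim.Adam(
    [{'params': [p for conv in model.convs[:-1] for p in conv.parameters()],
      'weight_decay': 5e-4},
     {'params': list(model.convs[-1].parameters()), 'weight_decay': 0.0}],
    lr=0.01)

# "Halved if the validation loss has not improved for 100 iterations."
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.5, patience=100)
```

Splitting the parameters into two optimizer groups is one simple way to realize "weight decay on all learnable parameters except the last convolutional layer" without touching the training loop; the paper does not say how this exclusion was implemented.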
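The same rows describe model selection by the average of validation accuracy and F1 over small grids for the architecture and the TAM coefficients. The sketch below wires those grids together; train_and_evaluate is a hypothetical helper (not from the paper) that trains a model for 2000 epochs with the settings above and returns validation balanced accuracy and F1.

```python
# Sketch only: grid search over the quoted architecture and TAM hyperparameters,
# scored by the average of validation accuracy and F1.
from itertools import product

layer_grid  = [1, 2, 3]
hidden_grid = [64, 128, 256]
alpha_grid  = [0.25, 0.5, 1.5, 2.5]   # coefficient of the ACM term
beta_grid   = [0.125, 0.25, 0.5]      # coefficient of the ADM term
phi_grid    = [0.8, 1.2]              # minimum class-wise temperature

best_score, best_cfg = float('-inf'), None
for num_layers, hidden, alpha, beta, phi in product(
        layer_grid, hidden_grid, alpha_grid, beta_grid, phi_grid):
    # Hypothetical helper: trains for 2000 epochs, returns (val bAcc., val F1).
    val_acc, val_f1 = train_and_evaluate(
        num_layers=num_layers, hidden_dim=hidden,
        alpha=alpha, beta=beta, phi_min=phi, epochs=2000)
    score = (val_acc + val_f1) / 2.0
    if score > best_score:
        best_score = score
        best_cfg = dict(num_layers=num_layers, hidden_dim=hidden,
                        alpha=alpha, beta=beta, phi_min=phi)

print(best_score, best_cfg)
```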