TEDDY: Trimming Edges with Degree-based Discrimination Strategy

Authors: Hyunjin Seo, Jihun Yun, Eunho Yang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Remarkably, our experimental results demonstrate that TEDDY significantly surpasses conventional iterative approaches in generalization, even when conducting one-shot sparsification that solely utilizes graph structures, without taking feature information into account. Our extensive experiments demonstrate the state-of-the-art performance of TEDDY over iterative GLT methods across diverse benchmark datasets and architectures.
Researcher Affiliation | Collaboration | Hyunjin Seo1, Jihun Yun1, Eunho Yang1,2; Korea Advanced Institute of Science and Technology (KAIST)1, AITRICS2; {bella72,arcprime,eunhoy}@kaist.ac.kr
Pseudocode | Yes | Algorithm 1 TEDDY: Trimming Edges with Degree-based Discrimination strategY
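The paper's one-shot, structure-only sparsification can be illustrated with a minimal sketch. The scoring rule below (favoring edges incident to low-degree nodes, so sparsely connected nodes keep their few edges) is a hypothetical stand-in for TEDDY's actual degree-based criterion, which is defined in Algorithm 1 of the paper:

```python
# Hedged sketch of one-shot, degree-based edge trimming.
# The edge score used here is illustrative, not TEDDY's exact criterion.

def degree_based_trim(edges, num_nodes, keep_ratio=0.8):
    """Keep the top `keep_ratio` fraction of edges, scored by endpoint degrees."""
    # Compute node degrees from the undirected edge list.
    deg = [0] * num_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Illustrative score: edges touching low-degree nodes score higher,
    # so nodes with few connections are the last to be disconnected.
    scored = sorted(edges, key=lambda e: 1.0 / deg[e[0]] + 1.0 / deg[e[1]],
                    reverse=True)
    k = max(1, int(len(edges) * keep_ratio))
    return scored[:k]

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
kept = degree_based_trim(edges, num_nodes=5, keep_ratio=0.6)
print(len(kept))  # → 3
```

Note that, unlike iterative GLT pipelines, this runs once on the graph structure alone, matching the one-shot setting highlighted in the quote above.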
Open Source Code | Yes | The source code for our experiments is available at https://github.com/hyunjin72/TEDDY.
Open Datasets | Yes | In alignment with experiments in UGS, we evaluate the performance of our TEDDY on three benchmark datasets: Cora, Citeseer, and Pubmed (Sen et al., 2008) on three representative GNN architectures: GCN (Kipf & Welling, 2016), GIN (Xu et al., 2018a), and GAT (Veličković et al., 2017). To further substantiate our analysis, we extend our experiments to two large-scale datasets: ArXiv (Hu et al., 2020) and Reddit (Zeng et al., 2019). Table 9 provides comprehensive statistics of the datasets used in our experiments, including the number of nodes, edges, classes, and features.
Dataset Splits | Yes | We modified the Cora, Citeseer, and Pubmed datasets (Sen et al., 2008) to a 20/40/40% split for the training, validation, and testing phases, ensuring that the model had no prior exposure to the validation and test nodes during training. Table 9 provides comprehensive statistics of the datasets used in our experiments, including the number of nodes, edges, classes, and features. Split ratios: Cora 120/500/1000; Citeseer 140/500/1000; Pubmed 60/500/1000; Arxiv 54%/18%/28%; Reddit 66%/10%/24%.
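The fixed-count splits quoted above (e.g. 60/500/1000 for Pubmed) can be materialized as boolean node masks. This is a generic sketch, not the authors' data loader, and it assumes nodes are ordered according to the public split convention:

```python
# Sketch: build train/val/test masks from fixed node counts, as in the
# quoted split table. Node ordering is assumed to match the public split.

def build_masks(num_nodes, n_train, n_val, n_test):
    train = [i < n_train for i in range(num_nodes)]
    val = [n_train <= i < n_train + n_val for i in range(num_nodes)]
    test = [n_train + n_val <= i < n_train + n_val + n_test
            for i in range(num_nodes)]
    return train, val, test

# Pubmed-style split: 60 train / 500 val / 1000 test out of 19,717 nodes.
train, val, test = build_masks(19717, 60, 500, 1000)
print(sum(train), sum(val), sum(test))  # → 60 500 1000
```

Masks like these guarantee the disjointness the authors describe: no validation or test node is ever seen during training.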
Hardware Specification | Yes | The experiments are conducted on RTX 2080 Ti (11GB) and RTX 3090 (24GB) GPU machines. The comparison is conducted on a machine with an NVIDIA Titan Xp and an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz. The comparison is conducted on a machine with an NVIDIA GeForce RTX 3090 and an Intel(R) Xeon(R) Gold 5215 CPU @ 2.50GHz.
Software Dependencies | No | We implement GNN models and our proposed TEDDY using PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey & Lenssen, 2019). While specific software is mentioned, version numbers are not provided for PyTorch or PyTorch Geometric, only citations to their respective papers.
Experiment Setup | Yes | Regarding the experiments on large-scale datasets, we employ three-layer and two-layer GNNs on ArXiv and Reddit, respectively, while fixing the hidden dimension at 256 across both GCN and SAGE. Analogous to the regular-scale experiments, we select the Adam optimizer with an initial learning rate of 0.01 and a weight decay of 0 uniformly across all large-scale settings. We adopted a per-simulation pruning ratio of pg = pθ = 0.05 and a hyperparameter search space for Ldt within the range [0.01, 200].
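The per-simulation pruning ratio pg = pθ = 0.05 quoted above implies, for the iterative baselines it is compared against, a geometric sparsity schedule: each round removes 5% of what remains. A small sketch of that arithmetic (function and variable names are illustrative):

```python
# Sketch of the sparsity schedule implied by a per-round pruning ratio
# of p = 0.05: the fraction surviving after k rounds is (1 - p) ** k.

def remaining_fraction(p=0.05, rounds=1):
    return (1 - p) ** rounds

for k in (1, 5, 20):
    print(k, round(remaining_fraction(0.05, k), 4))
# → 1 0.95
#   5 0.7738
#   20 0.3585
```

So reaching roughly 64% graph sparsity under this schedule takes about 20 pruning rounds, which is the kind of iterative cost the paper's one-shot approach avoids.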