TEDDY: Trimming Edges with Degree-based Discrimination Strategy
Authors: Hyunjin Seo, Jihun Yun, Eunho Yang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Remarkably, our experimental results demonstrate that TEDDY significantly surpasses conventional iterative approaches in generalization, even when conducting one-shot sparsification that solely utilizes graph structures, without taking feature information into account. Our extensive experiments demonstrate the state-of-the-art performance of TEDDY over iterative GLT methods across diverse benchmark datasets and architectures. |
| Researcher Affiliation | Collaboration | Hyunjin Seo1, Jihun Yun1, Eunho Yang1,2; Korea Advanced Institute of Science and Technology (KAIST)1, AITRICS2; {bella72,arcprime,eunhoy}@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1 TEDDY: Trimming Edges with Degree-based Discrimination strategY |
| Open Source Code | Yes | The source code for our experiments is available at https://github.com/hyunjin72/TEDDY. |
| Open Datasets | Yes | In alignment with experiments in UGS, we evaluate the performance of our TEDDY on three benchmark datasets: Cora, Citeseer, and Pubmed (Sen et al., 2008) on three representative GNN architectures: GCN (Kipf & Welling, 2016), GIN (Xu et al., 2018a), and GAT (Veličković et al., 2017). To further substantiate our analysis, we extend our experiments for two large-scale datasets: ArXiv (Hu et al., 2020) and Reddit (Zeng et al., 2019). Table 9 provides comprehensive statistics of the datasets used in our experiments, including the number of nodes, edges, classes, and features. Split ratios: Cora 120/500/1000; Citeseer 140/500/1000; Pubmed 60/500/1000; Arxiv 54%/18%/28%; Reddit 66%/10%/24%. |
| Dataset Splits | Yes | We modified the Cora, Citeseer, and Pubmed datasets (Sen et al., 2008) to a 20/40/40% split for training, validation, and testing phases, ensuring that the model had no prior exposure to the validation and test nodes during training. Table 9 provides comprehensive statistics of the datasets used in our experiments, including the number of nodes, edges, classes, and features. Split ratios: Cora 120/500/1000; Citeseer 140/500/1000; Pubmed 60/500/1000; Arxiv 54%/18%/28%; Reddit 66%/10%/24%. |
| Hardware Specification | Yes | The experiments are conducted on RTX 2080 Ti (11GB) and RTX 3090 (24GB) GPU machines. The comparison is conducted on the machine with NVIDIA Titan Xp and Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz. The comparison is conducted on the machine with NVIDIA GeForce RTX 3090 and Intel(R) Xeon(R) Gold 5215 CPU @ 2.50GHz. |
| Software Dependencies | No | We implement GNN models and our proposed TEDDY using PyTorch Paszke et al. (2019) and PyTorch Geometric Fey & Lenssen (2019). While specific software is mentioned, version numbers are not provided for PyTorch or PyTorch Geometric, only citations to their respective papers. |
| Experiment Setup | Yes | Regarding the experiments on large-scale datasets, we employ three-layer and two-layer GNNs on Arxiv and Reddit, respectively, while fixing the hidden dimension as 256 across both GCN and SAGE. Analogous to the regular-scale experiment, we select the Adam optimizer with an initial learning rate of 0.01 and weight decay as 0 uniformly across all large-scale settings. We adopted a per-simulation pruning ratio of pg = pθ = 0.05 and a hyperparameter search space for Ldt within the range [0.01, 200]. |
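The reported setup can be summarized in a small configuration sketch. This is not the authors' code; the dictionary keys and the helper function below are illustrative, and the compounding calculation assumes the per-simulation ratio is applied repeatedly across pruning rounds, as in iterative graph-lottery-ticket pipelines.

```python
# Hedged sketch of the reported large-scale hyperparameters (illustrative,
# not the released TEDDY implementation).
CONFIG = {
    "optimizer": "Adam",
    "lr": 0.01,            # initial learning rate
    "weight_decay": 0.0,
    "hidden_dim": 256,     # fixed for GCN and SAGE on ArXiv/Reddit
    "p_g": 0.05,           # per-simulation graph (edge) pruning ratio
    "p_theta": 0.05,       # per-simulation weight pruning ratio
}

def remaining_fraction(p: float, rounds: int) -> float:
    """Fraction of elements left if a pruning ratio `p` compounds
    over `rounds` successive pruning rounds."""
    return (1.0 - p) ** rounds

# e.g., after 20 rounds at 5% per round, roughly 35.8% of edges remain,
# i.e., about 64.2% graph sparsity.
remaining = remaining_fraction(CONFIG["p_g"], 20)
```

This also explains why per-round ratios look small: at 5% per round, reaching high sparsity levels requires many rounds, which is the iterative-pipeline cost that TEDDY's one-shot sparsification avoids.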