TEDDY: Trimming Edges with Degree-based Discrimination Strategy

Authors: Hyunjin Seo, Jihun Yun, Eunho Yang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Remarkably, our experimental results demonstrate that TEDDY significantly surpasses conventional iterative approaches in generalization, even when conducting one-shot sparsification that solely utilizes graph structures, without taking feature information into account. Our extensive experiments demonstrate the state-of-the-art performance of TEDDY over iterative GLT methods across diverse benchmark datasets and architectures.
Researcher Affiliation | Collaboration | Hyunjin Seo1, Jihun Yun1, Eunho Yang1,2; Korea Advanced Institute of Science and Technology (KAIST)1, AITRICS2; {bella72,arcprime,eunhoy}@kaist.ac.kr
Pseudocode | Yes | Algorithm 1 TEDDY: Trimming Edges with Degree-based Discrimination strategY
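The paper's one-shot, structure-only sparsification can be illustrated with a minimal sketch. The scoring rule below (favoring edges incident to low-degree nodes, so sparsely connected nodes keep their few edges) is a hypothetical stand-in for TEDDY's actual degree-based criterion, which is defined in Algorithm 1 of the paper:

```python
# Hedged sketch of one-shot, degree-based edge trimming.
# The edge score used here is illustrative, not TEDDY's exact criterion.

def degree_based_trim(edges, num_nodes, keep_ratio=0.8):
    """Keep the top `keep_ratio` fraction of edges, scored by endpoint degrees."""
    # Compute node degrees from the undirected edge list.
    deg = [0] * num_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Illustrative score: edges touching low-degree nodes score higher,
    # so nodes with few connections are the last to be disconnected.
    scored = sorted(edges, key=lambda e: 1.0 / deg[e[0]] + 1.0 / deg[e[1]],
                    reverse=True)
    k = max(1, int(len(edges) * keep_ratio))
    return scored[:k]

edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
kept = degree_based_trim(edges, num_nodes=5, keep_ratio=0.6)
print(len(kept))  # → 3
```

Note that, unlike iterative GLT pipelines, this runs once on the graph structure alone, matching the one-shot setting highlighted in the quote above.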
Open Source Code | Yes | The source code for our experiments is available at https://github.com/hyunjin72/TEDDY.
Open Datasets | Yes | In alignment with experiments in UGS, we evaluate the performance of our TEDDY on three benchmark datasets: Cora, Citeseer, and Pubmed (Sen et al., 2008) on three representative GNN architectures: GCN (Kipf & Welling, 2016), GIN (Xu et al., 2018a), and GAT (Veličković et al., 2017). To further substantiate our analysis, we extend our experiments to two large-scale datasets: ArXiv (Hu et al., 2020) and Reddit (Zeng et al., 2019). Table 9 provides comprehensive statistics of the datasets used in our experiments, including the number of nodes, edges, classes, and features.
Dataset Splits | Yes | We modified the Cora, Citeseer, and Pubmed datasets (Sen et al., 2008) to a 20/40/40% split for the training, validation, and testing phases, ensuring that the model had no prior exposure to the validation and test nodes during training. Table 9 provides comprehensive statistics of the datasets used in our experiments, including the number of nodes, edges, classes, and features. Split ratios: Cora 120/500/1000; Citeseer 140/500/1000; Pubmed 60/500/1000; Arxiv 54%/18%/28%; Reddit 66%/10%/24%.
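The fixed-count splits quoted above (e.g. 60/500/1000 for Pubmed) can be materialized as boolean node masks. This is a generic sketch, not the authors' data loader, and it assumes nodes are ordered according to the public split convention:

```python
# Sketch: build train/val/test masks from fixed node counts, as in the
# quoted split table. Node ordering is assumed to match the public split.

def build_masks(num_nodes, n_train, n_val, n_test):
    train = [i < n_train for i in range(num_nodes)]
    val = [n_train <= i < n_train + n_val for i in range(num_nodes)]
    test = [n_train + n_val <= i < n_train + n_val + n_test
            for i in range(num_nodes)]
    return train, val, test

# Pubmed-style split: 60 train / 500 val / 1000 test out of 19,717 nodes.
train, val, test = build_masks(19717, 60, 500, 1000)
print(sum(train), sum(val), sum(test))  # → 60 500 1000
```

Masks like these guarantee the disjointness the authors describe: no validation or test node is ever seen during training.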
Hardware Specification | Yes | The experiments are conducted on RTX 2080 Ti (11GB) and RTX 3090 (24GB) GPU machines. The comparison is conducted on a machine with an NVIDIA Titan Xp and an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz. The comparison is conducted on a machine with an NVIDIA GeForce RTX 3090 and an Intel(R) Xeon(R) Gold 5215 CPU @ 2.50GHz.
Software Dependencies | No | We implement GNN models and our proposed TEDDY using PyTorch (Paszke et al., 2019) and PyTorch Geometric (Fey & Lenssen, 2019). While specific software is mentioned, version numbers are not provided for PyTorch or PyTorch Geometric, only citations to their respective papers.
Experiment Setup | Yes | Regarding the experiments on large-scale datasets, we employ three-layer and two-layer GNNs on ArXiv and Reddit, respectively, while fixing the hidden dimension at 256 across both GCN and SAGE. Analogous to the regular-scale experiments, we select the Adam optimizer with an initial learning rate of 0.01 and a weight decay of 0 uniformly across all large-scale settings. We adopted a per-simulation pruning ratio of pg = pθ = 0.05 and a hyperparameter search space for Ldt within the range [0.01, 200].
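The per-simulation pruning ratio pg = pθ = 0.05 quoted above implies, for the iterative baselines it is compared against, a geometric sparsity schedule: each round removes 5% of what remains. A small sketch of that arithmetic (function and variable names are illustrative):

```python
# Sketch of the sparsity schedule implied by a per-round pruning ratio
# of p = 0.05: the fraction surviving after k rounds is (1 - p) ** k.

def remaining_fraction(p=0.05, rounds=1):
    return (1 - p) ** rounds

for k in (1, 5, 20):
    print(k, round(remaining_fraction(0.05, k), 4))
# → 1 0.95
#   5 0.7738
#   20 0.3585
```

So reaching roughly 64% graph sparsity under this schedule takes about 20 pruning rounds, which is the kind of iterative cost the paper's one-shot approach avoids.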