Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Spectral Augmentation for Self-Supervised Learning on Graphs
Authors: Lu Lin, Jinghui Chen, Hongning Wang
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on both graph and node classification tasks demonstrate the effectiveness of our method in unsupervised learning, as well as the generalization capability in transfer learning and the robustness property under adversarial attacks. |
| Researcher Affiliation | Academia | Lu Lin, Jinghui Chen, Hongning Wang; The Pennsylvania State University, University of Virginia; EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 illustrates the detailed steps of deploying SPAN in an instantiation of GCL. and Algorithm 1: Deploying SPAN in an instantiation of GCL |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The proposed SPAN is evaluated on 25 graph datasets. Specifically, for the node classification task, we included Cora, Citeseer, PubMed citation networks (Sen et al., 2008), Wiki-CS hyperlink network (Mernyei & Cangea, 2020), Amazon-Computer and Amazon-Photo co-purchase network (Shchur et al., 2018), and Coauthor-CS network (Shchur et al., 2018). For the graph classification and regression tasks, we employed TU biochemical and social networks (Morris et al., 2020), Open Graph Benchmark (OGB) (Hu et al., 2020a) and ZINC (Hu et al., 2020b; Gómez-Bombarelli et al., 2018) chemical molecules, and Protein-Protein Interaction (PPI) biological networks (Hu et al., 2020b; Zitnik & Leskovec, 2017). |
| Dataset Splits | Yes | We adopt the given data split for OGB dataset, and use 10-fold cross validation for TU dataset as it does not provide such a split. |
| Hardware Specification | Yes | The experiments were performed on Nvidia GeForce RTX 2080Ti (12GB) GPU cards for most datasets, and RTX A6000 (48GB) cards for PubMed and Coauthor-CS datasets. |
| Software Dependencies | No | The paper mentions using PyG (PyTorch Geometric) library for datasets and Adam optimizer, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For node representation learning, we used GCN (Kipf & Welling, 2017) encoder, and set the number of GCN layers to 2, the size of hidden dimension for each layer to 512. The training epoch is 1000. For graph representation learning, we adopted GIN (Xu et al., 2019) encoder with 5 layers, which was concatenated by a readout function that adds node representations for pooling. The embedding size was set to 32 for TU dataset and 300 for OGB dataset. We used 100 training epochs with batch size 32. In all the experiments, we used the Adam optimizer with learning rate 0.001 and weight decay 10⁻⁵. For data augmentation, we adopted both edge perturbation and feature masking, whose perturbation ratios ρe and ρf were tuned by grid search among {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9} based on the validation set. |
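The grid search over the augmentation ratios ρe and ρf described in the Experiment Setup row can be sketched as follows. This is a minimal stand-alone illustration, not the authors' code: the `validate` callback is a hypothetical stand-in for training the encoder and scoring it on the validation set, as the paper describes.

```python
# Sketch of the 9x9 grid search over edge-perturbation ratio rho_e and
# feature-masking ratio rho_f reported in the paper's experiment setup.
from itertools import product

# Candidate ratios {0.1, 0.2, ..., 0.9}, as quoted above.
RATIOS = [round(0.1 * i, 1) for i in range(1, 10)]

def grid_search(validate):
    """Return the (rho_e, rho_f) pair maximizing the validation score.

    `validate` is a hypothetical callable (rho_e, rho_f) -> score; in the
    paper it would be validation accuracy of the GCL model trained with
    those augmentation ratios.
    """
    best_pair, best_score = None, float("-inf")
    for rho_e, rho_f in product(RATIOS, RATIOS):
        score = validate(rho_e, rho_f)
        if score > best_score:
            best_pair, best_score = (rho_e, rho_f), score
    return best_pair, best_score

# Toy usage with an artificial scoring surface peaked at (0.3, 0.5):
best, score = grid_search(lambda e, f: -((e - 0.3) ** 2 + (f - 0.5) ** 2))
```

In the actual pipeline each call to `validate` would be a full pre-training run, so the 81 grid points are the dominant tuning cost.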