Deep Insights into Noisy Pseudo Labeling on Graph Data
Authors: Botao Wang, Jia Li, Yang Liu, Jiashun Cheng, Yu Rong, Wenjia Wang, Fugee Tsung
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that the proposed strategy improves the graph learning process and outperforms other PL strategies on link prediction and node classification tasks. |
| Researcher Affiliation | Collaboration | 1Hong Kong University of Science and Technology, Hong Kong SAR, China 2Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China 3Tencent AI Lab, Shenzhen, China |
| Pseudocode | Yes | Algorithm 1: Iterative cautious pseudo labeling. |
| Open Source Code | Yes | The implementation is open-sourced at https://github.com/AcEbt/CPL. |
| Open Datasets | Yes | We adopt five publicly available datasets to evaluate the CPL strategy for link prediction, i.e., CiteSeer, Actor, WikiCS, Twitch PT, and Amazon_Photo, and five datasets for node classification, i.e., Cora, CiteSeer, PubMed, Amazon_Photo, and LastFMAsia. Detailed statistics are reported in Table 1. |
| Dataset Splits | Yes | In the link prediction task, as there are few PL-based methods, we apply the CPL strategy on three popular models: GAE [12], node2vec [4], and SEAL [29]. To reserve sufficient candidate unobserved samples for PL, the dataset is randomly split into 10%, 40%, and 50% for training, validation, and testing. |
| Hardware Specification | No | No specific hardware details such as GPU models, CPU types, or memory specifications used for running experiments were mentioned in the paper. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiment. |
| Experiment Setup | No | The paper mentions using 5 random seeds, setting k for PL samples, and applying single augmentation methods three times. However, it does not provide specific hyperparameters like learning rate, batch size, number of epochs, or optimizer details for their models. |
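The table above notes that the paper provides pseudocode (Algorithm 1, "Iterative cautious pseudo labeling") and that the reported setup selects a fixed number k of pseudo-labeled samples per round. A minimal sketch of such an iterative top-k loop is shown below; the exact selection rule, confidence measure, and retraining step in CPL differ from this illustration, and the function names and signatures here (`iterative_pseudo_labeling`, `predict`) are assumptions, not the authors' implementation:

```python
def iterative_pseudo_labeling(labeled, unlabeled, predict, k, rounds):
    """Generic iterative top-k pseudo labeling sketch (not the paper's CPL).

    labeled:   dict mapping node -> label (the observed training labels)
    unlabeled: set of candidate nodes to pseudo-label
    predict:   callable(node, labeled) -> (predicted_label, confidence),
               standing in for a model retrained on the current label set
    k:         number of pseudo labels promoted per round
    rounds:    number of pseudo-labeling iterations
    """
    labeled = dict(labeled)
    unlabeled = set(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        # Score every remaining unlabeled node with the current model.
        scored = [(node, *predict(node, labeled)) for node in unlabeled]
        # "Cautious" step: keep only the k most confident predictions.
        scored.sort(key=lambda t: t[2], reverse=True)
        for node, label, _conf in scored[:k]:
            labeled[node] = label      # promote to a pseudo label
            unlabeled.discard(node)    # remove from the candidate pool
    return labeled
```

In the actual paper, `predict` would correspond to retraining the GNN (e.g., GAE or SEAL for link prediction) on the enlarged label set at each iteration.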