Deep Insights into Noisy Pseudo Labeling on Graph Data

Authors: Botao Wang, Jia Li, Yang Liu, Jiashun Cheng, Yu Rong, Wenjia Wang, Fugee Tsung

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that the proposed strategy improves the graph learning process and outperforms other PL strategies on link prediction and node classification tasks.
Researcher Affiliation | Collaboration | 1 Hong Kong University of Science and Technology, Hong Kong SAR, China; 2 Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China; 3 Tencent AI Lab, Shenzhen, China
Pseudocode | Yes | Algorithm 1: Iterative cautious pseudo labeling (a sketch of the loop follows the table).
Open Source Code | Yes | The implementation is open-sourced at https://github.com/AcEbt/CPL.
Open Datasets | Yes | We adopt five publicly available datasets to evaluate the CPL strategy for link prediction, i.e. CiteSeer, Actor, WikiCS, Twitch_PT, and Amazon_Photo, and five datasets for node classification, i.e. Cora, CiteSeer, PubMed, Amazon_Photo, and LastFMAsia. Detailed statistics are reported in Table 1.
Dataset Splits | Yes | In the link prediction task, as there are few PL-based methods, we apply the CPL strategy on three popular models: GAE [12], node2vec [4], and SEAL [29]. To reserve sufficient candidate unobserved samples for PL, the dataset is randomly split into 10%, 40%, and 50% for training, validation, and testing (a split sketch follows the table).
Hardware Specification | No | No specific hardware details, such as GPU models, CPU types, or memory specifications used for running experiments, are mentioned in the paper.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed to replicate the experiment.
Experiment Setup | No | The paper mentions using 5 random seeds, setting k for PL samples, and applying single augmentation methods three times. However, it does not provide specific hyperparameters such as learning rate, batch size, number of epochs, or optimizer details for the models.
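
The Pseudocode row above refers to Algorithm 1, iterative cautious pseudo labeling; the authors' actual implementation is in the linked repository (https://github.com/AcEbt/CPL). The following is only a minimal sketch of such an iterative loop for node classification, assuming a standard PyTorch model that maps node features to class logits. The function names `cautious_pseudo_labeling` and `train_fn`, and the `k` / `num_iterations` parameters, are illustrative and not taken from the paper.

```python
import torch

def cautious_pseudo_labeling(model, features, labels, train_mask, unlabeled_mask,
                             k=100, num_iterations=10, train_fn=None):
    """Minimal sketch of an iterative cautious pseudo-labeling loop.

    Each round: (re)train on the current labeled set, score the unlabeled
    nodes, and promote only the top-k most confident predictions to pseudo
    labels. `train_fn(model, features, labels, mask)` is a hypothetical
    training routine supplied by the caller.
    """
    labels = labels.clone()
    train_mask = train_mask.clone()
    unlabeled_mask = unlabeled_mask.clone()

    for _ in range(num_iterations):
        train_fn(model, features, labels, train_mask)        # retrain on current labels
        with torch.no_grad():
            probs = torch.softmax(model(features), dim=-1)   # class probabilities
        conf, pred = probs.max(dim=-1)
        conf[~unlabeled_mask] = float("-inf")                # consider unlabeled nodes only
        top = conf.topk(min(k, int(unlabeled_mask.sum()))).indices
        labels[top] = pred[top]                              # accept only the top-k confident labels
        train_mask[top] = True
        unlabeled_mask[top] = False
    return model, labels, train_mask
```

The cautious element, as reflected in this sketch, is that only a small, high-confidence subset of unlabeled samples is promoted per round rather than every prediction.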
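For the 10%/40%/50% split reported in the Dataset Splits row, one way to produce such an edge split is PyTorch Geometric's `RandomLinkSplit`; the paper does not state which tooling was used, so this is an assumed, minimal sketch using CiteSeer as an example.

```python
from torch_geometric.datasets import Planetoid
from torch_geometric.transforms import RandomLinkSplit

# Load one of the link-prediction datasets (CiteSeer used here as an example).
dataset = Planetoid(root="data/CiteSeer", name="CiteSeer")

# Hold out 40% of edges for validation and 50% for testing,
# leaving 10% for training, matching the split reported in the paper.
transform = RandomLinkSplit(num_val=0.4, num_test=0.5, is_undirected=True)
train_data, val_data, test_data = transform(dataset[0])
```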