Mitigating Label Noise on Graphs via Topological Sample Selection

Authors: Yuhao Wu, Jiangchao Yao, Xiaobo Xia, Jun Yu, Ruxin Wang, Bo Han, Tongliang Liu

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct extensive experiments to verify the effectiveness of our method and provide comprehensive ablation studies about the underlying mechanism of TSS.
Researcher Affiliation | Collaboration | ¹Sydney AI Center, The University of Sydney; ²CMIC, Shanghai Jiao Tong University; ³Shanghai AI Laboratory; ⁴University of Science and Technology of China; ⁵Alibaba Group; ⁶TMLR Group, Department of Computer Science, Hong Kong Baptist University.
Pseudocode | Yes | We summarize the procedure of TSS in Algorithm 1 of the Appendix.
Open Source Code | No | The paper does not contain any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | Datasets: We adopted three small datasets, Cora, CiteSeer, and PubMed, with the default dataset split as done in (Chen et al., 2018), and four large datasets, WikiCS, Facebook, Physics, and DBLP, to evaluate our method.
Dataset Splits | Yes | We adopted three small datasets, Cora, CiteSeer, and PubMed, with the default dataset split as done in (Chen et al., 2018)... All hyper-parameters are tuned on a noisy validation set built by leaving out 10% of the noisy training data.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud computing instance types) used for running the experiments.
Software Dependencies | No | The paper mentions general software components such as the Adam optimizer and a two-layer graph convolutional network, but does not specify their versions or the versions of other key software libraries (e.g., Python, PyTorch, TensorFlow) needed for reproducibility.
Experiment Setup | Yes | A two-layer graph convolutional network with a hidden dimension of 16 is deployed as the backbone for all methods. We apply the Adam optimizer (Kingma and Ba, 2014) with a learning rate of 0.01 and a weight decay of 5×10⁻⁴. The number of pre-training epochs is set to 400, while the number of retraining epochs is set to 500 for Cora and CiteSeer, and 1000 for PubMed, WikiCS, Facebook, Physics, and DBLP. All hyper-parameters are tuned on a noisy validation set built by leaving out 10% of the noisy training data.
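
For reference, below is a minimal sketch of the reported setup, assuming PyTorch Geometric: a two-layer GCN backbone with hidden dimension 16, Adam with learning rate 0.01 and weight decay 5×10⁻⁴, 400 pre-training epochs, and a noisy validation set held out from 10% of the training nodes. The Planetoid loader for Cora, the dropout rate, and the use of the dataset's clean labels (the paper injects synthetic label noise) are illustrative assumptions; the TSS sample-selection procedure itself (Algorithm 1 of the paper) is not reproduced here.

```python
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

# Load Cora with its standard public split (assumption: the Planetoid
# default split stands in for the split of Chen et al., 2018).
dataset = Planetoid(root="data/Planetoid", name="Cora")
data = dataset[0]

class GCN(torch.nn.Module):
    """Two-layer GCN backbone with a hidden dimension of 16, as reported."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.5, training=self.training)  # dropout rate is an assumption
        return self.conv2(x, edge_index)

model = GCN(dataset.num_features, 16, dataset.num_classes)
# Adam with lr = 0.01 and weight decay = 5e-4, matching the reported setup.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

# Hold out 10% of the (noisy) training nodes as a noisy validation set.
train_idx = data.train_mask.nonzero(as_tuple=False).view(-1)
perm = train_idx[torch.randperm(train_idx.numel())]
n_val = int(0.1 * perm.numel())
val_idx, fit_idx = perm[:n_val], perm[n_val:]

# Pre-training loop (400 epochs, per the paper); labels here are clean,
# whereas the paper trains on synthetically corrupted labels.
model.train()
for epoch in range(400):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[fit_idx], data.y[fit_idx])
    loss.backward()
    optimizer.step()
```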