Part-dependent Label Noise: Towards Instance-dependent Label Noise

Authors: Xiaobo Xia, Tongliang Liu, Bo Han, Nannan Wang, Mingming Gong, Haifeng Liu, Gang Niu, Dacheng Tao, Masashi Sugiyama

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations on synthetic and real-world datasets demonstrate our method is superior to the state-of-the-art approaches for learning from the instance-dependent label noise. Extensive experiments on both synthetic and real-world label-noise datasets show that the part-dependent transition matrices can well address instance-dependent label noise.
Researcher Affiliation | Collaboration | Xiaobo Xia (1,2), Tongliang Liu (1), Bo Han (3), Nannan Wang (2), Mingming Gong (4), Haifeng Liu (5), Gang Niu (6), Dacheng Tao (1), Masashi Sugiyama (6,7); 1: University of Sydney, 2: Xidian University, 3: Hong Kong Baptist University, 4: University of Melbourne, 5: Brain-Inspired Technology Co., Ltd, 6: RIKEN, 7: University of Tokyo
Pseudocode | Yes | Algorithm 1: Part-dependent Matrices Learning Algorithm. (A sketch of the part-dependent combination follows the table.)
Open Source Code | Yes | Our implementation is available at https://github.com/xiaoboxia/Part-dependent-label-noise.
Open Datasets | Yes | F-MNIST [68], SVHN [45], CIFAR-10 [26], NEWS [27], and one real-world noisy dataset, i.e., Clothing1M [69]. For all the datasets, we leave out 10% of the noisy training examples as a noisy validation set, which is for model selection. We also conduct synthetic experiments on MNIST [28].
Dataset Splits | Yes | For all the datasets, we leave out 10% of the noisy training examples as a noisy validation set, which is for model selection. F-MNIST contains 60,000 training images and 10,000 test images with 10 classes. SVHN and CIFAR-10 both have 10 classes of images, but the former contains 73,257 training images and 26,032 test images, and the latter contains 50,000 training images and 10,000 test images. NEWS contains 13,997 training texts and 6,000 test texts with 20 classes. (See the split sketch after the table.)
Hardware Specification | Yes | For fair comparison, all experiments are conducted on NVIDIA Tesla V100 GPUs, and all methods are implemented in PyTorch.
Software Dependencies | No | The paper mentions PyTorch as the implementation framework but does not provide specific version numbers for it or any other software dependencies.
Experiment Setup | Yes | We first use SGD with momentum 0.9, weight decay 10^-4, batch size 128, and an initial learning rate of 10^-2 to initialize the network. The learning rate is divided by 10 at the 40th and 80th epochs. We set 100 epochs in total. Then, the optimizer and learning rate are changed to Adam and 5×10^-7 to learn the classifier and slack variable. (A training-schedule sketch follows the table.)
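
The core idea behind Algorithm 1, as the paper describes it, is to approximate the instance-dependent transition matrix T(x) by combining a small set of part-dependent transition matrices. The sketch below illustrates that combination only; the module name `PartDependentTransition`, the linear-plus-softmax part weighting, and all parameter names are assumptions for illustration, not the paper's actual implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn

class PartDependentTransition(nn.Module):
    """Minimal sketch: approximate the instance-dependent transition matrix
    T(x) as a convex combination of part-dependent matrices T_1..T_R.
    Names and the weighting scheme are illustrative assumptions."""

    def __init__(self, num_parts: int, num_classes: int, feat_dim: int):
        super().__init__()
        # One learnable transition matrix (as logits) per part.
        self.part_matrices = nn.Parameter(
            torch.zeros(num_parts, num_classes, num_classes))
        # Maps an instance's features to per-part combination weights.
        self.part_weights = nn.Linear(feat_dim, num_parts)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # w: (batch, R), non-negative and summing to one per instance.
        w = torch.softmax(self.part_weights(features), dim=1)
        # Softmax over each row makes every part matrix row-stochastic.
        T_parts = torch.softmax(self.part_matrices, dim=2)  # (R, C, C)
        # T(x) = sum_r w_r(x) * T_r  ->  (batch, C, C)
        return torch.einsum('br,rij->bij', w, T_parts)

# Usage: map clean class posteriors to noisy posteriors for training, since
# P(noisy = j | x) = sum_i P(clean = i | x) * T_ij(x):
# noisy_post = torch.bmm(clean_post.unsqueeze(1), T_x).squeeze(1)
```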
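
The dataset-splits row states that 10% of the noisy training examples are held out as a noisy validation set for model selection. A minimal sketch of such a split, assuming a generic PyTorch dataset object (`train_set` is a placeholder for any of the datasets above):

```python
import torch
from torch.utils.data import random_split

def split_noisy_train(train_set, val_fraction=0.1, seed=0):
    """Hold out a fraction of the noisy training set as a noisy validation set."""
    n_val = int(len(train_set) * val_fraction)
    n_train = len(train_set) - n_val
    generator = torch.Generator().manual_seed(seed)  # reproducible split
    return random_split(train_set, [n_train, n_val], generator=generator)

# e.g., CIFAR-10: 50,000 training images -> 45,000 train / 5,000 noisy validation
```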
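
The experiment-setup row describes a two-stage schedule: SGD for 100 warm-up epochs with the learning rate divided by 10 at epochs 40 and 80, then a switch to Adam at 5×10^-7. A minimal PyTorch sketch of that schedule; `network` is a stand-in model, and the training-loop body is elided:

```python
import torch

network = torch.nn.Linear(784, 10)  # placeholder for the paper's network

# Stage 1: SGD with momentum 0.9, weight decay 1e-4, initial lr 1e-2,
# divided by 10 at the 40th and 80th epochs, 100 epochs in total.
sgd = torch.optim.SGD(network.parameters(), lr=1e-2,
                      momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    sgd, milestones=[40, 80], gamma=0.1)
for epoch in range(100):
    # ... train one epoch with batch size 128 ...
    scheduler.step()

# Stage 2: switch to Adam at lr 5e-7 to learn the classifier and slack variable.
adam = torch.optim.Adam(network.parameters(), lr=5e-7)
```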