Provably End-to-end Label-noise Learning without Anchor Points

Authors: Xuefeng Li, Tongliang Liu, Bo Han, Gang Niu, Masashi Sugiyama

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on benchmark datasets demonstrate the effectiveness and robustness of the proposed method.
Researcher Affiliation | Academia | 1. University of New South Wales; 2. Trustworthy Machine Learning Lab, University of Sydney; 3. Hong Kong Baptist University; 4. RIKEN AIP; 5. University of Tokyo
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks; the method is described in prose.
Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We evaluate the proposed method on three synthetic noisy datasets, i.e., MNIST, CIFAR-10 and CIFAR-100, and one real-world noisy dataset, i.e., Clothing1M.
Dataset Splits | Yes | 'We leave out 10% of the training examples as the validation set' and 'we leave out 10% of the noisy training examples as a noisy validation set for model selection.' (A sketch of such a split appears after this table.)
Hardware Specification | Yes | For a fair comparison, we implement all methods with default parameters by PyTorch on Tesla V100-SXM2.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number or other software dependencies with version details.
Experiment Setup | Yes | For MNIST, we use a LeNet-5 network. SGD is used to train the classification network hθ with batch size 128, momentum 0.9, weight decay 10⁻³ and a learning rate of 10⁻². Adam with default parameters is used to train the transition matrix T̂. The algorithm is run for 60 epochs. For CIFAR-10, we use a ResNet-18 network. SGD is used to train both the classification network hθ and the transition matrix T̂ with batch size 128, momentum 0.9, weight decay 10⁻³ and an initial learning rate of 10⁻². The algorithm is run for 150 epochs and the learning rate is divided by 10 after the 30th and 60th epochs. For CIFAR-100, we use a ResNet-32 network. SGD is used to train the classification network hθ with batch size 128, momentum 0.9, weight decay 10⁻³ and an initial learning rate of 10⁻². Adam with default parameters is used to train the transition matrix T̂. The algorithm is run for 150 epochs and the learning rate is divided by 10 after the 30th and 60th epochs. For CIFAR-10 and CIFAR-100, we perform data augmentation by horizontal random flips and 32×32 random crops after padding 4 pixels on each side. For Clothing1M, we use a ResNet-50 pre-trained on ImageNet. We only use the 1M noisy data to train and validate the network. For the optimization, SGD is used to train both the classification network hθ and the transition matrix T̂ with momentum 0.9, weight decay 10⁻³, batch size 32, and run with learning rates 2×10⁻³ and 2×10⁻⁵ for 5 epochs each. (Hedged configuration sketches of this setup follow the table.)
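
Since the paper releases no code, the following is a minimal sketch of how the reported 10% noisy-validation split could be reproduced with torchvision. The dataset root, the random seed, and the omission of the synthetic label-corruption step are assumptions for illustration, not taken from the paper.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Minimal sketch (not the authors' code): load CIFAR-10 and hold out 10% of
# the noisy training examples as a noisy validation set, as the paper reports.
train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=transforms.ToTensor())

# In the paper, label noise is injected synthetically before splitting; that
# corruption step (which would overwrite train_set.targets) is omitted here.

val_size = int(0.1 * len(train_set))          # leave out 10% for validation
train_size = len(train_set) - val_size
generator = torch.Generator().manual_seed(0)  # assumed seed for reproducibility
train_subset, noisy_val_subset = random_split(
    train_set, [train_size, val_size], generator=generator)
```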
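Below is a minimal PyTorch sketch of the CIFAR-10 optimizer and schedule described in the Experiment Setup row. The loss function is not shown, and the plain identity-initialized parameter standing in for the transition matrix T̂ is a placeholder assumption; the paper constrains T̂ differently.

```python
import torch
import torch.nn as nn
import torchvision

# Sketch of the reported CIFAR-10 setup (not the authors' code): ResNet-18
# classifier and a learnable transition matrix, both trained by SGD with
# batch size 128, momentum 0.9, weight decay 1e-3, initial lr 1e-2, and the
# lr divided by 10 after the 30th and 60th epochs.
num_classes = 10
classifier = torchvision.models.resnet18(num_classes=num_classes)

# Placeholder parameterization of the transition matrix T̂ (assumption: only
# the optimizer wiring is illustrated, not the paper's actual constraint).
transition_logits = nn.Parameter(torch.eye(num_classes))

optimizer = torch.optim.SGD(
    list(classifier.parameters()) + [transition_logits],
    lr=1e-2, momentum=0.9, weight_decay=1e-3)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60], gamma=0.1)

for epoch in range(150):
    # ... one training epoch over the noisy data loader would go here ...
    scheduler.step()
```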
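The CIFAR-10/100 augmentation described above maps directly onto standard torchvision transforms; a sketch follows. The normalization statistics are an assumption, since the paper does not state them.

```python
from torchvision import transforms

# Sketch of the reported CIFAR-10/100 augmentation (not the authors' code):
# horizontal random flips and 32x32 random crops after padding 4 pixels per side.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    # Commonly used CIFAR statistics; assumed, not reported in the paper.
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])
```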
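Finally, a sketch of the reported Clothing1M fine-tuning schedule, assuming a 14-class output head (the standard Clothing1M label set) and the current torchvision weights API; older torchvision versions use pretrained=True instead.

```python
import torch
import torchvision

# Sketch of the reported Clothing1M setup (not the authors' code): ResNet-50
# pre-trained on ImageNet, SGD with momentum 0.9, weight decay 1e-3, batch
# size 32, trained for 5 epochs at lr 2e-3 and 5 more epochs at lr 2e-5.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 14)  # 14 Clothing1M classes

for lr, epochs in [(2e-3, 5), (2e-5, 5)]:
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=1e-3)
    for epoch in range(epochs):
        pass  # one pass over the 1M noisy training examples would go here
```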