Co-teaching: Robust training of deep neural networks with extremely noisy labels
Authors: Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor Tsang, Masashi Sugiyama
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on noisy versions of MNIST, CIFAR-10 and CIFAR-100 demonstrate that Co-teaching is much superior to the state-of-the-art methods in the robustness of trained deep models. |
| Researcher Affiliation | Collaboration | Bo Han (1,2), Quanming Yao (3), Xingrui Yu (1), Gang Niu (2), Miao Xu (2), Weihua Hu (4), Ivor W. Tsang (1), Masashi Sugiyama (2,5). Affiliations: 1 Centre for Artificial Intelligence, University of Technology Sydney; 2 RIKEN; 3 4Paradigm Inc.; 4 Stanford University; 5 University of Tokyo |
| Pseudocode | Yes | Algorithm 1 Co-teaching Algorithm. |
| Open Source Code | Yes | The implementation is available at https://github.com/bhanML/Co-teaching. |
| Open Datasets | Yes | Datasets. We verify the effectiveness of our approach on three benchmark datasets. MNIST, CIFAR-10 and CIFAR-100 are used here (Table 1), because these data sets are popularly used for evaluation of noisy labels in the literature [13, 31, 33]. |
| Dataset Splits | No | The paper mentions shuffling the training set and describes a training process with mini-batches, but it does not explicitly define or refer to a validation dataset split used for model selection or hyperparameter tuning within its own experimental setup. |
| Hardware Specification | Yes | For the fair comparison, we implement all methods with default parameters by PyTorch, and conduct all the experiments on an NVIDIA K80 GPU. |
| Software Dependencies | No | The paper mentions implementing methods with PyTorch, but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For all experiments, the Adam optimizer (momentum = 0.9) is used with an initial learning rate of 0.001, the batch size is set to 128, and we run 200 epochs. Besides, dropout and batch normalization are also used. We assume the noise level ϵ is known and set R(T) = 1 − τ · min(T/T_k, 1) with T_k = 10 and τ = ϵ. (The schedule and the corresponding update step are sketched below the table.) |
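
For readers who want to reproduce the keep-rate schedule quoted in the Experiment Setup row, here is a minimal Python sketch of R(T) = 1 − τ · min(T/T_k, 1) with the reported values T_k = 10 and τ = ϵ. The function name, keyword arguments, and the 45% example noise level are illustrative assumptions, not part of the released code.

```python
# Minimal sketch (not the authors' code) of the keep-rate schedule from the setup above:
# R(T) = 1 - tau * min(T / T_k, 1), with T_k = 10 and tau = epsilon (the known noise level).

def remember_rate(epoch: int, noise_level: float, t_k: int = 10) -> float:
    """Fraction of small-loss samples kept at a given epoch (R(T) in the paper)."""
    tau = noise_level  # the paper sets tau equal to the noise level epsilon
    return 1.0 - tau * min(epoch / t_k, 1.0)

# Example with an assumed 45% noise level:
# epoch 0 -> keep 100%, epoch 5 -> keep 77.5%, epoch >= 10 -> keep 55%.
```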
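
Similarly, a hedged PyTorch sketch of one Co-teaching mini-batch update in the spirit of Algorithm 1 cited in the Pseudocode row: each network ranks per-sample losses, keeps its R(T) fraction of small-loss samples, and its peer is updated on that selection. The names, the cross-entropy loss, and the optimizer handling below are assumptions; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn.functional as F

def co_teaching_step(net1, net2, opt1, opt2, x, y, keep_rate):
    """One mini-batch update in the spirit of Algorithm 1 (sketch, not the released code)."""
    num_keep = int(keep_rate * len(y))

    # Rank per-sample losses without tracking gradients.
    with torch.no_grad():
        loss1 = F.cross_entropy(net1(x), y, reduction="none")
        loss2 = F.cross_entropy(net2(x), y, reduction="none")

    # Each network selects its small-loss (likely clean) samples.
    idx1 = torch.argsort(loss1)[:num_keep]
    idx2 = torch.argsort(loss2)[:num_keep]

    # Cross update: each network learns from the samples its peer selected.
    opt1.zero_grad()
    F.cross_entropy(net1(x[idx2]), y[idx2]).backward()
    opt1.step()

    opt2.zero_grad()
    F.cross_entropy(net2(x[idx1]), y[idx1]).backward()
    opt2.step()
```

In a full run this would be called for every mini-batch, with keep_rate taken from the schedule sketched above and the two networks trained with Adam at the learning rate and batch size listed in the Experiment Setup row.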