Co-teaching: Robust training of deep neural networks with extremely noisy labels
Authors: Bo Han, Quanming Yao, Xingrui Yu, Gang Niu, Miao Xu, Weihua Hu, Ivor Tsang, Masashi Sugiyama
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on noisy versions of MNIST, CIFAR-10 and CIFAR-100 demonstrate that Co-teaching is much superior to the state-of-the-art methods in the robustness of trained deep models. |
| Researcher Affiliation | Collaboration | Bo Han (1,2), Quanming Yao (3), Xingrui Yu (1), Gang Niu (2), Miao Xu (2), Weihua Hu (4), Ivor W. Tsang (1), Masashi Sugiyama (2,5). Affiliations: 1 Centre for Artificial Intelligence, University of Technology Sydney; 2 RIKEN; 3 4Paradigm Inc.; 4 Stanford University; 5 University of Tokyo |
| Pseudocode | Yes | Algorithm 1 Co-teaching Algorithm. |
| Open Source Code | Yes | The implementation is available at https://github.com/bhanML/Co-teaching. |
| Open Datasets | Yes | Datasets. We verify the effectiveness of our approach on three benchmark datasets. MNIST, CIFAR-10 and CIFAR-100 are used here (Table 1), because these data sets are popularly used for evaluation of noisy labels in the literature [13, 31, 33]. |
| Dataset Splits | No | The paper mentions shuffling the training set and describes a training process with mini-batches, but it does not explicitly define or refer to a validation dataset split used for model selection or hyperparameter tuning within its own experimental setup. |
| Hardware Specification | Yes | For the fair comparison, we implement all methods with default parameters by PyTorch, and conduct all the experiments on an NVIDIA K80 GPU. |
| Software Dependencies | No | The paper mentions implementing methods with PyTorch, but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For all experiments, the Adam optimizer (momentum = 0.9) is used with an initial learning rate of 0.001, the batch size is set to 128, and we run 200 epochs. Besides, dropout and batch normalization are also used. We assume the noise level ϵ is known and set R(T) = 1 − τ · min(T/T_k, 1) with T_k = 10 and τ = ϵ. (The schedule and the corresponding update step are sketched below the table.) |
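
For readers who want to reproduce the keep-rate schedule quoted in the Experiment Setup row, here is a minimal Python sketch of R(T) = 1 − τ · min(T/T_k, 1) with the reported values T_k = 10 and τ = ϵ. The function name, keyword arguments, and the 45% example noise level are illustrative assumptions, not part of the released code.

```python
# Minimal sketch (not the authors' code) of the keep-rate schedule from the setup above:
# R(T) = 1 - tau * min(T / T_k, 1), with T_k = 10 and tau = epsilon (the known noise level).

def remember_rate(epoch: int, noise_level: float, t_k: int = 10) -> float:
    """Fraction of small-loss samples kept at a given epoch (R(T) in the paper)."""
    tau = noise_level  # the paper sets tau equal to the noise level epsilon
    return 1.0 - tau * min(epoch / t_k, 1.0)

# Example with an assumed 45% noise level:
# epoch 0 -> keep 100%, epoch 5 -> keep 77.5%, epoch >= 10 -> keep 55%.
```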
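
Similarly, a hedged PyTorch sketch of one Co-teaching mini-batch update in the spirit of Algorithm 1 cited in the Pseudocode row: each network ranks per-sample losses, keeps its R(T) fraction of small-loss samples, and its peer is updated on that selection. The names, the cross-entropy loss, and the optimizer handling below are assumptions; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn.functional as F

def co_teaching_step(net1, net2, opt1, opt2, x, y, keep_rate):
    """One mini-batch update in the spirit of Algorithm 1 (sketch, not the released code)."""
    num_keep = int(keep_rate * len(y))

    # Rank per-sample losses without tracking gradients.
    with torch.no_grad():
        loss1 = F.cross_entropy(net1(x), y, reduction="none")
        loss2 = F.cross_entropy(net2(x), y, reduction="none")

    # Each network selects its small-loss (likely clean) samples.
    idx1 = torch.argsort(loss1)[:num_keep]
    idx2 = torch.argsort(loss2)[:num_keep]

    # Cross update: each network learns from the samples its peer selected.
    opt1.zero_grad()
    F.cross_entropy(net1(x[idx2]), y[idx2]).backward()
    opt1.step()

    opt2.zero_grad()
    F.cross_entropy(net2(x[idx1]), y[idx1]).backward()
    opt2.step()
```

In a full run this would be called for every mini-batch, with keep_rate taken from the schedule sketched above and the two networks trained with Adam at the learning rate and batch size listed in the Experiment Setup row.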