DivideMix: Learning with Noisy Labels as Semi-supervised Learning
Authors: Junnan Li, Richard Socher, Steven C.H. Hoi
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on multiple benchmark datasets demonstrate substantial improvements over state-of-the-art methods. |
| Researcher Affiliation | Industry | Junnan Li, Richard Socher, Steven C.H. Hoi Salesforce Research {junnan.li,rsocher,shoi}@salesforce.com |
| Pseudocode | Yes | Algorithm 1: DivideMix. |
| Open Source Code | Yes | Code is available at https://github.com/LiJunnan1992/DivideMix. |
| Open Datasets | Yes | We extensively validate our method on four benchmark datasets, namely CIFAR-10, CIFAR-100 (Krizhevsky & Hinton, 2009), Clothing1M (Xiao et al., 2015), and WebVision (Li et al., 2017a). |
| Dataset Splits | Yes | CIFAR-10 and CIFAR-100 contain 50K training images and 10K test images of size 32 × 32. We choose λ_u from {0, 25, 50, 150} using a small validation set. |
| Hardware Specification | Yes | In Table 8, we compare the total training time of DivideMix on CIFAR-10 with several state-of-the-art methods, using a single Nvidia V100 GPU. |
| Software Dependencies | No | The paper describes the neural network architecture (18-layer PreAct ResNet) and the training optimizer (SGD) along with its parameters, but it does not specify versions for any programming languages or libraries (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We use an 18-layer PreAct ResNet (He et al., 2016) and train it using SGD with a momentum of 0.9, a weight decay of 0.0005, and a batch size of 128. The network is trained for 300 epochs. We set the initial learning rate as 0.02, and reduce it by a factor of 10 after 150 epochs. The warm up period is 10 epochs for CIFAR-10 and 30 epochs for CIFAR-100. We find that most hyperparameters introduced by DivideMix do not need to be heavily tuned. For all CIFAR experiments, we use the same hyperparameters M = 2, T = 0.5, and α = 4. τ is set as 0.5 except for 90% noise ratio when it is set as 0.6. |
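
The CIFAR settings quoted in the Experiment Setup row can be collected into a small configuration sketch. This is only an illustration, not the authors' code: the names `CifarSetup`, `warmup_epochs`, `learning_rate`, and `tau` are hypothetical, the dataset keys `"cifar10"` and `"cifar100"` are made up for this example, and the schedule assumes epochs are counted from 0 so that the drop applies from epoch 150 onward.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CifarSetup:
    """Values quoted in the Experiment Setup row for the CIFAR experiments."""
    momentum: float = 0.9
    weight_decay: float = 0.0005
    batch_size: int = 128
    epochs: int = 300
    initial_lr: float = 0.02
    lr_drop_epoch: int = 150  # learning rate is divided by 10 after this many epochs
    M: int = 2
    T: float = 0.5
    alpha: float = 4.0


def warmup_epochs(dataset: str) -> int:
    """Warm-up length in epochs; the dataset keys are hypothetical labels."""
    return {"cifar10": 10, "cifar100": 30}[dataset]


def learning_rate(epoch: int, cfg: CifarSetup = CifarSetup()) -> float:
    """Step schedule. Assumption: epochs are zero-indexed, so epochs 0..149
    use the initial rate and epochs 150..299 use one tenth of it."""
    if epoch < cfg.lr_drop_epoch:
        return cfg.initial_lr
    return cfg.initial_lr / 10


def tau(noise_ratio: float) -> float:
    """tau is 0.5 for all CIFAR runs except 90% noise, where it is 0.6."""
    return 0.6 if abs(noise_ratio - 0.9) < 1e-9 else 0.5
```

For example, `learning_rate(0)` gives the initial rate of 0.02, and `tau(0.9)` selects the 0.6 setting used only at the 90% noise ratio. The weight λ_u is left out because the row states only that it is chosen from a candidate set on a validation split.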