CrossSplit: Mitigating Label Noise Memorization through Data Splitting
Authors: Jihye Kim, Aristide Baratin, Yan Zhang, Simon Lacoste-Julien
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state-of-the-art in a wide range of noise ratios. The project page is at https://rlawlgul.github.io/. |
| Researcher Affiliation | Collaboration | 1Samsung Advanced Institute of Technology (SAIT), Suwon, South Korea 2Work done as a visiting researcher at SAIT AI Lab, Montreal, Canada 3SAIT AI Lab, Montreal, Canada 4Mila, Université de Montréal, Canada 5Canada CIFAR AI Chair. |
| Pseudocode | Yes | Algorithm 1 CrossSplit: Cross-split SSL training based on cross-split label correction |
| Open Source Code | Yes | The project page is at https://rlawlgul.github.io/. |
| Open Datasets | Yes | CIFAR-10/100 datasets (Krizhevsky et al., 2009) each contain 50K training and 10K testing 32 × 32 coloured images. Tiny-ImageNet (Le & Yang, 2015) is a subset of the ImageNet dataset with 100K 64 × 64 coloured images distributed within 200 classes. Mini-WebVision (Li et al., 2017a) contains 2.4 million images crawled from the websites Google and Flickr and contains many naturally noisy labels. |
| Dataset Splits | Yes | Tiny-ImageNet (Le & Yang, 2015) is a subset of the ImageNet dataset with 100K 64 × 64 coloured images distributed within 200 classes. Each class has 500 training images, 50 test images and 50 validation images. |
| Hardware Specification | Yes | The following results are on CIFAR-10, reporting seconds / epoch (average of the next 5 epochs after warm-up), run on one RTX8000 GPU. |
| Software Dependencies | No | The paper mentions optimizers (SGD) and learning rate schedulers (Cosine Annealing) but does not provide specific version numbers for software libraries, programming languages, or frameworks used (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | For CIFAR-10 and CIFAR-100, we train each network using the stochastic gradient descent (SGD) optimizer with momentum 0.9 and a weight decay of 0.0005. Training is done for 300 epochs with a batch size of 256. We set the initial learning rate to 0.1 and use a cosine annealing decay (Loshchilov & Hutter, 2017). As in (Li et al., 2020; Karim et al., 2022), warm-up training on the entire dataset is performed for 10 and 30 epochs for CIFAR-10 and CIFAR-100, respectively. |
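The experiment-setup row quotes enough hyperparameters to reconstruct the optimizer and learning-rate schedule. Below is a minimal PyTorch sketch of that configuration; the model, data loader, and loss are placeholders, and the CrossSplit-specific cross-split label correction is not shown. Only the optimizer, schedule, epoch, and batch-size values come from the paper's text.

```python
# Sketch of the quoted CIFAR-10/100 training configuration (not the authors' code).
import torch
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR


def build_optimizer_and_scheduler(model: nn.Module, epochs: int = 300):
    # SGD with momentum 0.9, weight decay 0.0005, initial LR 0.1 (as quoted above).
    optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    # Cosine annealing decay over the full 300-epoch run.
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler


def train(model, train_loader, criterion, device="cuda",
          epochs=300, warmup_epochs=10):
    # warmup_epochs: 10 for CIFAR-10, 30 for CIFAR-100 per the paper.
    optimizer, scheduler = build_optimizer_and_scheduler(model, epochs)
    model.to(device)
    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:  # batch size 256 per the paper
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            # During warm-up, training is plain supervised learning on all data;
            # afterwards CrossSplit would apply its cross-split label correction
            # and SSL-style training (omitted here).
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
```

Usage would pair this with, e.g., a PreAct ResNet and a standard cross-entropy criterion during warm-up; the hypothetical `train` and `build_optimizer_and_scheduler` helpers are named here only for illustration.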