CrossSplit: Mitigating Label Noise Memorization through Data Splitting

Authors: Jihye Kim, Aristide Baratin, Yan Zhang, Simon Lacoste-Julien

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and mini-WebVision datasets demonstrate that our method can outperform the current state-of-the-art in a wide range of noise ratios. The project page is at https://rlawlgul.github.io/.
Researcher Affiliation | Collaboration | 1 Samsung Advanced Institute of Technology (SAIT), Suwon, South Korea; 2 Work done as a visiting researcher at SAIT AI Lab, Montreal, Canada; 3 SAIT AI Lab, Montreal, Canada; 4 Mila, Université de Montréal, Canada; 5 Canada CIFAR AI Chair.
Pseudocode | Yes | Algorithm 1 CrossSplit: Cross-split SSL training based on cross-split label correction. (An illustrative sketch of this procedure is given after the table.)
Open Source Code | Yes | The project page is at https://rlawlgul.github.io/.
Open Datasets | Yes | CIFAR-10/100 datasets (Krizhevsky et al., 2009) each contain 50K training and 10K testing 32 × 32 coloured images. Tiny-ImageNet (Le & Yang, 2015) is a subset of the ImageNet dataset with 100K 64 × 64 coloured images distributed within 200 classes. Mini-WebVision (Li et al., 2017a) contains 2.4 million images from the websites Google and Flickr and contains many naturally noisy labels.
Dataset Splits | Yes | Tiny-ImageNet (Le & Yang, 2015) is a subset of the ImageNet dataset with 100K 64 × 64 coloured images distributed within 200 classes. Each class has 500 training images, 50 test images and 50 validation images.
Hardware Specification | Yes | The following results are on CIFAR-10, reporting seconds / epoch (average of the next 5 epochs after warm-up), run on one RTX8000 GPU.
Software Dependencies | No | The paper mentions optimizers (SGD) and learning rate schedulers (cosine annealing) but does not provide version numbers for the software libraries, programming languages, or frameworks used (e.g., Python, PyTorch, TensorFlow).
Experiment Setup | Yes | For CIFAR-10 and CIFAR-100, we train each network using the stochastic gradient descent (SGD) optimizer with momentum 0.9 and a weight decay of 0.0005. Training is done for 300 epochs with a batch size of 256. We set the initial learning rate as 0.1 and use a cosine annealing decay (Loshchilov & Hutter, 2017). Just like in (Li et al., 2020; Karim et al., 2022), a warm-up training on the entire dataset is performed for 10 and 30 epochs for CIFAR-10 and CIFAR-100, respectively. (These hyperparameters are restated as a code sketch after the table.)
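
The Pseudocode row above cites Algorithm 1 (CrossSplit: cross-split SSL training based on cross-split label correction). The sketch below illustrates only the core data-splitting idea, under assumptions of our own rather than the authors' implementation (available at the project page): the training set is divided into two disjoint halves, each of two networks is trained on one half, and the noisy labels of each half are softened using the predictions of the peer network, which never fit those labels. The fixed blending weight, the tiny linear model, the synthetic data, and the omission of the paper's per-sample weighting and semi-supervised stage are all simplifications.

# Simplified sketch of cross-split training with peer-based label correction.
# NOT the authors' code: the tiny linear model, synthetic data, and fixed
# blending weight are placeholders; the paper's per-sample weighting and
# semi-supervised stage are omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset, random_split

def soft_correct(labels_onehot, peer_logits, weight=0.5):
    # Convex combination of the given (possibly noisy) label and the peer's prediction.
    return weight * labels_onehot + (1.0 - weight) * F.softmax(peer_logits, dim=1)

def train_one_epoch(net, peer, loader, optimizer, num_classes):
    net.train()
    peer.eval()
    for x, y in loader:
        with torch.no_grad():  # the peer was trained on the other split, so it never fit these labels
            targets = soft_correct(F.one_hot(y, num_classes).float(), peer(x))
        loss = (-targets * F.log_softmax(net(x), dim=1)).sum(dim=1).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    num_classes, n = 10, 1024
    data = TensorDataset(torch.randn(n, 32), torch.randint(0, num_classes, (n,)))
    split_a, split_b = random_split(data, [n // 2, n - n // 2])  # two disjoint halves
    net_a, net_b = nn.Linear(32, num_classes), nn.Linear(32, num_classes)
    opt_a = torch.optim.SGD(net_a.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    opt_b = torch.optim.SGD(net_b.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
    for _ in range(3):
        # Each network trains on its own half, with labels corrected by its peer.
        train_one_epoch(net_a, net_b, DataLoader(split_a, batch_size=256, shuffle=True), opt_a, num_classes)
        train_one_epoch(net_b, net_a, DataLoader(split_b, batch_size=256, shuffle=True), opt_b, num_classes)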
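
The Experiment Setup row fully specifies the CIFAR-10/100 optimization hyperparameters. Assuming a PyTorch implementation (the Software Dependencies row notes that the framework and versions are not stated), they map to the minimal sketch below; the helper function name is hypothetical.

# The stated CIFAR-10/100 setup mapped to code, assuming PyTorch: SGD with
# momentum 0.9 and weight decay 0.0005, initial learning rate 0.1, batch size
# 256, 300 epochs with cosine annealing, after a warm-up of 10 (CIFAR-10) or
# 30 (CIFAR-100) epochs on the entire noisy dataset.
import torch

def build_optimizer_and_scheduler(model, total_epochs=300):
    # Hypothetical helper; the values come from the Experiment Setup row above.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_epochs)
    return optimizer, scheduler

# Usage: call scheduler.step() once per epoch after the optimizer updates for that epoch.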