SELFIE: Refurbishing Unclean Samples for Robust Deep Learning

Authors: Hwanjun Song, Minseok Kim, Jae-Gil Lee

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate the superiority of SELFIE, we conducted extensive experimentation using four real-world or synthetic data sets. The result showed that SELFIE remarkably improved absolute test error compared with two state-of-the-art methods.
Researcher Affiliation | Academia | Graduate School of Knowledge Service Engineering, KAIST, Daejeon, Korea. Correspondence to: Jae-Gil Lee <jaegil@kaist.ac.kr>.
Pseudocode | Yes | Algorithm 1: SELFIE Algorithm
Open Source Code | Yes | For reproducibility, we provide the source code at https://github.com/kaist-dmlab/SELFIE.
Open Datasets | Yes | To validate the superiority of SELFIE, we performed an image classification task on four benchmark data sets: CIFAR-10 (10 classes) and CIFAR-100 (100 classes), classification of a subset of 80 million categorical images, with 50,000 training and 10,000 testing images; TinyImageNet (200 classes), classification of a subset of ImageNet (Krizhevsky et al., 2012), with 100,000 training and 10,000 testing images; ANIMAL-10N (10 classes), our proprietary real-world noisy data set of human-labeled online images for 10 confusing animals, with 50,000 training and 5,000 testing images. Please note that, in ANIMAL-10N, noisy labels were injected naturally by human mistakes, where its noise rate was estimated at 8%. It has been released on our site, and its details can be found in Appendix B (supplementary material).
Dataset Splits | No | The paper states training and testing image counts for each dataset (e.g., '50,000 training and 10,000 testing images' for CIFAR-10/100, and '100,000 training and 10,000 testing images' for TinyImageNet), but does not explicitly describe a separate validation split.
Hardware Specification | Yes | All the algorithms were implemented using TensorFlow 1.8.0 and executed using a single NVIDIA Tesla V100 GPU.
Software Dependencies | Yes | All the algorithms were implemented using TensorFlow 1.8.0.
Experiment Setup | Yes | Network and Hyperparameters: For the classification task, we trained DenseNet (L=25, k=12) and VGG-19 with a momentum optimizer. Specifically, we used a momentum of 0.9, a batch size of 128, a dropout of 0.2 (Srivastava et al., 2014), and batch normalization (Ioffe & Szegedy, 2015). For the training schedule, following the experimental setup of Huang et al. (2017), we trained the network for 100 epochs and used an initial learning rate of 0.1, which was divided by 5 at 50% and 75% of the total number of epochs. Regarding the hyperparameters, we fixed restart to 2 (i.e., restarted Algorithm 1 twice after the first run) and used the best uncertainty threshold ϵ = 0.05 and history length q = 15, which were obtained from a grid ϵ = {0.05, 0.10, 0.15, 0.20} and q = {10, 15, 20}. (See Section 4.5 for details.) The warm-up threshold γ was set to 25 for the initial learning.
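The step-decay schedule quoted above (initial learning rate 0.1, divided by 5 at 50% and 75% of 100 epochs) can be sketched as a minimal Python function. This is an illustrative reconstruction from the quoted setup, not the authors' released code; the function name and parameter defaults are our own.

```python
def learning_rate(epoch, total_epochs=100, initial_lr=0.1, decay_factor=5.0):
    """Step schedule from the quoted setup: the initial learning rate is
    divided by `decay_factor` at 50% and 75% of the total epochs."""
    lr = initial_lr
    if epoch >= total_epochs * 0.5:   # first decay at the halfway point
        lr /= decay_factor
    if epoch >= total_epochs * 0.75:  # second decay at three quarters
        lr /= decay_factor
    return lr
```

For example, epochs 0-49 train at 0.1, epochs 50-74 at 0.02, and epochs 75-99 at 0.004, matching the "divided by 5 at 50% and 75%" description.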