SELFIE: Refurbishing Unclean Samples for Robust Deep Learning
Authors: Hwanjun Song, Minseok Kim, Jae-Gil Lee
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the superiority of SELFIE, we conducted extensive experimentation using four real-world or synthetic data sets. The results showed that SELFIE remarkably improved absolute test error compared with two state-of-the-art methods. |
| Researcher Affiliation | Academia | Graduate School of Knowledge Service Engineering, KAIST, Daejeon, Korea. Correspondence to: Jae-Gil Lee <jaegil@kaist.ac.kr>. |
| Pseudocode | Yes | Algorithm 1 SELFIE Algorithm |
| Open Source Code | Yes | For reproducibility, we provide the source code at https://github.com/kaist-dmlab/SELFIE. |
| Open Datasets | Yes | To validate the superiority of SELFIE, we performed an image classification task on four benchmark data sets: CIFAR-10 (10 classes) and CIFAR-100 (100 classes), classification of a subset of 80 million categorical images, with 50,000 training and 10,000 testing images; Tiny-ImageNet (200 classes), classification of a subset of ImageNet (Krizhevsky et al., 2012), with 100,000 training and 10,000 testing images; ANIMAL-10N (10 classes), our proprietary real-world noisy data set of human-labeled online images for 10 confusing animals, with 50,000 training and 5,000 testing images. Please note that, in ANIMAL-10N, noisy labels were injected naturally by human mistakes, where its noise rate was estimated at 8%. It has been released on our site, and its details can be found in Appendix B (supplementary material). |
| Dataset Splits | No | The paper states training and testing image counts for each dataset (e.g., '50,000 training and 10,000 testing images' for CIFAR-10/100, and '100,000 training and 10,000 testing images' for Tiny-ImageNet), but does not explicitly provide details for a separate validation split. |
| Hardware Specification | Yes | All the algorithms were implemented using TensorFlow 1.8.0 and executed using a single NVIDIA Tesla V100 GPU. |
| Software Dependencies | Yes | All the algorithms were implemented using TensorFlow 1.8.0 |
| Experiment Setup | Yes | Network and Hyperparameters: For the classification task, we trained DenseNet (L=25, k=12) and VGG-19 with a momentum optimizer. Specifically, we used a momentum of 0.9, a batch size of 128, a dropout of 0.2 (Srivastava et al., 2014), and batch normalization (Ioffe & Szegedy, 2015). For the training schedule, following the experimental setup of Huang et al. (2017), we trained the network for 100 epochs and used an initial learning rate of 0.1, which was divided by 5 at 50% and 75% of the total number of epochs. Regarding the hyperparameters, we fixed restart to 2 (i.e., restarted Algorithm 1 twice after the first run) and used the best uncertainty threshold ϵ = 0.05 and history length q = 15, which were obtained from a grid ϵ = {0.05, 0.10, 0.15, 0.20} and q = {10, 15, 20}. (See Section 4.5 for details.) The warm-up threshold γ was set to 25 for the initial learning. |
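
For readers cross-checking the quoted training schedule, here is a minimal sketch of how it could be expressed against the TensorFlow 1.x API the paper reports (1.8.0). The variable names and the CIFAR-scale `steps_per_epoch` are our assumptions, not the authors' released code; the official implementation is at the GitHub link quoted above.

```python
# Sketch of the reported schedule: momentum 0.9, batch size 128,
# initial LR 0.1 divided by 5 at 50% and 75% of 100 epochs.
import tensorflow as tf  # the paper reports TensorFlow 1.8.0

total_epochs = 100
steps_per_epoch = 50_000 // 128  # assumed CIFAR-scale training set
boundaries = [int(total_epochs * f * steps_per_epoch) for f in (0.5, 0.75)]
values = [0.1, 0.1 / 5, 0.1 / 25]  # LR divided by 5 at each boundary

global_step = tf.train.get_or_create_global_step()
learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)
```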
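
The hyperparameters ϵ = 0.05 and q = 15 quoted in the setup row govern the refurbishing step of Algorithm 1: a sample whose predictions over the last q epochs are sufficiently consistent (low normalized entropy) has its noisy label replaced by its most frequent prediction. The NumPy sketch below illustrates that rule under our reading of the paper; it is not the authors' implementation, and the function name, the `None` convention for unrefurbished samples, and the log(num_classes) normalization are our assumptions.

```python
# Illustrative sketch of SELFIE's uncertainty-based refurbishing rule
# with the quoted hyperparameters (epsilon = 0.05, history length q = 15).
import numpy as np

def refurbish(label_history, num_classes, epsilon=0.05):
    """label_history: predicted labels of one sample over the last q epochs."""
    counts = np.bincount(label_history, minlength=num_classes)
    p = counts / counts.sum()
    p = p[p > 0]
    # Normalized entropy of the label history; 0 means perfectly consistent.
    uncertainty = -(p * np.log(p)).sum() / np.log(num_classes)
    if uncertainty <= epsilon:
        return int(np.argmax(counts))  # refurbish with most frequent prediction
    return None                        # keep the sample unrefurbished
```

For example, a history of 15 identical predictions has uncertainty 0 and is refurbished, while a 50/50 split between two classes on CIFAR-10 gives log 2 / log 10 ≈ 0.30, well above ϵ = 0.05, so the sample is left alone.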