SELFIE: Refurbishing Unclean Samples for Robust Deep Learning

Authors: Hwanjun Song, Minseok Kim, Jae-Gil Lee

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate the superiority of SELFIE, we conducted extensive experimentation using four real-world or synthetic data sets. The result showed that SELFIE remarkably improved absolute test error compared with two state-of-the-art methods.
Researcher Affiliation | Academia | Graduate School of Knowledge Service Engineering, KAIST, Daejeon, Korea. Correspondence to: Jae-Gil Lee <jaegil@kaist.ac.kr>.
Pseudocode | Yes | Algorithm 1: SELFIE Algorithm
Open Source Code | Yes | For reproducibility, we provide the source code at https://github.com/kaist-dmlab/SELFIE.
Open Datasets | Yes | To validate the superiority of SELFIE, we performed an image classification task on four benchmark data sets: CIFAR-10 (10 classes) and CIFAR-100 (100 classes), classification of a subset of 80 million categorical images, with 50,000 training and 10,000 testing images; TinyImageNet (200 classes), classification of a subset of ImageNet (Krizhevsky et al., 2012), with 100,000 training and 10,000 testing images; ANIMAL-10N (10 classes), our proprietary real-world noisy data set of human-labeled online images for 10 confusing animals, with 50,000 training and 5,000 testing images. Please note that, in ANIMAL-10N, noisy labels were injected naturally by human mistakes, where its noise rate was estimated at 8%. It has been released on our site, and its details can be found in Appendix B (supplementary material).
Dataset Splits | No | The paper states training and testing image counts for each dataset (e.g., '50,000 training and 10,000 testing images' for CIFAR-10/100, and '100,000 training and 10,000 testing images' for TinyImageNet), but does not explicitly describe a separate validation split.
Hardware Specification | Yes | All the algorithms were implemented using TensorFlow 1.8.0 and executed using a single NVIDIA Tesla V100 GPU.
Software Dependencies | Yes | All the algorithms were implemented using TensorFlow 1.8.0.
Experiment Setup | Yes | Network and Hyperparameters: For the classification task, we trained DenseNet (L=25, k=12) and VGG-19 with a momentum optimizer. Specifically, we used a momentum of 0.9, a batch size of 128, a dropout of 0.2 (Srivastava et al., 2014), and batch normalization (Ioffe & Szegedy, 2015). For the training schedule, following the experimental setup of Huang et al. (2017), we trained the network for 100 epochs and used an initial learning rate of 0.1, which was divided by 5 at 50% and 75% of the total number of epochs. Regarding the hyperparameters, we fixed restart to 2 (i.e., restarted Algorithm 1 twice after the first run) and used the best uncertainty threshold ϵ = 0.05 and history length q = 15, which were obtained from a grid ϵ = {0.05, 0.10, 0.15, 0.20} and q = {10, 15, 20}. (See Section 4.5 for details.) The warm-up threshold γ was set to 25 for the initial learning.
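The step-decay schedule quoted above (initial learning rate 0.1, divided by 5 at 50% and 75% of 100 epochs) can be sketched as a minimal Python function. This is an illustrative reconstruction from the quoted setup, not the authors' released code; the function name and parameter defaults are our own.

```python
def learning_rate(epoch, total_epochs=100, initial_lr=0.1, decay_factor=5.0):
    """Step schedule from the quoted setup: the initial learning rate is
    divided by `decay_factor` at 50% and 75% of the total epochs."""
    lr = initial_lr
    if epoch >= total_epochs * 0.5:   # first decay at the halfway point
        lr /= decay_factor
    if epoch >= total_epochs * 0.75:  # second decay at three quarters
        lr /= decay_factor
    return lr
```

For example, epochs 0-49 train at 0.1, epochs 50-74 at 0.02, and epochs 75-99 at 0.004, matching the "divided by 5 at 50% and 75%" description.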