Dimensionality-Driven Learning with Noisy Labels

Authors: Xingjun Ma, Yisen Wang, Michael E. Houle, Shuo Zhou, Sarah Erfani, Shutao Xia, Sudanthi Wijewickrema, James Bailey

ICML 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that our approach is highly tolerant to significant proportions of noisy labels, and can effectively learn low-dimensional local subspaces that capture the data distribution. We empirically demonstrate on the MNIST, SVHN, CIFAR-10 and CIFAR-100 datasets that our Dimensionality-Driven Learning strategy can effectively learn (1) low-dimensional representation subspaces that capture the underlying data distribution, (2) simpler hypotheses, and (3) high-quality deep representations. We evaluate our proposed D2L learning strategy, comparing the performance of our model with state-of-the-art baselines for noisy label learning. We report the mean test accuracy and standard deviation over 5 repetitions of the experiments in Table 1.
Researcher Affiliation | Academia | 1 The University of Melbourne, Melbourne, Australia; 2 Tsinghua University, Beijing, China; 3 National Institute of Informatics, Tokyo, Japan.
Pseudocode | Yes | Algorithm 1: Dimensionality-Driven Learning (D2L)
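Since Algorithm 1 itself is not reproduced in this report, the following is a minimal sketch of the two-stage idea the paper describes: train normally while the estimated LID of the learned representation keeps decreasing, then switch to dimensionality-adapted labels once LID turns upward. The helper names `detect_turning_point` and `adapted_labels`, and the exponential form of `alpha`, are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def detect_turning_point(lid_history, window=5):
    """Return True once the mean LID over the last `window` epochs exceeds
    the mean over the preceding `window` epochs (LID has turned upward)."""
    if len(lid_history) < 2 * window:
        return False
    earlier = np.mean(lid_history[-2 * window:-window])
    recent = np.mean(lid_history[-window:])
    return bool(recent > earlier)

def adapted_labels(y_true, y_pred, lid_history):
    """Interpolate the (possibly noisy) labels with the model's own
    predictions; trust in the given labels shrinks as LID grows relative
    to its historical minimum. The exact weighting is an assumption."""
    alpha = np.exp(-lid_history[-1] / min(lid_history))
    return alpha * y_true + (1.0 - alpha) * y_pred
```

In a training loop, one would append a per-epoch LID estimate to `lid_history`, keep the standard cross-entropy loss until `detect_turning_point` fires, and afterwards train against `adapted_labels` instead of the raw labels.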
Open Source Code | Yes | The D2L code is available at https://github.com/xingjunm/dimensionality-driven-learning.
Open Datasets | Yes | MNIST, an image dataset with 10 categories of handwritten digits (LeCun et al., 1998); CIFAR-10, a natural image dataset with 10 categories (Krizhevsky & Hinton, 2009); SVHN (Netzer et al., 2011); CIFAR-100 (Krizhevsky & Hinton, 2009).
Dataset Splits | No | The paper does not explicitly state any validation dataset splits or sample counts for validation sets. It mentions total epochs and learning-rate schedules but no specific validation strategy or data partitioning for validation.
Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions training networks using SGD but does not specify software versions for libraries (e.g., PyTorch, TensorFlow) or the operating systems used for the experiments.
Experiment Setup | Yes | All networks were trained using SGD with momentum 0.9, weight decay 10^-4 and an initial learning rate of 0.1. The learning rate was divided by 10 after epochs 40 and 80 (T = 120 epochs in total). Simple data augmentations (width/height shift and horizontal flip) were applied. For our proposed D2L, we set k = 20 for LID estimation, and used the average LID score over m = 10 random batches of training samples as the overall dimensionality of the representation subspaces. To identify the turning point between the two stages of learning, we employ an epoch window of size w ∈ [1, T-1]...
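The setup row above describes LID estimation with k = 20 neighbors, averaged over m = 10 random batches. A minimal sketch of such an estimate, assuming the standard maximum-likelihood LID estimator (the report does not restate the paper's exact estimator, and the function names here are illustrative):

```python
import numpy as np

def lid_mle(distances):
    """MLE estimate of local intrinsic dimensionality from a point's
    sorted distances to its k nearest neighbors:
    LID = -1 / mean(log(r_i / r_k)), where r_k is the largest distance."""
    d = np.asarray(distances, dtype=float)
    return -1.0 / np.mean(np.log(d / d[-1]))

def batch_lid(X, k=20):
    """Average LID over one batch: estimate LID for each sample from its
    k nearest neighbors within the batch, then take the batch mean."""
    # Full pairwise Euclidean distance matrix for the batch.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D.sort(axis=1)
    knn = D[:, 1:k + 1]  # drop the zero self-distance in column 0
    return float(np.mean([lid_mle(row) for row in knn]))
```

Following the setup described above, one would then average `batch_lid` over m = 10 random batches of training samples to obtain the overall dimensionality score for the representation subspace.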