Deep k-NN for Noisy Labels

Authors: Dara Bahri, Heinrich Jiang, Maya Gupta

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we provide an empirical study showing that a simple k-nearest neighbor-based filtering approach on the logit layer of a preliminary model can remove mislabeled training data and produce more accurate models than many recently proposed methods. Experimentally, we show that deep k-NN works as well as or better than state-of-the-art methods for handling noisy labels, and it is robust to the choice of k.
Researcher Affiliation | Industry | Dara Bahri, Heinrich Jiang, Maya Gupta (Google Research). Correspondence to: Dara Bahri <dbahri@google.com>, Heinrich Jiang <heinrichj@google.com>.
Pseudocode | Yes | Algorithm 1: Deep k-NN Filtering [see the filtering sketch below the table].
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for its own methodology or a link to a code repository. It mentions "github.com/google/bi-tempered-loss" in reference to a comparison method, not their own.
Open Datasets | Yes | We show the results for one of the UCI datasets and Fashion MNIST in Figure 2. Due to space, results for MNIST and the remaining UCI datasets are in the Appendix. For CIFAR10/100 we use ResNet-20... We show results for CIFAR10 in Figure 3... We show the results in Figure 3. We train ResNet-20 from scratch on the GPUs for 100 epochs. [See the data-loading sketch below the table.]
Dataset Splits | Yes | We randomly partition each dataset's train set into D_noisy and D_clean, and we present results for 95/5, 90/10, and 80/20 splits. To choose whether the preliminary model used to compute the k-NN should be trained on D_clean ∪ D_noisy or only on D_clean, we split D_clean 70/30 into two sets: D_clean-train and D_clean-val. [See the splitting sketch below the table.]
Hardware Specification | Yes | For CIFAR10/100 we use ResNet-20, which we train from scratch on single NVIDIA P100 GPUs.
Software Dependencies | Yes | We implement all methods using TensorFlow 2.0 and scikit-learn.
Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2014) with default learning rate 0.001 and a batch size of 128 across all experiments. We use k = 500 for all datasets except the UCI datasets, where we use k = 50... For UCI, we use a fully-connected neural network with a single hidden layer of dimension 100 with ReLU activations and we train for 100 epochs. For both MNIST datasets, we use a two-hidden-layer fully-connected neural network where each layer has 256 hidden units with ReLU activations. We train the model for 20 epochs; we train on CIFAR10 for 100 epochs and CIFAR100 for 150 epochs. [See the training-setup sketch below the table.]
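
The datasets named in the Open Datasets row are all publicly downloadable. Below is a minimal loading sketch, assuming TensorFlow 2.x / Keras (which the paper lists as a dependency); the pixel scaling is an assumption, and the UCI tabular datasets would have to be fetched separately from the UCI repository.

    # Minimal sketch: loading two of the public datasets named in the paper
    # (Fashion-MNIST and CIFAR-10) via tf.keras.datasets.
    import tensorflow as tf

    (fm_x, fm_y), (fm_x_test, fm_y_test) = tf.keras.datasets.fashion_mnist.load_data()
    (c10_x, c10_y), (c10_x_test, c10_y_test) = tf.keras.datasets.cifar10.load_data()

    # Scale pixel values to [0, 1]; the paper does not spell out its
    # preprocessing, so this step is an assumption.
    fm_x, fm_x_test = fm_x / 255.0, fm_x_test / 255.0
    c10_x, c10_x_test = c10_x / 255.0, c10_x_test / 255.0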
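
The Dataset Splits row describes partitioning each train set into D_noisy and D_clean (95/5, 90/10, or 80/20) and then splitting D_clean 70/30 into train and validation parts. Here is a minimal sketch of that partitioning using scikit-learn's train_test_split; the function name, random seeding, and the separate label-corruption step for D_noisy are assumptions, not the authors' code.

    from sklearn.model_selection import train_test_split

    def make_splits(x, y, clean_frac=0.05, seed=0):
        """Partition a train set into D_noisy / D_clean, then split D_clean 70/30.

        clean_frac = 0.05, 0.10, or 0.20 corresponds to the quoted 95/5, 90/10,
        and 80/20 splits. Labels in D_noisy would be corrupted in a later step.
        """
        x_noisy, x_clean, y_noisy, y_clean = train_test_split(
            x, y, test_size=clean_frac, random_state=seed)
        x_clean_tr, x_clean_val, y_clean_tr, y_clean_val = train_test_split(
            x_clean, y_clean, test_size=0.3, random_state=seed)
        return (x_noisy, y_noisy), (x_clean_tr, y_clean_tr), (x_clean_val, y_clean_val)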
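
The Experiment Setup row fixes most training hyperparameters: Adam with learning rate 0.001, batch size 128, and, for the MNIST datasets, a two-hidden-layer 256-unit ReLU network trained for 20 epochs. Below is a minimal Keras sketch of that preliminary model; the loss function and the logits output layer are assumptions consistent with, but not quoted from, the paper.

    import tensorflow as tf

    def build_mnist_model(num_classes=10):
        # Two hidden layers of 256 ReLU units, per the quoted setup.
        model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(num_classes),  # logit layer used by deep k-NN
        ])
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=["accuracy"],
        )
        return model

    # Training for 20 epochs with batch size 128, matching the quoted setup:
    # model = build_mnist_model()
    # model.fit(x_train, y_train, batch_size=128, epochs=20)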
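
Finally, the Pseudocode row points to Algorithm 1, Deep k-NN Filtering: run k-NN on the logit layer of the preliminary model and drop training points whose given label disagrees with their neighbors. The sketch below is a plausible reading of that description, not the authors' implementation; the plurality-vote agreement rule and the use of scikit-learn's NearestNeighbors are assumptions.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def deep_knn_filter(logits, labels, k=500):
        """Keep training examples whose label matches the plurality label of
        their k nearest neighbors in logit space.

        logits: (n, num_classes) array from the preliminary model.
        labels: (n,) array of integer class ids.
        Returns a boolean mask over the training set (True = keep).
        """
        # k + 1 neighbors because each point is its own nearest neighbor.
        nn = NearestNeighbors(n_neighbors=k + 1).fit(logits)
        _, idx = nn.kneighbors(logits)
        neighbor_labels = labels[idx[:, 1:]]  # drop the point itself
        keep = np.empty(len(labels), dtype=bool)
        for i, (y, neigh) in enumerate(zip(labels, neighbor_labels)):
            votes = np.bincount(neigh)
            keep[i] = votes.argmax() == y     # plurality-vote agreement
        return keep

    # Usage sketch: logits = preliminary_model.predict(x_noisy)
    #               mask = deep_knn_filter(logits, np.asarray(y_noisy), k=500)
    #               then retrain the final model on x_noisy[mask], y_noisy[mask]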