Meta Label Correction for Noisy Label Learning

Authors: Guoqing Zheng, Ahmed Hassan Awadallah, Susan Dumais

AAAI 2021, pp. 11053-11061

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We run extensive experiments with different label noise levels and types on both image recognition and text classification tasks."
Researcher Affiliation | Industry | "Guoqing Zheng, Ahmed Hassan Awadallah, Susan Dumais, Microsoft Research, {zheng, hassanam, sdumais}@microsoft.com"
Pseudocode | Yes | "Algorithm 1 outlines an iterative procedure to solve the above proxy problem with k-step look ahead SGD for the main model." (sketched after the table)
Open Source Code | Yes | "Code for MLC is available at https://aka.ms/MLC"
Open Datasets | Yes | "We evaluate our method on 3 image recognition datasets, CIFAR-10, CIFAR-100 (Krizhevsky 2009) and Clothing1M (Xiao et al. 2015), and 4 large-scale multi-class text classification benchmark datasets that are widely used by text classification research (Zhang, Zhao, and LeCun 2015; Xie et al. 2019; Dai et al. 2019; Yang et al. 2016; Conneau et al. 2016): AG news, Amazon reviews, Yelp reviews and Yahoo answers."
Dataset Splits | Yes | "To ensure a fair and consistent evaluation, we use only 1000 images as the clean set for both CIFAR-10 and CIFAR-100, and only 100 instances per class for the four large-scale text classification datasets. The noisy sets are generated by corrupting the labels of all the remaining data points based on the following two settings: Uniform label noise (UNIF)."
Hardware Specification | No | The paper mentions using ResNet32, ResNet50, and pre-trained BERT-base as classifiers, but it does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper states: "We implement all models and experiments in PyTorch." However, it does not provide a specific version number for PyTorch or any other software dependency.
Experiment Setup | Yes | "We use the same LCN architecture for MLC across all settings as follows (Figure 2(a)): an embedding layer of size (C, 128) to embed the input noisy labels, followed by a three-layer feed-forward network with dimensions of (128+xdim, hdim), (hdim, hdim), (hdim, C) respectively. tanh is used as the nonlinear activation function in between them, and lastly a Softmax layer outputs a categorical distribution as the corrected labels, where C is the number of classes, xdim is the feature dimension of input x from the last layer of the main classifier, i.e., 64 from ResNet32 for CIFAR-10 and CIFAR-100, 2048 from ResNet50 for Clothing1M and 768 from BERT-base for text datasets, and hdim is the hidden dimension of the LCN (set to 768 for text datasets and 64 otherwise)." (sketched after the table)
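
The Pseudocode row above refers to the paper's Algorithm 1, which alternates look-ahead SGD steps on the main model (trained on LCN-corrected labels) with a meta-gradient update of the LCN on the small clean set. Below is a minimal PyTorch sketch of a single such meta-update with k = 1; the toy linear main model, the small LCN, the batch sizes, and the learning rates are all illustrative assumptions, not the released implementation.

```python
# Hedged sketch of one MLC meta-update with a k = 1 look-ahead SGD step.
# All shapes, models, and hyperparameters here are toy assumptions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes, feat_dim = 3, 5

# Toy "main model": one linear layer kept as a raw tensor so we can
# differentiate through its SGD update (create_graph=True below).
main_w = torch.randn(feat_dim, num_classes, requires_grad=True)

# Toy "label correction network" (meta model): noisy one-hot label +
# feature -> corrected soft label (a stand-in for the LCN in Figure 2(a)).
lcn = torch.nn.Sequential(
    torch.nn.Linear(num_classes + feat_dim, 16),
    torch.nn.Tanh(),
    torch.nn.Linear(16, num_classes),
)
meta_opt = torch.optim.SGD(lcn.parameters(), lr=0.01)
lr_main = 0.1

x_noisy = torch.randn(8, feat_dim)
y_noisy = torch.randint(0, num_classes, (8,))
x_clean = torch.randn(4, feat_dim)
y_clean = torch.randint(0, num_classes, (4,))

# 1) LCN turns noisy labels (plus features) into corrected soft labels.
lcn_in = torch.cat([F.one_hot(y_noisy, num_classes).float(), x_noisy], dim=1)
soft_labels = F.softmax(lcn(lcn_in), dim=1)

# 2) One differentiable look-ahead SGD step of the main model on the
#    corrected labels; create_graph=True keeps the dependence on the LCN.
logits = x_noisy @ main_w
train_loss = -(soft_labels * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
(grad_w,) = torch.autograd.grad(train_loss, main_w, create_graph=True)
w_lookahead = main_w - lr_main * grad_w

# 3) Meta update: clean-set loss of the looked-ahead main model,
#    backpropagated through the look-ahead step into the LCN parameters.
meta_loss = F.cross_entropy(x_clean @ w_lookahead, y_clean)
meta_opt.zero_grad()
meta_loss.backward()
meta_opt.step()

# 4) Commit the main-model step (detached from the meta graph).
with torch.no_grad():
    main_w -= lr_main * grad_w
```

With k > 1, step 2 would unroll k differentiable SGD updates before evaluating the clean-set loss; the k = 1 case above only illustrates the gradient flow.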
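
The Experiment Setup row specifies the LCN architecture in full, so it can be sketched directly. The module below is one reading of that description, assuming plain nn.Linear layers and an nn.Embedding for the noisy label ids; the released code at https://aka.ms/MLC is authoritative.

```python
# Hedged sketch of the LCN architecture quoted above (Figure 2(a) of the paper).
import torch
import torch.nn as nn

class LCN(nn.Module):
    """Label correction network: (noisy label, main-model feature) -> corrected soft label."""
    def __init__(self, num_classes: int, xdim: int, hdim: int = 64, emb_dim: int = 128):
        super().__init__()
        # Embedding layer of size (C, 128) for the input noisy labels.
        self.label_emb = nn.Embedding(num_classes, emb_dim)
        # Three-layer feed-forward network with tanh in between:
        # (128 + xdim, hdim) -> (hdim, hdim) -> (hdim, C).
        self.net = nn.Sequential(
            nn.Linear(emb_dim + xdim, hdim), nn.Tanh(),
            nn.Linear(hdim, hdim), nn.Tanh(),
            nn.Linear(hdim, num_classes),
        )

    def forward(self, noisy_label: torch.Tensor, feat: torch.Tensor) -> torch.Tensor:
        h = torch.cat([self.label_emb(noisy_label), feat], dim=1)
        # Softmax output: a categorical distribution over corrected labels.
        return torch.softmax(self.net(h), dim=1)

# Example in the CIFAR-10 setting: xdim = 64 (last layer of ResNet32), hdim = 64.
lcn = LCN(num_classes=10, xdim=64)
soft_labels = lcn(torch.randint(0, 10, (32,)), torch.randn(32, 64))
print(soft_labels.shape)  # torch.Size([32, 10])
```

Per the quoted setup, only xdim and hdim change across tasks: 2048/64 for Clothing1M (ResNet50) and 768/768 for the text datasets (BERT-base).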