Noise Attention Learning: Enhancing Noise Robustness by Gradient Scaling

Authors: Yangdi Lu, Yang Bo, Wenbo He

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that most of the mislabeled samples yield significantly lower weights than the clean ones. Furthermore, our theoretical analysis shows that the gradients of training samples are dynamically scaled by the attention weights, implicitly preventing memorization of the mislabeled samples. Experimental results on two benchmarks (CIFAR-10 and CIFAR-100) with simulated label noise and three real-world noisy datasets (ANIMAL-10N, Clothing1M and WebVision) demonstrate that our approach outperforms state-of-the-art methods. (See the first sketch after the table.)
Researcher Affiliation | Academia | Yangdi Lu, Department of Computing and Software, McMaster University, luy100@mcmaster.ca; Yang Bo, Department of Computing and Software, McMaster University, boy2@mcmaster.ca; Wenbo He, Department of Computing and Software, McMaster University, hew11@mcmaster.ca
Pseudocode | Yes | Algorithm 1: Noise Attention Learning (NAL) pseudocode
Open Source Code | No | The paper does not provide concrete access (e.g., a specific repository link or explicit statement of code release) to the source code for the methodology described.
Open Datasets | Yes | We evaluate our approach on two benchmarks CIFAR-10 and CIFAR-100 [2] with simulated label noise, and three real-world datasets, ANIMAL-10N [15], Clothing1M [16] and WebVision [3].
Dataset Splits | Yes | All the compared methods are evaluated on WebVision and ImageNet ILSVRC12 validation sets.
Hardware Specification | Yes | All experiments are implemented in PyTorch and run on a single NVIDIA A100 GPU.
Software Dependencies | No | The paper states 'All experiments are implemented in PyTorch' but does not provide specific version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | For CIFAR with class-conditional noise, we use a ResNet-34 [39] and train it using SGD with a batch size of 64. For CIFAR-10 and ANIMAL-10N, we set λ = 0.5. For CIFAR-100, we set λ = 10. For WebVision and Clothing1M, we set λ = 50. (See the second sketch after the table.)
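
The first sketch below illustrates the gradient-scaling idea summarized in the Research Type row: each sample's cross-entropy is multiplied by an attention weight in (0, 1), so low-weight (likely mislabeled) samples contribute proportionally smaller gradients. This is a minimal sketch only; the sigmoid weighting and the log-based regularizer are assumptions for illustration, not the authors' exact NAL objective.

```python
import torch
import torch.nn.functional as F


def attention_weighted_loss(logits, attention_scores, targets, lam=0.5):
    """Per-sample attention-weighted cross-entropy (illustrative only).

    Each sample's loss is multiplied by a weight w_i in (0, 1); in the
    backward pass the gradient contributed by sample i is scaled by the
    same w_i, so low-weight (likely mislabeled) samples barely update
    the network.
    """
    per_sample_ce = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.sigmoid(attention_scores).squeeze(-1)  # w_i in (0, 1)
    weighted_term = (weights * per_sample_ce).mean()
    # Placeholder regularizer to keep the weights from collapsing to zero;
    # lambda is assumed to balance such a term, but the exact form used by
    # NAL is not specified in this report.
    reg_term = -lam * torch.log(weights + 1e-8).mean()
    return weighted_term + reg_term
```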
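
The second sketch wires the loss above into the training setup quoted in the Experiment Setup row (ResNet-34, SGD, batch size 64, λ = 0.5 for CIFAR-10). The linear attention head, learning rate, momentum, and weight decay are assumptions; the excerpt does not state them.

```python
import torch
from torch import nn
from torchvision.models import resnet34

# Quoted values: ResNet-34, SGD, batch size 64, lambda = 0.5 for CIFAR-10 /
# ANIMAL-10N (10 for CIFAR-100, 50 for WebVision / Clothing1M). Learning
# rate, momentum, and weight decay below are common defaults, not quoted.
BATCH_SIZE = 64
LAMBDA = 0.5

model = resnet34(num_classes=10)   # CIFAR-10 backbone named in the paper
attention_head = nn.Linear(10, 1)  # hypothetical per-sample score head
optimizer = torch.optim.SGD(
    list(model.parameters()) + list(attention_head.parameters()),
    lr=0.02, momentum=0.9, weight_decay=5e-4,
)


def train_one_epoch(loader):
    # loader is assumed to be a DataLoader built with batch_size=BATCH_SIZE.
    model.train()
    for images, targets in loader:
        logits = model(images)
        # Stand-in attention branch: one score per sample from the logits.
        scores = attention_head(logits)
        loss = attention_weighted_loss(logits, scores, targets, lam=LAMBDA)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```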