Noise Attention Learning: Enhancing Noise Robustness by Gradient Scaling
Authors: Yangdi Lu, Yang Bo, Wenbo He
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results show that most of the mislabeled samples yield significantly lower weights than the clean ones. Furthermore, our theoretical analysis shows that the gradients of training samples are dynamically scaled by the attention weights, implicitly preventing memorization of the mislabeled samples. Experimental results on two benchmarks (CIFAR-10 and CIFAR-100) with simulated label noise and three real-world noisy datasets (ANIMAL-10N, Clothing1M and WebVision) demonstrate that our approach outperforms state-of-the-art methods. |
| Researcher Affiliation | Academia | Yangdi Lu, Department of Computing and Software, McMaster University, luy100@mcmaster.ca; Yang Bo, Department of Computing and Software, McMaster University, boy2@mcmaster.ca; Wenbo He, Department of Computing and Software, McMaster University, hew11@mcmaster.ca |
| Pseudocode | Yes | Algorithm 1 Noise Attention Learning (NAL) pseudocode |
| Open Source Code | No | The paper does not provide concrete access to the source code for the described methodology (e.g., a specific repository link or an explicit statement of code release). |
| Open Datasets | Yes | We evaluate our approach on two benchmarks CIFAR-10 and CIFAR-100 [2] with simulated label noise, and three real-world datasets, ANIMAL-10N [15], Clothing1M [16] and WebVision [3]. |
| Dataset Splits | Yes | All the compared methods are evaluated on WebVision and ImageNet ILSVRC12 validation sets. |
| Hardware Specification | Yes | All experiments are implemented in PyTorch and run on a single NVIDIA A100 GPU. |
| Software Dependencies | No | The paper states 'All experiments are implemented in PyTorch' but does not provide specific version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | For CIFAR with class-conditional noise, we use a ResNet-34 [39] and train it using SGD with a batch size of 64. For CIFAR-10 and ANIMAL-10N, we set λ = 0.5. For CIFAR-100, we set λ = 10. For Webvision and Clothing1M, we set λ = 50. |
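
The paper's Algorithm 1 is the authoritative specification of NAL. As a rough illustration of the gradient-scaling idea summarized in the Research Type row above, the sketch below implements a per-sample attention-weighted cross-entropy in PyTorch. The class name `AttentionWeightedCE`, the sigmoid-range attention weights, and the log-barrier regularizer are assumptions made for illustration, not the paper's exact loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionWeightedCE(nn.Module):
    """Per-sample cross-entropy scaled by attention weights (illustrative sketch).

    Scaling each sample's loss by its attention weight also scales that
    sample's gradient, which is the mechanism the paper credits for
    suppressing memorization of mislabeled samples.
    """

    def __init__(self, lam: float = 0.5):
        super().__init__()
        self.lam = lam  # λ, reported per dataset in the Experiment Setup row

    def forward(self, logits: torch.Tensor, attn: torch.Tensor,
                targets: torch.Tensor) -> torch.Tensor:
        # attn is expected in (0, 1], e.g. the sigmoid output of an auxiliary head.
        ce = F.cross_entropy(logits, targets, reduction="none")  # per-sample loss
        weighted = (attn * ce).mean()                            # gradient scaled by attn
        # Hypothetical regularizer discouraging the trivial solution attn -> 0.
        reg = -self.lam * torch.log(attn.clamp_min(1e-8)).mean()
        return weighted + reg
```

In use, `attn` would come from a small attention branch on top of the backbone features, e.g. `attn = torch.sigmoid(attn_head(features)).squeeze(1)`; that branch is part of the paper's architecture and is not reproduced here.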
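For reproduction, the reported training setup (ResNet-34, SGD, batch size 64, dataset-specific λ) can be wired up as below. Learning rate, momentum, weight decay, schedule, and the CIFAR-specific ResNet stem are not given in the excerpt and are placeholders.

```python
import torch
import torchvision

# Reported: ResNet-34 trained with SGD, batch size 64 on CIFAR with
# class-conditional noise. The paper may use a CIFAR variant of ResNet-34
# (3x3 stem, no max-pool); torchvision's ImageNet-style model is a stand-in.
model = torchvision.models.resnet34(num_classes=10)  # CIFAR-10

# lr / momentum / weight_decay are placeholders, not values from the excerpt.
optimizer = torch.optim.SGD(model.parameters(), lr=0.02,
                            momentum=0.9, weight_decay=5e-4)
BATCH_SIZE = 64

# Dataset-specific λ values reported in the Experiment Setup row.
LAMBDA = {
    "cifar10": 0.5,
    "animal10n": 0.5,
    "cifar100": 10.0,
    "clothing1m": 50.0,
    "webvision": 50.0,
}
```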