Active Negative Loss Functions for Learning with Noisy Labels
Authors: Xichen Ye, Xiaoqiang Li, Songmin Dai, Tong Liu, Yan Sun, Weiqin Tong
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on benchmark and real-world datasets demonstrate that the new set of loss functions created by our ANL framework can outperform state-of-the-art methods. |
| Researcher Affiliation | Academia | Xichen Ye, Shanghai University, Shanghai, China (yexichen0930@shu.edu.cn); Xiaoqiang Li, Shanghai University, Shanghai, China (xqli@shu.edu.cn); Songmin Dai, Shanghai University, Shanghai, China (laodar@shu.edu.cn); Tong Liu, Shanghai University, Shanghai, China (tong_liu@shu.edu.cn); Yan Sun, Shanghai University, Shanghai, China (yansun@shu.edu.cn); Weiqin Tong, Shanghai University, Shanghai, China (wqtong@shu.edu.cn) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. Methods are described in prose and mathematical formulations. |
| Open Source Code | Yes | The code is available at https://github.com/Virusdoll/Active-Negative-Loss. |
| Open Datasets | Yes | In this section, we empirically investigate our proposed ANL functions on benchmark datasets, including MNIST [12], CIFAR-10/CIFAR-100 [13] and a real-world noisy dataset WebVision [14]. |
| Dataset Splits | Yes | Specifically, we use 10% of the original training set as the validation set, and generate 0.8 symmetric noise on the remaining 90% of the original training set as the training set by the standard noise generation approach. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts) used for running its experiments. It only mentions general aspects like 'training deep neural networks'. |
| Software Dependencies | No | The paper mentions optimizers like SGD and Adam but does not provide specific version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for implementation. |
| Experiment Setup | Yes | For MNIST, CIFAR-10, and CIFAR-100, the networks are trained for 50, 120, and 200 epochs, respectively. For all the training, we use SGD optimizer with momentum 0.9 and cosine learning rate annealing. Weight decay is set to 1 × 10⁻³, 1 × 10⁻⁴, and 1 × 10⁻⁵ for MNIST, CIFAR-10, and CIFAR-100, respectively. (...) The initial learning rate is set to 0.01 for MNIST/CIFAR-10 and 0.1 for CIFAR-100. Batch size is set to 128. For all settings, we clip the gradient norm to 5.0. Typical data augmentations including random width/height shift and horizontal flip are applied. |
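
The noise-generation procedure quoted in the Dataset Splits row (10% of the training set held out for validation, 0.8 symmetric noise on the remaining 90%) matches the standard recipe in the noisy-label literature: each training label is flipped, with probability equal to the noise rate, to a class drawn uniformly from the other classes. Below is a minimal NumPy sketch of that recipe; the function name `inject_symmetric_noise`, the fixed seed, and the index-based split are illustrative assumptions rather than code from the authors' repository, and the paper's exact noise convention may differ in detail.

```python
import numpy as np

def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip each label, with probability `noise_rate`, to a different class
    chosen uniformly at random (a standard symmetric-noise recipe)."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    flip = rng.random(len(noisy)) < noise_rate
    for i in np.where(flip)[0]:
        # Draw uniformly from the other num_classes - 1 labels.
        noisy[i] = rng.choice([c for c in range(num_classes) if c != noisy[i]])
    return noisy

# 90/10 train/validation split of a 50,000-sample training set (CIFAR-10 size),
# then 0.8 symmetric noise on the 90% training portion.
rng = np.random.default_rng(0)
perm = rng.permutation(50_000)
val_idx, train_idx = perm[:5_000], perm[5_000:]
# noisy_labels = inject_symmetric_noise(clean_labels[train_idx], noise_rate=0.8, num_classes=10)
```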
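
The Experiment Setup row quotes the CIFAR-10 configuration: SGD with momentum 0.9, cosine learning-rate annealing, weight decay 1 × 10⁻⁴, initial learning rate 0.01, batch size 128, 120 epochs, gradient-norm clipping at 5.0, and random shift/horizontal-flip augmentation. The PyTorch sketch below wires those quoted hyperparameters together; the ResNet-18 backbone, the `RandomCrop` stand-in for the random width/height shift, and the generic `criterion` argument are assumptions, since neither the paper's network nor its ANL loss implementation is reproduced here.

```python
import torch
from torch import nn, optim
from torchvision import models, transforms

# Quoted CIFAR-10 hyperparameters: 120 epochs, batch size 128, lr 0.01,
# momentum 0.9, weight decay 1e-4, cosine annealing, gradient clipping at 5.0.
EPOCHS, BATCH_SIZE, MAX_GRAD_NORM = 120, 128, 5.0

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),   # stand-in for the random width/height shift
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])

model = models.resnet18(num_classes=10)     # placeholder backbone, not the paper's network
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

def train_one_epoch(loader, criterion, device="cpu"):
    """One pass over the noisy training loader with the quoted settings."""
    model.train().to(device)
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        # Clip the gradient norm to 5.0, as stated in the setup.
        nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)
        optimizer.step()
    scheduler.step()
```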