Active Negative Loss Functions for Learning with Noisy Labels

Authors: Xichen Ye, Xiaoqiang Li, Songmin Dai, Tong Liu, Yan Sun, Weiqin Tong

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on benchmark and real-world datasets demonstrate that the new set of loss functions created by our ANL framework can outperform state-of-the-art methods.
Researcher Affiliation | Academia | All six authors are affiliated with Shanghai University, Shanghai, China: Xichen Ye (yexichen0930@shu.edu.cn), Xiaoqiang Li (xqli@shu.edu.cn), Songmin Dai (laodar@shu.edu.cn), Tong Liu (tong_liu@shu.edu.cn), Yan Sun (yansun@shu.edu.cn), Weiqin Tong (wqtong@shu.edu.cn).
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks; methods are described in prose and mathematical formulations. (An illustrative loss-structure sketch follows the table.)
Open Source Code | Yes | The code is available at https://github.com/Virusdoll/Active-Negative-Loss.
Open Datasets | Yes | In this section, we empirically investigate our proposed ANL functions on benchmark datasets, including MNIST [12], CIFAR-10/CIFAR-100 [13], and a real-world noisy dataset, WebVision [14].
Dataset Splits | Yes | Specifically, we use 10% of the original training set as the validation set, and generate 0.8 symmetric noise on the remaining 90% of the original training set as the training set by the standard noise generation approach. (A noise-generation sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts) used to run its experiments; it only mentions general aspects such as 'training deep neural networks'.
Software Dependencies | No | The paper mentions optimizers such as SGD and Adam but does not provide version numbers for any software libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used for the implementation.
Experiment Setup | Yes | For MNIST, CIFAR-10, and CIFAR-100, the networks are trained for 50, 120, and 200 epochs, respectively. For all the training, we use the SGD optimizer with momentum 0.9 and cosine learning rate annealing. Weight decay is set to 1 × 10⁻³, 1 × 10⁻⁴, and 1 × 10⁻⁵ for MNIST, CIFAR-10, and CIFAR-100, respectively. (...) The initial learning rate is set to 0.01 for MNIST/CIFAR-10 and 0.1 for CIFAR-100. Batch size is set to 128. For all settings, we clip the gradient norm to 5.0. Typical data augmentations, including random width/height shift and horizontal flip, are applied. (A training-setup sketch follows the table.)
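
The Pseudocode row notes that the method is given only in prose and equations. As a purely illustrative aid (not the paper's pseudocode), the sketch below shows the APL-style structure that active/negative loss combinations follow: a weighted sum of an "active" term and a second term. Normalized Cross Entropy (NCE) is used as the active term because its formula is standard; plain MAE is only a stand-in for the paper's proposed Normalized Negative Loss Functions, whose exact definition should be taken from the paper and the linked repository. The class name and the `alpha`/`beta` weights are assumptions.

```python
import torch
import torch.nn.functional as F

class ActiveNegativeStyleLoss(torch.nn.Module):
    """Illustrative APL-style combination: alpha * active term + beta * second term.
    NCE (Ma et al., 2020) is the active term; MAE is only a stand-in for the
    paper's Normalized Negative Loss Functions (see the official repository)."""

    def __init__(self, alpha: float = 1.0, beta: float = 1.0):
        super().__init__()
        self.alpha = alpha
        self.beta = beta

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        log_probs = F.log_softmax(logits, dim=1)                        # (B, K)
        log_p_y = log_probs.gather(1, target.unsqueeze(1)).squeeze(1)   # log p_y
        p_y = log_p_y.exp()                                             # p_y
        # Active term: Normalized Cross Entropy
        #   NCE = CE(f(x), y) / sum_k CE(f(x), k) = log p_y / sum_k log p_k
        nce = log_p_y / log_probs.sum(dim=1)
        # Stand-in second term: MAE between the one-hot label and the softmax output,
        #   MAE = sum_k |e_k - p_k| = 2 * (1 - p_y)
        mae = 2.0 * (1.0 - p_y)
        return (self.alpha * nce + self.beta * mae).mean()

# Hypothetical usage on random data:
logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
loss = ActiveNegativeStyleLoss(alpha=1.0, beta=1.0)(logits, targets)
```

The official implementation linked in the Open Source Code row is the authoritative reference for the real loss definitions and their hyperparameters.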
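
For the Dataset Splits row, the quoted protocol (10% held out for validation, symmetric noise at rate 0.8 injected into the remaining 90%) can be sketched as below. This is a generic symmetric label-noise injection, not the paper's exact script; whether the held-out 10% is also corrupted, and whether a flip may land on the true class, vary between codebases and are assumptions here (the validation labels are left untouched and flips always change the class).

```python
import numpy as np

def split_and_corrupt(labels, num_classes, noise_rate=0.8, val_fraction=0.1, seed=0):
    """Hold out a validation split, then inject symmetric label noise into the rest."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    perm = rng.permutation(len(labels))
    n_val = int(val_fraction * len(labels))
    val_idx, train_idx = perm[:n_val], perm[n_val:]

    noisy = labels.copy()
    flip_mask = rng.random(len(train_idx)) < noise_rate
    for i in train_idx[flip_mask]:
        # Symmetric noise: replace the label with a uniformly chosen *different* class.
        offset = rng.integers(1, num_classes)
        noisy[i] = (labels[i] + offset) % num_classes
    return train_idx, val_idx, noisy

# Hypothetical usage with CIFAR-10-sized labels:
train_idx, val_idx, noisy_labels = split_and_corrupt(
    np.random.randint(0, 10, 50000), num_classes=10)
```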
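
The Experiment Setup row is concrete enough to sketch a training loop. The snippet below wires up the quoted CIFAR-10 settings (120 epochs, SGD with momentum 0.9, cosine annealing, weight decay 1 × 10⁻⁴, initial LR 0.01, batch size 128, gradient-norm clipping at 5.0, shift and horizontal-flip augmentation) in PyTorch. The backbone (a ResNet-18 here), the interpretation of "width/height shift" as a random affine translation, and the plain cross-entropy criterion (where the paper's ANL loss would go) are assumptions, since the quoted text does not specify them.

```python
import torch
from torch import nn
from torchvision import datasets, models, transforms

# Quoted CIFAR-10 settings; MNIST/CIFAR-100 use 50/200 epochs,
# weight decay 1e-3/1e-5, and initial LR 0.01/0.1.
epochs, lr, weight_decay, batch_size, max_grad_norm = 120, 0.01, 1e-4, 128, 5.0

train_transform = transforms.Compose([
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # assumed width/height shift
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
train_set = datasets.CIFAR10("data", train=True, transform=train_transform, download=True)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=True)

model = models.resnet18(num_classes=10)          # assumed backbone
criterion = nn.CrossEntropyLoss()                # the paper's ANL loss would replace this
optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                            momentum=0.9, weight_decay=weight_decay)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    for images, targets in train_loader:
        loss = criterion(model(images), targets)
        optimizer.zero_grad()
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)  # clip grad norm to 5.0
        optimizer.step()
    scheduler.step()
```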