Can Cross Entropy Loss Be Robust to Label Noise?

Authors: Lei Feng, Senlin Shu, Zhuoyi Lin, Fengmao Lv, Li Li, Bo An

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results on benchmark datasets demonstrate that our proposed approach significantly outperforms the state-of-the-art counterparts.
Researcher Affiliation | Academia | Lei Feng¹, Senlin Shu², Zhuoyi Lin¹, Fengmao Lv³, Li Li², Bo An¹. ¹School of Computer Science and Engineering, Nanyang Technological University, Singapore; ²College of Computer and Information Science, Southwest University, Chongqing, China; ³Center of Statistical Research, Southwestern University of Finance and Economics, China
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statements or links regarding the availability of open-source code for the described methodology.
Open Datasets | Yes | Our experiments are conducted on MNIST [LeCun et al., 1998], Fashion-MNIST (Fashion in short) [Xiao et al., 2017], Kuzushiji-MNIST (Kuzushiji in short) [Clanuwat et al., 2018], CIFAR-10 [Krizhevsky et al., 2009] and CIFAR-100 [Krizhevsky et al., 2009]. (See the loading sketch below.)
Dataset Splits | No | The paper mentions training and testing sets, but does not explicitly describe a validation set, its size, or the splitting methodology.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., GPU models or CPU specifications).
Software Dependencies | No | The paper mentions the Adam optimizer and specific architectures (LeNet-5, ResNet-34), but does not list any software libraries or version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | For all the methods, learning rate is selected from {10^-2, 10^-3, 10^-4, 10^-5}. ... all networks are trained using the Adam optimizer [Kingma and Ba, 2014] with the number of epochs set to 200 and the batch size set to 256. ... On the three datasets, networks are trained with weight decay of 10^-4. ... On the two datasets, networks are trained with weight decay of 0. ... GCE [Zhang and Sabuncu, 2018]: ... q is set to 0.7 ... PHuber-CE: ... τ is selected from {2, 10}. ... TCE: ... t is selected from {2, ..., 6}. (See the training sketch below.)
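
All five benchmarks in the Open Datasets row are publicly available. As a convenience for reproduction, here is a minimal loading sketch assuming torchvision; the paper itself names no data library, so the package choice is ours:

```python
import torchvision
import torchvision.transforms as T

# The five benchmarks named in the paper all ship with torchvision
# (an assumption of this sketch, not a statement from the paper).
transform = T.ToTensor()
datasets = {
    "MNIST": torchvision.datasets.MNIST("data", train=True, download=True, transform=transform),
    "Fashion": torchvision.datasets.FashionMNIST("data", train=True, download=True, transform=transform),
    "Kuzushiji": torchvision.datasets.KMNIST("data", train=True, download=True, transform=transform),
    "CIFAR-10": torchvision.datasets.CIFAR10("data", train=True, download=True, transform=transform),
    "CIFAR-100": torchvision.datasets.CIFAR100("data", train=True, download=True, transform=transform),
}
```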
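
The Experiment Setup excerpt pins down enough detail to sketch the compared losses and the optimizer configuration. Since no code was released, the sketch below is a reconstruction, not the authors' implementation: gce_loss and phuber_ce_loss follow the original formulations in Zhang and Sabuncu (2018) and Menon et al. (2020), and tce_loss implements the truncated Taylor expansion of -log p underlying the paper's TCE as we understand it. Hyperparameter values come from the excerpt; the architecture and the single learning rate are illustrative assumptions.

```python
import math

import torch
import torch.nn.functional as F
import torchvision

def gce_loss(logits, targets, q=0.7):
    """Generalized cross entropy (Zhang & Sabuncu, 2018): (1 - p_y^q) / q; q = 0.7 per the excerpt."""
    p_y = F.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return ((1.0 - p_y.pow(q)) / q).mean()

def phuber_ce_loss(logits, targets, tau=10.0):
    """Partially Huberised CE (Menon et al., 2020): -log p_y is linearised
    below p_y = 1/tau, so per-example gradients are bounded by tau."""
    p_y = F.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    linear = -tau * p_y + math.log(tau) + 1.0
    # Clamp keeps the unselected log branch finite inside torch.where.
    log_branch = -torch.log(p_y.clamp(min=1.0 / tau))
    return torch.where(p_y <= 1.0 / tau, linear, log_branch).mean()

def tce_loss(logits, targets, t=4):
    """Taylor cross entropy: order-t Taylor expansion of -log(p_y),
    i.e. sum_{i=1}^{t} (1 - p_y)^i / i; t is searched over {2, ..., 6} per the excerpt."""
    p_y = F.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
    return sum((1.0 - p_y).pow(i) / i for i in range(1, t + 1)).mean()

# Training configuration from the excerpt: Adam, 200 epochs, batch size 256.
# lr = 1e-3 stands in for the grid {1e-2, 1e-3, 1e-4, 1e-5}; weight decay is
# 1e-4 on the three *-MNIST datasets and 0 on CIFAR-10/100.
model = torchvision.models.resnet34(num_classes=10)  # the paper's exact ResNet-34 variant is unspecified
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)
```

A faithful reproduction would grid-search the learning rate over the four values above and the loss hyperparameters (τ over {2, 10}, t over {2, ..., 6}) exactly as the excerpt describes, rather than fixing them as this sketch does.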