Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity
Authors: Dixian Zhu, Yiming Ying, Tianbao Yang
ICML 2023 | Conference PDF | Archive PDF
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our contributions include: (3) we demonstrate stable and competitive performance for the proposed adaptive LDR loss on 7 benchmark datasets under 6 noisy label and 1 clean settings against 13 loss functions, and on one real-world noisy dataset. |
| Researcher Affiliation | Academia | 1The University of Iowa, Iowa City, USA 2University at Albany, Albany, USA 3Texas A&M University, College Station, USA. |
| Pseudocode | Yes | Algorithm 1 Stochastic Optimization for ALDR-KL loss (see the loss-form sketch below the table) |
| Open Source Code | Yes | The method is open-sourced at https://github.com/Optimization-AI/ICML2023_LDR. |
| Open Datasets | Yes | We conduct experiments on 7 benchmark datasets, namely ALOI, News20, Letter, Vowel (Fan & Lin), Kuzushiji-49, CIFAR-100 and Tiny-ImageNet (Clanuwat et al., 2018; Deng et al., 2009). The statistics of the datasets are summarized in Table 5 in the Appendix. |
| Dataset Splits | Yes | For all the experiments unless specified otherwise, we manually add label noises to the training and validation data, but keep the testing data clean. We apply 5-fold-cross-validation to conduct the training and evaluation, and report the mean and standard deviation for the testing top-k accuracy, where k ∈ {1, 2, 3, 4, 5}. |
| Hardware Specification | Yes | Each entry stands for mean and standard deviation for 100 consecutive epochs running on a x86_64 GNU/Linux cluster with NVIDIA GeForce GTX 1080 Ti GPU card. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used for replication. |
| Experiment Setup | Yes | We fix the weight decay as 5e-3, batch size as 64, and total running epochs as 100 for all the datasets except Kuzushiji, CIFAR100 and Tiny-ImageNet (we run 30 epochs for them because the data sizes are large). We utilize the momentum optimizer with the initial learning rate tuned in {1e-1, 1e-2, 1e-3} for all experiments. (See the training/evaluation sketch below the table.) |
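
The pseudocode row names the paper's Algorithm 1 but does not reproduce it. As a rough orientation only, a KL-regularized label-distributionally-robust loss of this family has a closed form via the log-sum-exp dual over the class margins z_k − z_y with a uniform prior over the K classes. The sketch below follows that generic form; the function name `ldr_kl_loss` and the fixed temperature `lam` are our placeholders, and the paper's adaptive ALDR-KL variant (which, as we understand it, adapts the regularization per example) is not shown.

```python
import math
import torch

def ldr_kl_loss(logits: torch.Tensor, targets: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Sketch of a KL-regularized label-DRO loss (not the paper's exact Algorithm 1).

    Closed form of  max_{p in simplex} sum_k p_k (z_k - z_y) - lam * KL(p || uniform):
        lam * logsumexp_k((z_k - z_y) / lam) - lam * log K
    With lam = 1 this reduces to cross-entropy up to the constant log K.
    """
    num_classes = logits.size(1)
    # Margins z_k - z_y for every class k (zero at the true class y).
    margins = logits - logits.gather(1, targets.unsqueeze(1))
    per_example = lam * torch.logsumexp(margins / lam, dim=1) - lam * math.log(num_classes)
    return per_example.mean()
```

As `lam` shrinks this interpolates toward a max-margin style loss and as it grows toward an average of the margins; for the exact losses and the adaptive update, consult the released code at the GitHub link above.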
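
The dataset-split and experiment-setup rows quote the protocol: 5-fold cross-validation with noisy training/validation labels and clean test labels, reporting mean and standard deviation of test top-k accuracy, with momentum SGD, weight decay 5e-3 and batch size 64. A minimal sketch of that recipe follows; `train_one_run` is a hypothetical stand-in for the actual training loop, and the fold and seed handling are our assumptions rather than the paper's code.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hyperparameters quoted in the table; the initial learning rate is tuned over LR_GRID.
WEIGHT_DECAY = 5e-3
BATCH_SIZE = 64
EPOCHS = 100                      # 30 for Kuzushiji-49, CIFAR-100 and Tiny-ImageNet
LR_GRID = (1e-1, 1e-2, 1e-3)

def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    top_k = np.argsort(-scores, axis=1)[:, :k]
    return float(np.mean([labels[i] in top_k[i] for i in range(len(labels))]))

def five_fold_eval(features, clean_labels, noisy_labels, lr, seed=0):
    """5-fold CV: train on noisy labels, evaluate the held-out fold on clean labels."""
    accs = {k: [] for k in (1, 2, 3, 4, 5)}
    for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=seed).split(features):
        # `train_one_run` is a placeholder for the real training loop
        # (momentum SGD, weight decay 5e-3, batch size 64, lr drawn from LR_GRID);
        # it should return an (n_test, n_classes) score matrix for the held-out fold.
        scores = train_one_run(features[train_idx], noisy_labels[train_idx],
                               features[test_idx], lr=lr,
                               weight_decay=WEIGHT_DECAY,
                               batch_size=BATCH_SIZE, epochs=EPOCHS)
        for k in accs:
            accs[k].append(top_k_accuracy(scores, clean_labels[test_idx], k))
    # Report mean and standard deviation of test top-k accuracy for k = 1..5.
    return {k: (float(np.mean(v)), float(np.std(v))) for k, v in accs.items()}
```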