Stochastic Feature Averaging for Learning with Long-Tailed Noisy Labels
Authors: Hao-Tian Li, Tong Wei, Hao Yang, Kun Hu, Chong Peng, Li-Bo Sun, Xun-Liang Cai, Min-Ling Zhang
IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results show that SFA can enhance the performance of existing methods on both simulated and real-world datasets. |
| Researcher Affiliation | Collaboration | Hao-Tian Li (1,2,3), Tong Wei (1,2), Hao Yang (3), Kun Hu (3), Chong Peng (3), Li-Bo Sun (3), Xun-Liang Cai (3), Min-Ling Zhang (1,2). 1: School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; 2: Key Lab. of Computer Network and Information Integration (Southeast University), MOE, China; 3: Meituan, Shanghai, China. {liht, weit, zhangml}@seu.edu.cn, {yanghao52, hukun05, pengchong, sunlibo03, caixunliang}@meituan.com |
| Pseudocode | Yes | The pseudo-code of our SFA framework is summarized in Algorithm 1. Algorithm 1: The SFA Framework |
| Open Source Code | Yes | Our code is available at https://github.com/HotanLee/SFA. |
| Open Datasets | Yes | We first test our approach on CIFAR-10 and CIFAR-100 datasets by simulating training data with long-tailed class distribution and label noise following prior work [Wei et al., 2021]. WebVision is a large-scale dataset with real-world noisy labels and long-tailed distributions. |
| Dataset Splits | No | The paper mentions training data and test accuracy, and refers to 'WebVision (ImageNet) validation sets', but does not explicitly state specific percentages or counts for training/test/validation splits for any dataset. It implies the use of standard splits for the benchmark datasets but does not detail them. |
| Hardware Specification | Yes | The model is trained for 200 epochs with 1 NVIDIA GeForce RTX 3090. The model is trained for 100 epochs in total with 2 NVIDIA GeForce RTX 3090. |
| Software Dependencies | No | The paper describes various algorithms, models, and optimization methods (e.g., SGD, MixMatch, Balanced Softmax) but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | We use an 18-layer PreAct ResNet [He et al., 2016] and train it using SGD with a momentum of 0.9, a weight decay of 5×10⁻⁴, a batch size of 128 and an initial learning rate of 0.02. The model is trained for 200 epochs with 1 NVIDIA GeForce RTX 3090. We perform sample selection after a warm-up period of 30 epochs and anneal the learning rate by a factor of 10 after 150 epochs. For all CIFAR experiments, we choose ρ from {10, 50, 100} and γ from {0.2, 0.5}, and use the same hyperparameters β = 0.99 and S = 5. (A hedged configuration sketch based on these quoted settings follows the table.) |
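
The Experiment Setup row fully specifies the CIFAR optimizer and schedule, so a short configuration sketch can make those settings concrete. This is a minimal sketch assuming a PyTorch-style training loop; the framework, the placeholder backbone, and the loop skeleton are assumptions rather than the authors' code, while the numeric values (batch size 128, learning rate 0.02, momentum 0.9, weight decay 5×10⁻⁴, 200 epochs, 30-epoch warm-up, learning-rate drop at epoch 150, β = 0.99, S = 5) are taken from the quoted setup above.

```python
# Hedged sketch of the quoted CIFAR training configuration for SFA.
# The paper does not state its framework; PyTorch is assumed here, and the
# backbone below is a placeholder, not the authors' 18-layer PreAct ResNet.
import torch.nn as nn
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

# Hyperparameters quoted in the "Experiment Setup" row.
BATCH_SIZE = 128
LR = 0.02
MOMENTUM = 0.9
WEIGHT_DECAY = 5e-4
EPOCHS = 200
WARMUP_EPOCHS = 30     # sample selection starts after this warm-up period
LR_MILESTONES = [150]  # learning rate annealed by a factor of 10 after epoch 150
BETA = 0.99            # EMA coefficient β used by SFA
S = 5                  # number of stochastic feature samples S

# Placeholder backbone standing in for the 18-layer PreAct ResNet.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 100))

optimizer = SGD(model.parameters(), lr=LR,
                momentum=MOMENTUM, weight_decay=WEIGHT_DECAY)
scheduler = MultiStepLR(optimizer, milestones=LR_MILESTONES, gamma=0.1)

for epoch in range(EPOCHS):
    in_warmup = epoch < WARMUP_EPOCHS
    # ... training over CIFAR batches of size BATCH_SIZE would go here,
    # applying SFA's feature averaging (BETA) and sampling (S) once the
    # warm-up phase ends; omitted because it is not specified in the quote.
    scheduler.step()
```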