Stochastic Feature Averaging for Learning with Long-Tailed Noisy Labels

Authors: Hao-Tian Li, Tong Wei, Hao Yang, Kun Hu, Chong Peng, Li-Bo Sun, Xun-Liang Cai, Min-Ling Zhang

IJCAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experimental results show that SFA can enhance the performance of existing methods on both simulated and real-world datasets."
Researcher Affiliation | Collaboration | Hao-Tian Li (1,2,3), Tong Wei (1,2), Hao Yang (3), Kun Hu (3), Chong Peng (3), Li-Bo Sun (3), Xun-Liang Cai (3), Min-Ling Zhang (1,2). 1: School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; 2: Key Lab. of Computer Network and Information Integration (Southeast University), MOE, China; 3: Meituan, Shanghai, China. Emails: {liht, weit, zhangml}@seu.edu.cn, {yanghao52, hukun05, pengchong, sunlibo03, caixunliang}@meituan.com
Pseudocode | Yes | "The pseudo-code of our SFA framework is summarized in Algorithm 1." (Algorithm 1: The SFA Framework; a hedged sketch of the underlying idea follows the table.)
Open Source Code | Yes | "Our code is available at https://github.com/HotanLee/SFA."
Open Datasets | Yes | "We first test our approach on CIFAR-10 and CIFAR-100 datasets by simulating training data with long-tailed class distribution and label noise following prior work [Wei et al., 2021]." "WebVision is a large-scale dataset with real-world noisy labels and long-tailed distributions." (One plausible simulation recipe is sketched after the table.)
Dataset Splits | No | The paper mentions training data and test accuracy and refers to "WebVision (ImageNet) validation sets", but it does not state the percentages or counts of the training/validation/test splits for any dataset. It implies standard splits for the benchmark datasets without detailing them.
Hardware Specification | Yes | "The model is trained for 200 epochs with 1 NVIDIA GeForce RTX 3090." "The model is trained for 100 epochs in total with 2 NVIDIA GeForce RTX 3090."
Software Dependencies | No | The paper names algorithms, models, and optimization methods (e.g., SGD, MixMatch, Balanced Softmax) but does not give version numbers for any software dependencies or libraries.
Experiment Setup | Yes | "We use an 18-layer PreAct ResNet [He et al., 2016] and train it using SGD with a momentum of 0.9, a weight decay of 5 × 10⁻⁴, a batch size of 128 and an initial learning rate of 0.02. The model is trained for 200 epochs with 1 NVIDIA GeForce RTX 3090. We perform sample selection after a warm-up period of 30 epochs and anneal the learning rate by a factor of 10 after 150 epochs. For all CIFAR experiments, we choose ρ from {10, 50, 100} and γ from {0.2, 0.5}, and use the same hyperparameters β = 0.99 and S = 5." (This schedule is transcribed into a runnable sketch after the table.)
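
The pseudocode row above cites Algorithm 1 but the table does not reproduce it. Purely as an illustration of what "stochastic feature averaging" could look like, the sketch below keeps a running Gaussian estimate of each class's deep features via an exponential moving average and averages S sampled prototypes. The class and method names are hypothetical, the Gaussian/EMA modeling is an assumption on our part, and only the constants β = 0.99 and S = 5 come from the paper's quoted setup.

```python
import numpy as np

# Illustrative sketch only -- NOT the authors' implementation.
# Assumption: per-class feature prototypes are modeled as diagonal Gaussians,
# updated by an EMA with momentum BETA, and S prototypes are sampled and
# averaged when the prototype is used (e.g., for sample selection).

BETA = 0.99  # EMA momentum; matches beta = 0.99 in the quoted setup
S = 5        # number of sampled prototypes; matches S = 5 in the quoted setup

class GaussianPrototype:
    def __init__(self, dim):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)

    def update(self, feature):
        # EMA update of the running mean, then of the running variance
        self.mean = BETA * self.mean + (1 - BETA) * feature
        self.var = BETA * self.var + (1 - BETA) * (feature - self.mean) ** 2

    def sample_average(self, rng):
        # Draw S prototypes from N(mean, var) and average them
        samples = rng.normal(self.mean, np.sqrt(self.var), size=(S, self.mean.size))
        return samples.mean(axis=0)

rng = np.random.default_rng(0)
proto = GaussianPrototype(dim=512)    # 512 = PreAct ResNet-18 feature width
proto.update(rng.normal(size=512))    # one synthetic feature, for demonstration
averaged_prototype = proto.sample_average(rng)
```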
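
The open-datasets row quotes the simulation of long-tailed, noisily labeled CIFAR training sets. One common recipe in this literature, consistent with the imbalance ratio ρ and noise rate γ quoted in the experiment setup, is an exponential class-size profile followed by uniform label flipping. The function below is a plausible version of that recipe, not necessarily the exact protocol of [Wei et al., 2021].

```python
import numpy as np

def long_tailed_noisy_labels(labels, num_classes=10, rho=100, gamma=0.2, seed=0):
    """Subsample a balanced label array to an exponential long-tailed profile
    with head/tail ratio rho, then flip a fraction gamma of the kept labels
    uniformly at random (one common symmetric-noise convention)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n_head = int(np.sum(labels == 0))  # e.g., 5000 per class for CIFAR-10
    # Class c keeps n_head * rho^(-c / (num_classes - 1)) samples, so class 0
    # keeps n_head samples and the last class keeps n_head / rho.
    sizes = [int(n_head * rho ** (-c / (num_classes - 1))) for c in range(num_classes)]
    keep = np.concatenate([
        rng.choice(np.flatnonzero(labels == c), size=sizes[c], replace=False)
        for c in range(num_classes)
    ])
    noisy = labels[keep].copy()
    flip = rng.random(noisy.size) < gamma
    noisy[flip] = rng.integers(0, num_classes, size=int(flip.sum()))
    return keep, noisy  # indices into the original set, and their noisy labels
```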
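
The experiment-setup row specifies the CIFAR optimizer and schedule completely, so it can be transcribed almost directly into PyTorch. In the sketch below, the linear layer is a stand-in so the snippet runs; the actual model would be the 18-layer PreAct ResNet, for which torchvision provides no constructor.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

model = torch.nn.Linear(512, 10)  # stand-in; the paper uses PreAct ResNet-18

# SGD with momentum 0.9, weight decay 5e-4, initial learning rate 0.02
optimizer = SGD(model.parameters(), lr=0.02, momentum=0.9, weight_decay=5e-4)
# Anneal the learning rate by a factor of 10 after epoch 150 of 200
scheduler = MultiStepLR(optimizer, milestones=[150], gamma=0.1)

WARMUP_EPOCHS = 30  # sample selection starts only after this warm-up
TOTAL_EPOCHS = 200
BATCH_SIZE = 128

for epoch in range(TOTAL_EPOCHS):
    select_samples = epoch >= WARMUP_EPOCHS  # gate for the selection step
    # ... one training epoch over batches of size BATCH_SIZE ...
    scheduler.step()
```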