How Re-sampling Helps for Long-Tail Learning?
Authors: Jiang-Xin Shi, Tong Wei, Yuke Xiang, Yu-Feng Li
NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our research shows that re-sampling can considerably improve generalization when the training images do not contain semantically irrelevant contexts. In other scenarios, however, it can learn unexpected spurious correlations between irrelevant contexts and target labels. We design experiments on two homogeneous datasets, one containing irrelevant context and the other not, to confirm our findings. To prevent the learning of spurious correlations, we propose a new context-shift augmentation module that generates diverse training images for the tail class by maintaining a context bank extracted from the head-class images. Experiments demonstrate that our proposed module can boost the generalization and outperform other approaches, including class-balanced re-sampling, decoupled classifier re-training, and data augmentation methods. (A hedged sketch of the class-balanced re-sampling baseline appears after this table.) |
| Researcher Affiliation | Collaboration | Jiang-Xin Shi¹, Tong Wei², Yuke Xiang³, Yu-Feng Li¹. ¹National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; ²School of Computer Science and Engineering, Southeast University, Nanjing, China; ³Consumer BG, Huawei Technologies, Shenzhen, China |
| Pseudocode | Yes | Appendix A (Training Procedure), Algorithm 1: Training procedure of context-shift augmentation |
| Open Source Code | Yes | The source code is available at https://www.lamda.nju.edu.cn/code_CSA.ashx. ... The source code of our method is available at https://www.lamda.nju.edu.cn/code_CSA.ashx or https://github.com/shijxcs/CSA. |
| Open Datasets | Yes | To better explore the effect of the re-sampling strategy, we conduct experiments on multiple long-tail datasets, including MNIST-LT, Fashion-LT, CIFAR100-LT [15], and ImageNet-LT [5]. ... CIFAR10-LT and CIFAR100-LT are the long-tail versions of the CIFAR datasets, built by sampling from the raw dataset with an imbalance ratio ρ. Following previous works [15, 14], we conduct experiments with ρ ∈ {100, 50, 10}. ImageNet-LT is a long-tail version of ImageNet [3], which contains 1000 classes, each with a number of samples ranging from 5 to 1280. (A sketch of a long-tail subsampling profile follows the table.) |
| Dataset Splits | Yes | For MNIST-LT and Fashion-LT, we use LeNet [21] as the backbone network and add a linear embedding layer before the fully connected layer to project the representation into 2-dimensional space for better presentation. We use standard SGD with a mini-batch size of 128, an initial learning rate of 0.1 and a cosine annealing schedule to train the model for 8 epochs. When applying cRT, we retrain the last fully connected layer for 4 epochs by fixing the other layers. For CIFAR100-LT and ImageNet-LT, more details are in Section 3.3. ... For CIFAR10-LT and CIFAR100-LT, we use ResNet-32 as the backbone network and train it using standard SGD with a momentum of 0.9, a weight decay of 2×10⁻⁴, and a batch size of 128. The model is trained for 200 epochs. The initial learning rate is set to 0.2 and is annealed by a factor of 10 at 160 and 180 epochs. |
| Hardware Specification | Yes | We train each model with 1 NVIDIA GeForce RTX 3090. ... We train each model with 2 NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using 'standard SGD' and 'Grad-CAM' but does not specify software versions for libraries like PyTorch, TensorFlow, or specific Python versions, which are crucial for replication. |
| Experiment Setup | Yes | We use standard SGD with a mini-batch size of 128, an initial learning rate of 0.1 and a cosine annealing schedule to train the model for 8 epochs. When applying cRT, we retrain the last fully connected layer for 4 epochs by fixing the other layers. ... For CIFAR10-LT and CIFAR100-LT, we use ResNet-32 as the backbone network and train it using standard SGD with a momentum of 0.9, a weight decay of 2×10⁻⁴, and a batch size of 128. The model is trained for 200 epochs. The initial learning rate is set to 0.2 and is annealed by a factor of 10 at 160 and 180 epochs. ... For ImageNet-LT, we implement the proposed method on ResNet-10 and ResNet-50. We use standard SGD with a momentum of 0.9, a weight decay of 5×10⁻⁴, and a batch size of 256 to train the whole model for a total of 90 epochs. We use the cosine learning rate decay with an initial learning rate of 0.2. (A hedged optimizer/schedule sketch matching the CIFAR setup also follows the table.) |
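
For readers reproducing the class-balanced re-sampling baseline that the paper analyzes (not the authors' CSA module or released code), the following is a minimal PyTorch sketch: each training example is weighted by the inverse frequency of its class, so every class is drawn with roughly equal probability per epoch. The names `train_labels`, `train_set`, and `class_balanced_sampler` are illustrative assumptions.

```python
from collections import Counter
from torch.utils.data import WeightedRandomSampler, DataLoader

def class_balanced_sampler(train_labels):
    """Sampler that draws every class with (roughly) equal probability.

    `train_labels` is a list/array of integer class labels for the training set.
    """
    counts = Counter(train_labels)
    # Weight each sample by the inverse frequency of its class, so head and
    # tail classes contribute equally per epoch in expectation.
    weights = [1.0 / counts[y] for y in train_labels]
    return WeightedRandomSampler(weights, num_samples=len(train_labels), replacement=True)

# Hypothetical usage with an existing dataset `train_set` labeled by `train_labels`:
# sampler = class_balanced_sampler(train_labels)
# loader = DataLoader(train_set, batch_size=128, sampler=sampler)
```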
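The CIFAR-LT datasets quoted above are obtained by subsampling the balanced CIFAR training sets with an imbalance ratio ρ. The sketch below assumes the exponential per-class profile commonly used for CIFAR-LT following [15]; the exact profile and random seed should be checked against the released code, and the helper names are illustrative.

```python
import numpy as np

def long_tail_counts(n_max, num_classes, rho):
    """Per-class sample counts for an exponential long-tail profile.

    rho is the imbalance ratio n_max / n_min (e.g. 100, 50, 10); class 0 keeps
    n_max samples and the last class keeps roughly n_max / rho samples.
    """
    return [int(n_max * rho ** (-i / (num_classes - 1))) for i in range(num_classes)]

def subsample_long_tail(labels, counts, seed=0):
    """Select indices from a balanced dataset so that class c keeps counts[c] samples."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    keep = []
    for c, n_c in enumerate(counts):
        class_idx = np.flatnonzero(labels == c)
        keep.extend(rng.choice(class_idx, size=n_c, replace=False))
    return np.array(keep)

# e.g. CIFAR100-LT with rho = 100 (500 samples in the largest class):
# counts = long_tail_counts(500, 100, 100)
```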
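The CIFAR100-LT optimizer and schedule quoted in the Experiment Setup row map onto standard PyTorch components. The sketch below is an assumed reconstruction (the helper name `build_cifar_lt_optimizer`, the `model`, and the commented loop skeleton are illustrative), not the authors' implementation.

```python
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

def build_cifar_lt_optimizer(model):
    """Optimizer and schedule matching the quoted CIFAR100-LT setup:
    SGD, momentum 0.9, weight decay 2e-4, initial LR 0.2,
    decayed by a factor of 10 at epochs 160 and 180 (200 epochs total)."""
    optimizer = SGD(model.parameters(), lr=0.2, momentum=0.9, weight_decay=2e-4)
    scheduler = MultiStepLR(optimizer, milestones=[160, 180], gamma=0.1)
    return optimizer, scheduler

# Hypothetical training-loop skeleton (batch size 128, 200 epochs):
# optimizer, scheduler = build_cifar_lt_optimizer(model)
# for epoch in range(200):
#     for images, targets in loader:  # `loader` as in the sampler sketch above
#         ...  # forward, loss, backward, optimizer.step()
#     scheduler.step()
```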