Exploring Weight Balancing on Long-Tailed Recognition Problem
Authors: Naoya Hasegawa, Issei Sato
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this study, we analyze weight balancing by focusing on neural collapse and the cone effect at each training stage and find that it can be decomposed into an increase in Fisher's discriminant ratio of the feature extractor caused by weight decay and cross-entropy loss, and implicit logit adjustment caused by weight decay and class-balanced loss. Our analysis enables the training method to be further simplified by reducing the number of training stages to one while increasing accuracy. Code is available at https://github.com/HN410/Exploring-Weight-Balancing-on-Long-Tailed-Recognition-Problem. |
| Researcher Affiliation | Academia | Naoya Hasegawa & Issei Sato, The University of Tokyo, {hasegawa-naoya410, sato}@g.ecc.u-tokyo.ac.jp |
| Pseudocode | No | The paper describes methods in textual form but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/HN410/Exploring-Weight-Balancing-on-Long-Tailed-Recognition-Problem. |
| Open Datasets | Yes | We used CIFAR10, CIFAR100 (Krizhevsky, 2009), mini-ImageNet (Vinyals et al., 2016), and ImageNet (Deng et al., 2009) as the datasets and followed Cui et al. (2019), Vigneswaran et al. (2021), and Liu et al. (2019) to create long-tailed datasets, CIFAR10-LT, CIFAR100-LT, mini-ImageNet-LT, and ImageNet-LT. |
| Dataset Splits | Yes | We created validation datasets from portions of the training datasets because CIFAR10 and CIFAR100 have only training and test data. As with Liu et al. (2019), only 20 samples per class were taken from the training dataset to compose the validation dataset, and the training dataset was composed of the rest of the data. (A minimal indexing sketch of this split is given after the table.) |
| Hardware Specification | Yes | We conducted experiments on an NVIDIA A100. |
| Software Dependencies | No | The paper mentions optimizers (SGD, AdamW) and models (ResNeXt50, ResNet34) but does not specify version numbers for programming languages or libraries like PyTorch, TensorFlow, or scikit-learn. |
| Experiment Setup | Yes | Unless otherwise noted, we used the following values for hyperparameters for the ResNet. The optimizer was SGD with momentum = 0.9 and a cosine learning rate scheduler (Loshchilov & Hutter, 2017) to gradually decrease the learning rate from 0.01 to 0. The batch size was 64, and the number of epochs was 320 for the first stage and 10 for the second stage. (A PyTorch sketch of these settings follows the table.) |
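
The 20-samples-per-class validation split quoted under *Dataset Splits* can be reproduced with a few lines of indexing. The sketch below is an assumption-laden illustration, not the authors' released code: it assumes a torchvision-style CIFAR training set whose labels are exposed as `targets`, and the function name `split_train_val` and the fixed seed are hypothetical.

```python
import numpy as np
from torch.utils.data import Subset


def split_train_val(train_set, samples_per_class=20, seed=0):
    """Hold out `samples_per_class` examples of each class for validation.

    Assumes `train_set.targets` holds integer class labels, as in
    torchvision's CIFAR10/CIFAR100 datasets.
    """
    rng = np.random.default_rng(seed)
    targets = np.asarray(train_set.targets)
    val_idx = []
    for c in np.unique(targets):
        class_idx = np.where(targets == c)[0]
        val_idx.extend(rng.choice(class_idx, size=samples_per_class, replace=False))
    val_idx = np.asarray(val_idx)
    train_idx = np.setdiff1d(np.arange(len(targets)), val_idx)
    return Subset(train_set, train_idx.tolist()), Subset(train_set, val_idx.tolist())
```

Applied to a long-tailed CIFAR training set, this yields a balanced validation subset of 20 examples per class and leaves the remaining (still imbalanced) samples for training, matching the split described in the quote.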
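
The first-stage settings quoted under *Experiment Setup* (SGD with momentum 0.9, cosine learning-rate decay from 0.01 to 0, batch size 64, 320 epochs) map directly onto standard PyTorch calls. The following is a minimal sketch of those hyperparameters only; the model, the random tensors standing in for data, and the class count are placeholders, and the paper's weight-decay values and second-stage loss are not reproduced here.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Placeholder backbone and data; the paper trains ResNet/ResNeXt backbones
# on long-tailed CIFAR / mini-ImageNet / ImageNet variants.
model = models.resnet34(num_classes=100)
criterion = nn.CrossEntropyLoss()
dummy = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 100, (256,)))
train_loader = DataLoader(dummy, batch_size=64, shuffle=True)  # batch size 64

epochs = 320  # first stage; the second stage runs for 10 epochs
# Weight decay is central to weight balancing, but its value is not quoted
# in this row, so it is deliberately omitted from the optimizer below.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=epochs, eta_min=0.0)  # cosine decay from 0.01 to 0

for epoch in range(epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Stepping the cosine scheduler once per epoch with `T_max=epochs` is what realizes the quoted schedule of gradually decreasing the learning rate from 0.01 to 0 over training.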