Exploring Weight Balancing on Long-Tailed Recognition Problem
Authors: Naoya Hasegawa, Issei Sato
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this study, we analyze weight balancing by focusing on neural collapse and the cone effect at each training stage and find that it can be decomposed into an increase in Fisher's discriminant ratio of the feature extractor caused by weight decay and cross-entropy loss, and implicit logit adjustment caused by weight decay and class-balanced loss. Our analysis enables the training method to be further simplified by reducing the number of training stages to one while increasing accuracy. Code is available at https://github.com/HN410/Exploring-Weight-Balancing-on-Long-Tailed-Recognition-Problem. |
| Researcher Affiliation | Academia | Naoya Hasegawa & Issei Sato, The University of Tokyo, {hasegawa-naoya410, sato}@g.ecc.u-tokyo.ac.jp |
| Pseudocode | No | The paper describes methods in textual form but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/HN410/Exploring-Weight-Balancing-on-Long-Tailed-Recognition-Problem. |
| Open Datasets | Yes | We used CIFAR10, CIFAR100 (Krizhevsky, 2009), mini-ImageNet (Vinyals et al., 2016), and ImageNet (Deng et al., 2009) as the datasets and followed Cui et al. (2019), Vigneswaran et al. (2021), and Liu et al. (2019) to create long-tailed datasets, CIFAR10-LT, CIFAR100-LT, mini-ImageNet-LT, and ImageNet-LT. |
| Dataset Splits | Yes | We created validation datasets from portions of the training datasets because CIFAR10 and CIFAR100 have only training and test data. As with Liu et al. (2019), only 20 samples per class were taken from the training dataset to compose the validation dataset, and the training dataset was composed of the rest of the data. (A minimal indexing sketch of this split is given after the table.) |
| Hardware Specification | Yes | We conducted experiments on an NVIDIA A100. |
| Software Dependencies | No | The paper mentions optimizers (SGD, AdamW) and models (ResNeXt50, ResNet34) but does not specify version numbers for programming languages or libraries like PyTorch, TensorFlow, or scikit-learn. |
| Experiment Setup | Yes | Unless otherwise noted, we used the following values for hyperparameters for the ResNet. The optimizer was SGD with momentum = 0.9 and a cosine learning rate scheduler (Loshchilov & Hutter, 2017) to gradually decrease the learning rate from 0.01 to 0. The batch size was 64, and the number of epochs was 320 for the first stage and 10 for the second stage. (A PyTorch sketch of these settings follows the table.) |
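
The 20-samples-per-class validation split quoted under *Dataset Splits* can be reproduced with a few lines of indexing. The sketch below is an assumption-laden illustration, not the authors' released code: it assumes a torchvision-style CIFAR training set whose labels are exposed as `targets`, and the function name `split_train_val` and the fixed seed are hypothetical.

```python
import numpy as np
from torch.utils.data import Subset


def split_train_val(train_set, samples_per_class=20, seed=0):
    """Hold out `samples_per_class` examples of each class for validation.

    Assumes `train_set.targets` holds integer class labels, as in
    torchvision's CIFAR10/CIFAR100 datasets.
    """
    rng = np.random.default_rng(seed)
    targets = np.asarray(train_set.targets)
    val_idx = []
    for c in np.unique(targets):
        class_idx = np.where(targets == c)[0]
        val_idx.extend(rng.choice(class_idx, size=samples_per_class, replace=False))
    val_idx = np.asarray(val_idx)
    train_idx = np.setdiff1d(np.arange(len(targets)), val_idx)
    return Subset(train_set, train_idx.tolist()), Subset(train_set, val_idx.tolist())
```

Applied to a long-tailed CIFAR training set, this yields a balanced validation subset of 20 examples per class and leaves the remaining (still imbalanced) samples for training, matching the split described in the quote.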
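
The first-stage settings quoted under *Experiment Setup* (SGD with momentum 0.9, cosine learning-rate decay from 0.01 to 0, batch size 64, 320 epochs) map directly onto standard PyTorch calls. The following is a minimal sketch of those hyperparameters only; the model, the random tensors standing in for data, and the class count are placeholders, and the paper's weight-decay values and second-stage loss are not reproduced here.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import models

# Placeholder backbone and data; the paper trains ResNet/ResNeXt backbones
# on long-tailed CIFAR / mini-ImageNet / ImageNet variants.
model = models.resnet34(num_classes=100)
criterion = nn.CrossEntropyLoss()
dummy = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 100, (256,)))
train_loader = DataLoader(dummy, batch_size=64, shuffle=True)  # batch size 64

epochs = 320  # first stage; the second stage runs for 10 epochs
# Weight decay is central to weight balancing, but its value is not quoted
# in this row, so it is deliberately omitted from the optimizer below.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=epochs, eta_min=0.0)  # cosine decay from 0.01 to 0

for epoch in range(epochs):
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

Stepping the cosine scheduler once per epoch with `T_max=epochs` is what realizes the quoted schedule of gradually decreasing the learning rate from 0.01 to 0 over training.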