Decoupled Optimisation for Long-Tailed Visual Recognition

Authors: Cong Cong, Shiyu Xuan, Sidong Liu, Shiliang Zhang, Maurice Pagnucco, Yang Song

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on long-tailed datasets, including CIFAR100, Places-LT, ImageNet-LT, and iNaturalist 2018, show that our framework achieves competitive performance compared to the state-of-the-art.
Researcher Affiliation | Academia | 1 School of Computer Science and Engineering, University of New South Wales, Sydney, Australia; 2 National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, China; 3 Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper.
Open Datasets | Yes | ImageNet-LT (Liu et al. 2019) is a long-tailed version of ImageNet (Deng et al. 2009). iNaturalist 2018 (Van Horn et al. 2018) is a large-scale species classification dataset. CIFAR100-LT (Krizhevsky, Sutskever, and Hinton 2012) has 60,000 images, where 50,000 are used for training and 10,000 for validation. Places-LT (Liu et al. 2019) is a long-tailed version of the original Places-2 (Zhou et al. 2017), which contains 184.5K images from a total of 365 categories where the class cardinality ranges from 5 to 4,980.
Dataset Splits | Yes | CIFAR100-LT (Krizhevsky, Sutskever, and Hinton 2012) has 60,000 images, where 50,000 are used for training and 10,000 for validation. This work used a long-tailed version of CIFAR100 where the imbalance ratio (β) is manually selected using β = Nmax/Nmin, where Nmax and Nmin are the numbers of instances for the most and least frequent classes. For the other datasets that only have train-val sets, the same validation set is used for tuning and benchmarking.
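The imbalance ratio β = Nmax/Nmin can be illustrated with a short sketch. The exponential per-class count profile below is a common convention for constructing CIFAR100-LT; it is an assumption here, not a detail taken from the paper:

```python
def long_tailed_counts(num_classes, n_max, beta):
    """Per-class sample counts decaying exponentially so that the most
    frequent class has n_max samples and the least frequent has
    approximately n_max / beta samples."""
    return [int(n_max * beta ** (-i / (num_classes - 1)))
            for i in range(num_classes)]

# CIFAR100-LT style split with imbalance ratio beta = 100
counts = long_tailed_counts(num_classes=100, n_max=500, beta=100)
imbalance_ratio = counts[0] / counts[-1]  # recovers beta = Nmax / Nmin
print(counts[0], counts[-1], imbalance_ratio)
```

Summing `counts` gives the total size of the imbalanced training set; the validation set is left balanced, as in the quoted protocol.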
Hardware Specification | Yes | All reported models are trained using 4 NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | For Stage 1, we conduct training for 100 epochs and decay the learning rate by a cosine scheduler from 0.02 to 0 for ImageNet-LT, iNaturalist 2018 and Places-LT, and 0.05 to 0 for CIFAR100-LT. For the remaining two stages, since we only fine-tune part of the model, we only train for 50 epochs and the learning rate is equal to 0.002 for ImageNet-LT, iNaturalist 2018, and Places-LT, and 0.005 for CIFAR100-LT. All pieces of training are conducted with a batch size of 256. In all reported experiments, we use strong augmentations (Cubuk et al. 2020) that have demonstrated effectiveness in previous studies (Cui et al. 2021). The ρm in medium-enhanced sampling is set to 80% for ImageNet-LT, iNaturalist 2018 and Places-LT, and 70% for CIFAR100-LT. To select γbest, we iterate through ten possible values from 0% to 100%, with a step size of 10%. The γbest for Stage 1 is set to 50% for all used datasets, whereas, for Stage 2, the γbest is set to 80% for ImageNet-LT, Places-LT, and CIFAR100 (β=100) and 60% for iNaturalist 2018 and CIFAR100 (β=50).
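The cosine decay quoted above can be sketched with a generic half-cosine annealing from the stated peak down to zero; the authors' exact scheduler implementation is not specified, so this is an illustrative assumption:

```python
import math

def cosine_lr(epoch, total_epochs, lr_max, lr_min=0.0):
    """Half-cosine annealing: lr_max at epoch 0, lr_min at the final
    epoch, matching a decay 'from 0.02 to 0' over 100 epochs."""
    progress = epoch / (total_epochs - 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * progress))

# Stage 1 on ImageNet-LT: 100 epochs, peak learning rate 0.02
schedule = [cosine_lr(e, total_epochs=100, lr_max=0.02) for e in range(100)]
print(schedule[0], schedule[-1])  # starts at 0.02, ends at 0.0
```

The same function covers CIFAR100-LT Stage 1 by setting `lr_max=0.05`; the later fine-tuning stages as quoted use a fixed learning rate instead.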