Decoupled Training for Long-Tailed Classification With Stochastic Representations

Authors: Giung Nam, Sunguk Jang, Juho Lee

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on CIFAR10/100-LT, ImageNet-LT, and iNaturalist-2018 benchmarks show that our proposed method improves upon previous methods both in terms of prediction accuracy and uncertainty estimation."
Researcher Affiliation | Collaboration | "1 Korea Advanced Institute of Science and Technology (KAIST), 2 AITRICS"
Pseudocode | Yes | "Algorithm 1 Decoupled training w/ SWA + SRepr (ours)." (a hedged training-loop sketch follows the table)
Open Source Code | Yes | "Code is available at https://github.com/cs-giung/long-tailed-srepr. Our implementations are built on JAX (Bradbury et al., 2018), Flax (Heek et al., 2020), and Optax (Hessel et al., 2020)."
Open Datasets | Yes | "Using CIFAR10/100-LT (Cao et al., 2019), ImageNet-LT (Liu et al., 2019), and iNaturalist-2018 (Van Horn et al., 2018) benchmarks for long-tailed image classification, we empirically validate that our proposed method improves upon previous approaches both in terms of prediction accuracy and uncertainty estimation."
Dataset Splits | Yes | "ImageNet-LT. It consists of 115,846 train examples, 20,000 validation examples and 50,000 test examples from 1,000 classes."
Hardware Specification | Yes | "For ImageNet-LT and iNaturalist-2018, we conduct all experiments on 8 TPUv3 cores, supported by TPU Research Cloud."
Software Dependencies | No | "Our implementations are built on JAX (Bradbury et al., 2018), Flax (Heek et al., 2020), and Optax (Hessel et al., 2020)."
Experiment Setup | Yes | "Throughout the main experiments on ImageNet-LT and iNaturalist-2018, we use an SGD optimizer with batch size 256, Nesterov momentum 0.9, and a single-cycle cosine decaying learning rate starting from the base learning rate of 0.1. Unless specified, the optimization for the representation learning stage terminates after 100 training epochs for ImageNet-LT and 200 training epochs for iNaturalist-2018. For the classifier re-training, we introduce an additional 10% training epochs to re-train the classifier. (...) Throughout the paper, we apply λwd = 0.0003 for ImageNet-LT, λwd = 0.0001 for iNaturalist-2018, and λwd = 0.0005 for CIFAR10/100-LT. (...) Throughout the paper, we use ηSWA = 0.010 for ImageNet-LT, ηSWA = 0.005 for iNaturalist-2018, and ηSWA = 0.1 for CIFAR10/100-LT." (an Optax optimizer-configuration sketch follows the table)
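
The Pseudocode row above refers to the paper's Algorithm 1, "Decoupled training w/ SWA + SRepr". The snippet below is only a minimal sketch of the two-stage decoupled recipe that the algorithm's title suggests: stage 1 trains the representation while collecting a stochastic weight average (SWA) of the backbone, and stage 2 re-trains just the classifier head on the frozen, averaged backbone. It is written against JAX/Optax to match the authors' stack, but the toy linear backbone, random data, and SWA start/period constants are illustrative assumptions, and the SRepr (stochastic representation) sampling step is omitted; it is not the authors' implementation (see their repository for that).

```python
import jax
import optax

key = jax.random.PRNGKey(0)
k_data, k_label, k_init = jax.random.split(key, 3)
x = jax.random.normal(k_data, (512, 32))                  # toy inputs
y = jax.random.randint(k_label, (512,), 0, 10)            # toy labels, 10 classes

def init(key):
    k1, k2 = jax.random.split(key)
    return {"backbone": 0.1 * jax.random.normal(k1, (32, 64)),
            "head": 0.1 * jax.random.normal(k2, (64, 10))}

def loss_fn(params, xb, yb):
    feats = jax.nn.relu(xb @ params["backbone"])
    logits = feats @ params["head"]
    return optax.softmax_cross_entropy_with_integer_labels(logits, yb).mean()

params = init(k_init)
opt = optax.sgd(optax.cosine_decay_schedule(0.1, 200), momentum=0.9, nesterov=True)
opt_state = opt.init(params)

# Stage 1: representation learning, keeping a running (SWA) average of the weights.
swa, n_avg = params, 0
for step in range(200):
    grads = jax.grad(loss_fn)(params, x, y)
    updates, opt_state = opt.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    if step >= 100 and step % 10 == 0:                    # SWA start/period are illustrative
        swa = jax.tree_util.tree_map(lambda a, p: a + (p - a) / (n_avg + 1), swa, params)
        n_avg += 1

# Stage 2: freeze the averaged backbone and re-train only the classifier head.
# (In the paper this stage uses class-balanced sampling and stochastic
# representations; both are omitted in this sketch.)
def head_loss(head, xb, yb):
    feats = jax.nn.relu(xb @ swa["backbone"])             # frozen, averaged backbone
    return optax.softmax_cross_entropy_with_integer_labels(feats @ head, yb).mean()

head = swa["head"]
head_opt = optax.sgd(0.01, momentum=0.9, nesterov=True)
head_state = head_opt.init(head)
for step in range(20):
    grads = jax.grad(head_loss)(head, x, y)
    updates, head_state = head_opt.update(grads, head_state, head)
    head = optax.apply_updates(head, updates)
```

The point of the decoupled structure is that only the classification head is updated in the second stage, while the representation comes from the weight-averaged backbone produced in the first stage.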
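The Experiment Setup row quotes the optimizer settings (SGD, batch size 256, Nesterov momentum 0.9, single-cycle cosine decay from a base learning rate of 0.1, and per-dataset weight decay λwd). Below is a hedged Optax sketch of that configuration for the ImageNet-LT numbers; how weight decay is actually wired in the authors' code (decoupled transformation vs. L2 inside the loss) is an assumption, and num_train_steps is a placeholder derived from the quoted epoch count.

```python
import optax

base_lr = 0.1                            # base learning rate (from the quoted setup)
weight_decay = 3e-4                      # λwd for ImageNet-LT; 1e-4 (iNaturalist-2018), 5e-4 (CIFAR10/100-LT)
steps_per_epoch = 115_846 // 256         # ImageNet-LT train set at batch size 256
num_train_steps = 100 * steps_per_epoch  # 100 representation-learning epochs

# Single-cycle cosine decay starting from the base learning rate.
schedule = optax.cosine_decay_schedule(init_value=base_lr, decay_steps=num_train_steps)

# SGD with Nesterov momentum 0.9. Applying weight decay as a decoupled
# transformation is one plausible choice, not necessarily the authors'.
optimizer = optax.chain(
    optax.add_decayed_weights(weight_decay),
    optax.sgd(learning_rate=schedule, momentum=0.9, nesterov=True),
)
```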