Adaptive Logit Adjustment Loss for Long-Tailed Visual Recognition

Authors: Yan Zhao, Weicong Chen, Xu Tan, Kai Huang, Jihong Zhu | pp. 3472-3480

AAAI 2022

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results show that our method achieves the state-of-the-art performance on challenging recognition benchmarks, including ImageNet-LT, iNaturalist 2018, and Places-LT.
Researcher Affiliation | Collaboration | Yan Zhao (1), Weicong Chen (2), Xu Tan (3), Kai Huang (2), Jihong Zhu (1)*; affiliations: (1) Tsinghua University, (2) Bytedance, (3) Microsoft. Emails: zhao-y18@mails.tsinghua.edu.cn, {chenweicong.do, huangkai.honka}@bytedance.com, xuta@microsoft.com, jhzhu@tsinghua.edu.cn
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | To evaluate the effectiveness and generality of our method, we conduct a series of experiments on three widely used large-scale long-tailed datasets: ImageNet-LT, iNaturalist 2018 and Places-LT. ImageNet-LT: the ImageNet-LT (Liu et al. 2019) dataset is an artificially sampled subset of ImageNet-2012 (Deng et al. 2009), with 115.8K images. iNaturalist 2018: the iNaturalist 2018 (Van Horn et al. 2018) dataset is a real-world imbalanced dataset, with 437.5K images. Places-LT: the Places-LT (Liu et al. 2019) dataset is a long-tailed subset of the Places dataset (Zhou et al. 2017), with 62.5K images.
Dataset Splits | Yes | All networks are trained on the long-tailed training datasets, and then evaluated on the corresponding balanced validation or test datasets. Top-1 accuracy is used as the evaluation metric, in the form of percentages. In order to better analyze the performance on classes of different data frequency, we report the accuracy on four class subsets according to the number of training instances in each class: Many-shot (>100), Medium-shot (20-100), Few-shot (1-20) and All, as in (Liu et al. 2019).
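The shot-based grouping described in that row can be sketched as a small helper. The thresholds follow the quoted text (Many-shot >100, Medium-shot 20-100, Few-shot 1-20); placing counts of exactly 20 or 100 in Medium-shot is an assumption, since the quote does not spell out the tie-breaking, and the function names are illustrative:

```python
def shot_subset(n_train):
    """Assign a class to a frequency subset by its number of training instances.

    Thresholds as quoted: Many-shot (>100), Medium-shot (20-100), Few-shot (1-20).
    Boundary counts of exactly 20 and 100 go to Medium-shot here (an assumption).
    """
    if n_train > 100:
        return "Many-shot"
    if n_train >= 20:
        return "Medium-shot"
    return "Few-shot"


def subset_accuracy(correct, total, train_counts):
    """Top-1 accuracy (in %) per subset from per-class tallies.

    correct / total: dicts mapping class id -> correct / evaluated counts.
    train_counts: dict mapping class id -> number of training instances.
    """
    agg = {}
    for c, n in train_counts.items():
        key = shot_subset(n)
        hit, tot = agg.get(key, (0, 0))
        agg[key] = (hit + correct.get(c, 0), tot + total.get(c, 0))
    agg["All"] = (sum(correct.values()), sum(total.values()))
    return {k: 100.0 * h / t for k, (h, t) in agg.items() if t > 0}
```

For example, with `train_counts = {0: 200, 1: 50, 2: 10}`, `correct = {0: 8, 1: 5, 2: 2}` and `total = {0: 10, 1: 10, 2: 10}`, the helper reports 80.0 for Many-shot, 50.0 for Medium-shot, 20.0 for Few-shot and 50.0 for All.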
Hardware Specification | Yes | All networks are trained on 2 Tesla V100 GPUs, for 90 epochs on ImageNet-LT and iNaturalist 2018 and 30 epochs on Places-LT.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | For ImageNet-LT, ResNet-50 and ResNeXt-50 (32x4d) (He et al. 2016) are adopted as backbones. The batch size is set to 256 with an initial learning rate of 0.1 and a weight decay of 0.0005. For iNaturalist 2018, ResNet-50 is used as the backbone, with a batch size of 512, an initial learning rate of 0.2, and a weight decay of 0.0002. For Places-LT, ResNet-152 is the backbone. The scale factor s in Equation (8) is set to 30 by default, and the same training strategies as LDAM (Cao et al. 2019) are used.
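A scale factor s applied to logits, as quoted for Equation (8), is typical of scaled cosine classifiers. A minimal sketch of such scaled logits follows; the cosine-classifier reading of Equation (8) and the function name are assumptions, since the equation itself is not reproduced in this table:

```python
import math


def scaled_cosine_logits(feature, class_weights, s=30.0):
    """Return s * cos(feature, w_k) for each class weight vector w_k.

    s=30 matches the default scale factor quoted above; the cosine-classifier
    form is an assumption about Equation (8), which is not shown here.
    """
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)

    return [s * cos(feature, w) for w in class_weights]
```

With a unit feature [1.0, 0.0] and class weights [[1, 0], [0, 1]], the logits are [30.0, 0.0]: the scale stretches the bounded cosine range [-1, 1] so that the softmax can produce confident predictions.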