Decoupled Contrastive Learning for Long-Tailed Recognition

Authors: Shiyu Xuan, Shiliang Zhang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on different long-tailed classification benchmarks demonstrate the superiority of our method. For instance, it achieves 57.7% top-1 accuracy on the ImageNet-LT dataset. Combined with the ensemble-based method, the performance can be further boosted to 59.7%, which substantially outperforms many recent works.
Researcher Affiliation | Academia | National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China
Pseudocode | No | The paper does not contain clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | Our code will be released.
Open Datasets | Yes | We use three popular datasets to evaluate the long-tailed recognition performance. ImageNet-LT (Liu et al. 2019) contains 115,846 training images of 1,000 classes sampled from ImageNet-1K (Russakovsky et al. 2015)... iNaturalist 2018 (Van Horn et al. 2018)... Places-LT (Liu et al. 2019)...
Dataset Splits | Yes | We use three popular datasets to evaluate the long-tailed recognition performance. ImageNet-LT (Liu et al. 2019)... iNaturalist 2018 (Van Horn et al. 2018)... Places-LT (Liu et al. 2019)... We follow the standard evaluation metrics that evaluate our models on the testing set and report the overall top-1 accuracy across all classes.
Hardware Specification | Yes | The SGD optimizer is used with a learning rate that decays by a cosine scheduler from 0.1 to 0, with batch size 256 on 2 Nvidia RTX 3090 GPUs for 200 epochs.
Software Dependencies | No | The paper mentions software components like 'ResNet-50', 'MoCo v2', and 'SGD optimizer', but does not provide specific version numbers for any libraries or frameworks used (e.g., PyTorch version, Python version, CUDA version).
Experiment Setup | Yes | At the first stage, the basic framework is the same as MoCo v2 (Chen et al. 2020): the momentum value for updating the EMA model is 0.999, the temperature τ is 0.07, the size of the memory queue M is 65536, and the output dimension of the projection head is 128. The data augmentation is the same as MoCo v2 (Chen et al. 2020). Locations for the patch-based features are sampled randomly from the global view with a scale of (0.05, 0.6). Image patches cropped from the global view are resized to 64. The number of patch-based features L per anchor image is 5. The SGD optimizer is used with a learning rate that decays by a cosine scheduler from 0.1 to 0, with batch size 256 on 2 Nvidia RTX 3090 GPUs for 200 epochs. For Places-LT, we only fine-tune the last block of the backbone for 30 epochs (Kang et al. 2019). At the second stage, the parameters are the same as (Li et al. 2021). The linear classifier is trained for 40 epochs with CE loss and class-balanced sampling (Kang et al. 2019) with batch size 2048 using the SGD optimizer. The learning rate is initialized as 10, 30, and 2.5 for ImageNet-LT, iNaturalist 2018, and Places-LT, respectively, and multiplied by 0.1 at epochs 20 and 30.
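
The two-stage optimization schedule quoted in the Experiment Setup row maps onto standard optimizer and scheduler objects. Below is a minimal sketch assuming PyTorch; the encoder and classifier placeholders, the SGD momentum of 0.9, the weight decay of 1e-4, and the feature dimensions are illustrative assumptions, not values reported in the paper.

    # Minimal sketch (assumed PyTorch) of the two-stage schedule described above.
    import torch
    from torch.optim import SGD
    from torch.optim.lr_scheduler import CosineAnnealingLR, MultiStepLR

    # Stage 1: MoCo v2-style contrastive pretraining. The contrastive hyper-parameters
    # (EMA momentum 0.999, temperature 0.07, queue size 65536, projection dim 128)
    # belong to the contrastive framework itself and are not shown here.
    encoder = torch.nn.Linear(2048, 128)   # placeholder for ResNet-50 backbone + projection head
    stage1_epochs = 200
    opt1 = SGD(encoder.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)  # momentum/weight decay assumed
    sched1 = CosineAnnealingLR(opt1, T_max=stage1_epochs, eta_min=0.0)         # cosine decay from 0.1 to 0

    # Stage 2: linear classifier trained with CE loss and class-balanced sampling, batch size 2048.
    classifier = torch.nn.Linear(2048, 1000)  # placeholder head for 1,000 ImageNet-LT classes (input dim assumed)
    stage2_epochs = 40
    init_lr = {"ImageNet-LT": 10.0, "iNaturalist2018": 30.0, "Places-LT": 2.5}["ImageNet-LT"]
    opt2 = SGD(classifier.parameters(), lr=init_lr, momentum=0.9)              # momentum assumed
    sched2 = MultiStepLR(opt2, milestones=[20, 30], gamma=0.1)                 # x0.1 at epochs 20 and 30

    for epoch in range(stage2_epochs):
        # ... one epoch of cross-entropy training over a class-balanced loader goes here ...
        sched2.step()

This only illustrates how the reported learning-rate schedules could be wired up; the contrastive loss, patch-based features, and class-balanced sampler themselves are not reproduced here.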