Decoupled Contrastive Learning for Long-Tailed Recognition
Authors: Shiyu Xuan, Shiliang Zhang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on different long-tailed classification benchmarks demonstrate the superiority of our method. For instance, it achieves 57.7% top-1 accuracy on the ImageNet-LT dataset. Combined with the ensemble-based method, the performance can be further boosted to 59.7%, which substantially outperforms many recent works. |
| Researcher Affiliation | Academia | National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University, Beijing, China |
| Pseudocode | No | The paper does not contain clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Our code will be released. |
| Open Datasets | Yes | We use three popular datasets to evaluate the long-tailed recognition performance. ImageNet-LT (Liu et al. 2019) contains 115,846 training images of 1,000 classes sampled from the ImageNet-1K (Russakovsky et al. 2015)... iNaturalist 2018 (Van Horn et al. 2018)... Places-LT (Liu et al. 2019)... |
| Dataset Splits | Yes | We use three popular datasets to evaluate the long-tailed recognition performance. ImageNet-LT (Liu et al. 2019)... iNaturalist 2018 (Van Horn et al. 2018)... Places-LT (Liu et al. 2019)... We follow the standard evaluation metrics that evaluate our models on the testing set and report the overall top-1 accuracy across all classes. |
| Hardware Specification | Yes | The SGD optimizer is used with a learning rate that decays from 0.1 to 0 under a cosine scheduler, with batch size 256 on 2 Nvidia RTX 3090 GPUs for 200 epochs. |
| Software Dependencies | No | The paper mentions software components like 'ResNet-50', 'MoCo V2', and 'SGD optimizer', but does not provide specific version numbers for any libraries or frameworks used (e.g., PyTorch version, Python version, CUDA version). |
| Experiment Setup | Yes | At the first stage, the basic framework is the same as MoCo V2 (Chen et al. 2020): the momentum value for updating the EMA model is 0.999, the temperature τ is 0.07, the size of the memory queue M is 65536, and the output dimension of the projection head is 128. The data augmentation is the same as MoCo V2 (Chen et al. 2020). Locations for the patch-based features are sampled randomly from the global view with a scale of (0.05, 0.6). Image patches cropped from the global view are resized to 64. The number of patch-based features L per anchor image is 5. The SGD optimizer is used with a learning rate that decays from 0.1 to 0 under a cosine scheduler, with batch size 256 on 2 Nvidia RTX 3090 GPUs for 200 epochs. For Places-LT, we only fine-tune the last block of the backbone for 30 epochs (Kang et al. 2019). At the second stage, the parameters are the same as (Li et al. 2021). The linear classifier is trained for 40 epochs with CE loss and class-balanced sampling (Kang et al. 2019) with batch size 2048 using the SGD optimizer. The learning rate is initialized as 10, 30, and 2.5 for ImageNet-LT, iNaturalist 2018, and Places-LT, respectively, and multiplied by 0.1 at epochs 20 and 30. (See the configuration sketch after the table.) |
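
The two-stage schedule quoted in the Experiment Setup row can be summarized as a configuration. The sketch below is not the authors' code (which had not been released at the time of this report); it only restates the reported hyperparameters using standard PyTorch optimizer and scheduler APIs. The backbone stub, the 2048-d ResNet-50 feature dimension, the 1,000-class ImageNet-LT head, and the stage-one SGD momentum/weight-decay values are assumptions that are not specified in the quoted excerpts.

```python
# Minimal sketch of the reported two-stage schedule (assumed PyTorch APIs, not the authors' code).
import torch
import torch.nn as nn

# Stage-1 contrastive pre-training hyperparameters quoted from the paper.
STAGE1 = {
    "framework": "MoCo V2",
    "ema_momentum": 0.999,
    "temperature": 0.07,
    "queue_size": 65536,
    "projection_dim": 128,
    "patch_scale": (0.05, 0.6),   # scale range for patch locations in the global view
    "patch_size": 64,             # cropped patches resized to 64
    "patches_per_anchor": 5,      # L
    "epochs": 200,
    "batch_size": 256,
}

# Stand-in for the ResNet-50 encoder + 128-d projection head (placeholder, not the authors' model).
backbone = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(64, STAGE1["projection_dim"]),
)
# Reported: SGD with the learning rate decayed from 0.1 to 0 by a cosine scheduler over 200 epochs.
# SGD momentum and weight decay are not stated in the excerpt; 0.9 / 1e-4 are assumptions.
stage1_opt = torch.optim.SGD(backbone.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
stage1_sched = torch.optim.lr_scheduler.CosineAnnealingLR(stage1_opt, T_max=STAGE1["epochs"], eta_min=0.0)

# Stage-2 linear classifier: 40 epochs, CE loss, class-balanced sampling, batch size 2048.
# The 2048-d feature size and 1,000-class head are assumptions about shapes for ImageNet-LT.
classifier = nn.Linear(2048, 1000)
stage2_opt = torch.optim.SGD(classifier.parameters(), lr=10.0)   # reported lr: 10 / 30 / 2.5 per dataset
stage2_sched = torch.optim.lr_scheduler.MultiStepLR(stage2_opt, milestones=[20, 30], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(40):
    # ... one pass over a class-balanced sampler would train the classifier here ...
    stage2_opt.step()
    stage2_sched.step()
```

Note that the unusually large stage-two learning rates (10, 30, 2.5) are paired with the very large batch size of 2048 reported in the quoted setup.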