Long-tailed Object Detection Pretraining: Dynamic Rebalancing Contrastive Learning with Dual Reconstruction
Authors: Chen-Long Duan, Yong Li, Xiu-Shen Wei, Lin Zhao
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on COCO and LVIS v1.0 datasets demonstrate the effectiveness of our method, particularly in improving the mAP/AP scores for tail classes. To evaluate the effectiveness of our method, we conduct extensive experiments on two benchmark datasets, i.e., COCO [35] and LVIS v1.0 [13]. Experiments on these datasets from both quantitative and qualitative perspectives validate the effectiveness of our proposed method. |
| Researcher Affiliation | Academia | (1) Nanjing University of Science and Technology; (2) School of Computer Science and Engineering, and Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Southeast University |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The authors state: 'We will continue to conduct further research based on this work. However, we can consider releasing the main checkpoints for public use.' This indicates a future possibility rather than immediate open-source availability. |
| Open Datasets | Yes | We conduct experiments on two representative datasets: COCO [35] and LVIS v1.0 [13]. Microsoft COCO: Common objects in context. LVIS: A dataset for large vocabulary instance segmentation. |
| Dataset Splits | Yes | The COCO dataset...comprising 80 classes with a relatively balanced distribution, including 118k training images and 5k validation images. LVIS features 1,203 classes with a highly imbalanced distribution, containing 100k training images and 19.8k validation images. |
| Hardware Specification | Yes | We pre-train the models on 8 RTX3090 GPUs with a batch size of 16. The models are trained with a total batch size of 16 on 8 GPUs (RTX3090 with 24 GB VRAM). |
| Software Dependencies | No | All models are implemented using the MMDetection toolbox [5]. We employ MMDetection [5] as our detection framework to conduct our experiment. PyTorch: An imperative style, high-performance deep learning library. The paper mentions software tools like MMDetection and PyTorch but does not specify their version numbers, which are crucial for reproducibility. |
| Experiment Setup | Yes | Unless otherwise specified, pre-training follows the 1× schedule (12 epochs), starting with an initial learning rate of 0.02, which is reduced by a factor of 10 after the 8th and 11th epochs. For the 2× schedule, models are trained for 24 epochs, and the learning rate decays at the end of epochs 16 and 22. In our experiments, the hyper-parameters are set as follows: α_c is set to 0.1, β_c is set to 0.05, and α_r is set to 0.1. We trained the models using SGD with momentum 0.9. The batch size and learning rate are set as 16 and 0.02, and the data augmentation strictly follows previous long-tailed detection methods [24, 47]. |
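
The optimizer and schedule details quoted in the Experiment Setup row can be summarized in a minimal sketch, assuming a plain PyTorch training loop rather than the authors' unreleased MMDetection configuration. The model, dataloader, and criterion below are hypothetical placeholders; only the numeric values (SGD with momentum 0.9, learning rate 0.02, total batch size 16, decay by a factor of 10 after epochs 8 and 11 of the 12-epoch 1× schedule) come from the paper.

```python
# Minimal sketch of the reported 1x pre-training schedule, assuming a plain
# PyTorch loop; the model, dataloader, and criterion arguments are hypothetical
# placeholders, not part of the paper or of MMDetection.
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR


def pretrain(model, dataloader, criterion, epochs=12):
    # SGD with momentum 0.9 and an initial learning rate of 0.02 (paper values).
    optimizer = SGD(model.parameters(), lr=0.02, momentum=0.9)
    # Reduce the learning rate by a factor of 10 after the 8th and 11th epochs
    # (1x schedule); the 2x schedule would run 24 epochs with milestones [16, 22].
    scheduler = MultiStepLR(optimizer, milestones=[8, 11], gamma=0.1)

    for epoch in range(epochs):
        for images, targets in dataloader:  # total batch size 16 across 8 GPUs
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()
```

In MMDetection the same values would normally live in the config file's optimizer and learning-rate schedule entries, but since the paper does not release its configs, the sketch above is only one plausible translation of the reported numbers into code.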