Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection
Authors: Lingchen Meng, Xiyang Dai, Jianwei Yang, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Yi-Ling Chen, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments and analysis on the task of long-tailed object detection. We mainly evaluate our method on LVIS [12] val 1.0 for our main experiments and ablations. We also conduct experiments on other datasets of long-tail distribution to further prove the effectiveness. We use DINO [61], an advanced DETR-based detector, due to its training efficiency and high performance. |
| Researcher Affiliation | Collaboration | 1Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University 2Shanghai Collaborative Innovation Center of Intelligent Visual Computing 3Microsoft |
| Pseudocode | No | The paper describes the proposed method and its components through text and mathematical equations, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/MengLcool/RichSem. |
| Open Datasets | Yes | We mainly conduct experiments on LVIS [12], which contains 1203 classes with 100K images. Moreover, we experiment on other datasets, e.g. Visual Genome [22] and Open Images [24]; please refer to our supplementary material. We mainly use ImageNet-21k [5] as additional classification data. |
| Dataset Splits | Yes | We mainly evaluate our method on LVIS [12] val 1.0 for our main experiments and ablations. We also conduct experiments on other datasets of long-tail distribution to further prove the effectiveness. We use DINO [61], an advanced DETR-based detector, due to its training efficiency and high performance. |
| Hardware Specification | Yes | We adopt PyTorch for implementation and use 8 V100 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch' as the implementation framework but does not provide specific version numbers for it or any other key software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | We set the initial learning rate as 1e-4 and multiply it by 0.1 at the 11-th, 20-th, and 30-th epoch for the 1×, 2×, and 3× schedules, respectively, and set λsoft = 0.5 for the soft semantics learning loss. Following [63], we use a federated loss [64] and repeat factor sampling [12] for LVIS; we use category-aware sampling for Open Images. We randomly resize an input image with its shorter side between 480 and 800, limiting the longer side below 1333. |
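The reported setup (step learning-rate decay and shorter-side random resizing capped at a maximum longer side) can be sketched as below. This is a minimal illustration of the described hyperparameters, not the authors' implementation; the function names and the 1× milestone choice are assumptions for illustration.

```python
import random

def random_resize_shape(w, h, short_range=(480, 800), max_long=1333):
    """Pick a random target shorter side in [480, 800] and scale the image,
    capping the longer side at 1333 as described in the experiment setup."""
    short = random.randint(*short_range)
    scale = short / min(w, h)
    # If the scaled longer side would exceed the cap, rescale to the cap.
    if max(w, h) * scale > max_long:
        scale = max_long / max(w, h)
    return round(w * scale), round(h * scale)

def lr_at_epoch(epoch, base_lr=1e-4, milestone=11):
    """Step schedule (1x setting): multiply the base LR by 0.1 at epoch 11."""
    return base_lr * (0.1 if epoch >= milestone else 1.0)
```

In a PyTorch training loop the same decay is usually expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[11], gamma=0.1)` (milestones `[20]` / `[30]` for the 2× / 3× schedules).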