Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection

Authors: Lingchen Meng, Xiyang Dai, Jianwei Yang, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Yi-Ling Chen, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments and analysis on the task of long-tailed object detection. We mainly evaluate our method on LVIS [12] val 1.0 for our main experiments and ablations. We also conduct experiments on other long-tailed datasets to further demonstrate its effectiveness. We use DINO [61], an advanced DETR-based detector, due to its training efficiency and high performance.
Researcher Affiliation | Collaboration | 1 Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University; 2 Shanghai Collaborative Innovation Center of Intelligent Visual Computing; 3 Microsoft
Pseudocode | No | The paper describes the proposed method and its components through text and mathematical equations, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/MengLcool/RichSem.
Open Datasets | Yes | We mainly conduct experiments on LVIS [12], which contains 1203 classes with 100K images. Moreover, we experiment on other datasets, e.g., Visual Genome [22] and Open Images [24]; please refer to our supplementary material. We mainly use ImageNet-21k [5] as additional classification data.
Dataset Splits | Yes | We mainly evaluate our method on LVIS [12] val 1.0 for our main experiments and ablations. We also conduct experiments on other long-tailed datasets to further demonstrate its effectiveness. We use DINO [61], an advanced DETR-based detector, due to its training efficiency and high performance.
Hardware Specification | Yes | We adopt PyTorch for implementation and use 8 V100 GPUs.
Software Dependencies | No | The paper mentions PyTorch as the implementation framework but does not provide specific version numbers for it or any other key software libraries or dependencies used in the experiments.
Experiment Setup | Yes | We set the initial learning rate as 1e-4 and multiply it by 0.1 at the 11-th, 20-th and 30-th epoch for the 1×, 2× and 3× schedules, respectively, and set λ_soft = 0.5 for the soft semantics learning loss. Following [63], we use a federated loss [64] and repeat factor sampling [12] for LVIS; we use category-aware sampling for Open Images. We randomly resize an input image with its shorter side between 480 and 800, limiting the longer side below 1333.
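
The schedule and augmentation details reported in the Experiment Setup row map onto standard PyTorch components. Below is a minimal sketch, assuming an AdamW optimizer (the paper only states the initial learning rate of 1e-4) and torchvision's resize utility; the DINO/RichSem detector, the federated loss [64], and the repeat factor / category-aware samplers are omitted, and names such as build_optimizer_and_scheduler and random_resize are illustrative placeholders rather than the authors' code.

```python
# Sketch of the reported optimization schedule and resize augmentation.
# Assumptions: AdamW optimizer and torchvision for resizing; detector,
# losses, and samplers from the paper are not reproduced here.
import random
import torch
from torchvision.transforms import functional as TF

LAMBDA_SOFT = 0.5  # reported weight on the soft semantics learning loss

# Epoch at which the learning rate is multiplied by 0.1 for each schedule.
LR_DROP_EPOCH = {"1x": 11, "2x": 20, "3x": 30}

def build_optimizer_and_scheduler(model, schedule="1x"):
    # Initial learning rate 1e-4, decayed by a factor of 0.1 once,
    # at the epoch corresponding to the chosen schedule.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[LR_DROP_EPOCH[schedule]], gamma=0.1
    )
    return optimizer, scheduler

def random_resize(image):
    # Shorter side drawn uniformly from [480, 800]; longer side capped at 1333.
    short_side = random.randint(480, 800)
    return TF.resize(image, short_side, max_size=1333)
```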