Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection

Authors: Lingchen Meng, Xiyang Dai, Jianwei Yang, Dongdong Chen, Yinpeng Chen, Mengchen Liu, Yi-Ling Chen, Zuxuan Wu, Lu Yuan, Yu-Gang Jiang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments and analysis on the task of long-tailed object detection. We mainly evaluate our method on LVIS [12] val 1.0 for our main experiments and ablations. We also conduct experiments on other long-tailed datasets to further demonstrate its effectiveness. We use DINO [61], an advanced DETR-based detector, due to its training efficiency and high performance.
Researcher Affiliation | Collaboration | 1 Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University; 2 Shanghai Collaborative Innovation Center of Intelligent Visual Computing; 3 Microsoft
Pseudocode | No | The paper describes the proposed method and its components through text and mathematical equations, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/MengLcool/RichSem.
Open Datasets | Yes | We mainly conduct experiments on LVIS [12], which contains 1203 classes with 100K images. Moreover, we experiment on other datasets, e.g., Visual Genome [22] and Open Images [24]; please refer to our supplementary material. We mainly use ImageNet-21k [5] as additional classification data.
Dataset Splits | Yes | We mainly evaluate our method on LVIS [12] val 1.0 for our main experiments and ablations. We also conduct experiments on other long-tailed datasets to further demonstrate its effectiveness. We use DINO [61], an advanced DETR-based detector, due to its training efficiency and high performance.
Hardware Specification | Yes | We adopt PyTorch for implementation and use 8 V100 GPUs.
Software Dependencies | No | The paper mentions PyTorch as the implementation framework but does not provide specific version numbers for it or any other key software libraries or dependencies used in the experiments.
Experiment Setup | Yes | We set the initial learning rate as 1e-4 and multiply it by 0.1 at the 11-th, 20-th and 30-th epoch for the 1×, 2× and 3× schedules, respectively, and set λ_soft = 0.5 for the soft semantics learning loss. Following [63], we use a federated loss [64] and repeat factor sampling [12] for LVIS; we use category-aware sampling for Open Images. We randomly resize an input image with its shorter side between 480 and 800, limiting the longer side below 1333.
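
The schedule and augmentation details reported in the Experiment Setup row map onto standard PyTorch components. Below is a minimal sketch, assuming an AdamW optimizer (the paper only states the initial learning rate of 1e-4) and torchvision's resize utility; the DINO/RichSem detector, the federated loss [64], and the repeat factor / category-aware samplers are omitted, and names such as build_optimizer_and_scheduler and random_resize are illustrative placeholders rather than the authors' code.

```python
# Sketch of the reported optimization schedule and resize augmentation.
# Assumptions: AdamW optimizer and torchvision for resizing; detector,
# losses, and samplers from the paper are not reproduced here.
import random
import torch
from torchvision.transforms import functional as TF

LAMBDA_SOFT = 0.5  # reported weight on the soft semantics learning loss

# Epoch at which the learning rate is multiplied by 0.1 for each schedule.
LR_DROP_EPOCH = {"1x": 11, "2x": 20, "3x": 30}

def build_optimizer_and_scheduler(model, schedule="1x"):
    # Initial learning rate 1e-4, decayed by a factor of 0.1 once,
    # at the epoch corresponding to the chosen schedule.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[LR_DROP_EPOCH[schedule]], gamma=0.1
    )
    return optimizer, scheduler

def random_resize(image):
    # Shorter side drawn uniformly from [480, 800]; longer side capped at 1333.
    short_side = random.randint(480, 800)
    return TF.resize(image, short_side, max_size=1333)
```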