End-to-End Multi-Object Detection with a Regularized Mixture Model

Authors: Jaeyoung Yoo, Hojun Lee, Seunghyeon Seo, Inseop Chung, Nojun Kwak

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate D-RMM on MS COCO 2017 (Lin et al., 2014). Table 1 presents a comparison between Sparse R-CNN and D-RMM with different backbone networks on the COCO validation set.
Researcher Affiliation | Collaboration | (1) NAVER WEBTOON AI; (2) Department of Intelligence and Information Science, Seoul National University; (3) Interdisciplinary Program in Artificial Intelligence, Seoul National University.
Pseudocode | No | The paper describes the D-RMM framework and architecture using text and figures, but no explicit pseudocode or algorithm blocks are presented.
Open Source Code | Yes | Code is available at https://github.com/lhj815/D-RMM.
Open Datasets | Yes | Dataset: We evaluate D-RMM on MS COCO 2017 (Lin et al., 2014).
Dataset Splits | Yes | Following common practice, we split the dataset into 118K images for the training set, 5K for the validation set, and 20K for the test-dev set.
Hardware Specification | Yes | FPS is measured as network inference time, excluding data loading, on a single NVIDIA TITAN RTX using MMDet (Chen et al., 2019) with batch size 1 (see the timing sketch after this table).
Software Dependencies | No | The paper mentions using MMDet and the AdamW optimizer, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | In training based on Sparse R-CNN, the batch size is 16. The same data augmentations as in Deformable DETR (Zhu et al., 2020) are used for multi-scale training, where the input image size ranges from 480 to 800, with random crop and random horizontal flip. We use the AdamW (Loshchilov & Hutter, 2017) optimizer with a weight decay of 5e-5 and gradient clipping with an L2 norm of 1.0. We adopt a training schedule of 36 epochs with an initial learning rate of 5e-5, divided by a factor of 10 at the 27th and 33rd epochs (see the config sketch after this table).
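
The FPS protocol quoted in the Hardware Specification row (network inference time only, data loading excluded, batch size 1) can be approximated with a generic PyTorch timing loop. The sketch below is only an assumption about how such a measurement might look, not the authors' or MMDet's benchmarking code; `model` and `image` are placeholders for a detector and a preprocessed input tensor.

```python
import time

import torch


@torch.no_grad()
def measure_fps(model, image, warmup=50, iters=200):
    """Rough FPS of the forward pass only; data loading is excluded."""
    model.eval().cuda()
    batch = image.unsqueeze(0).cuda()  # batch size 1, tensor already on the GPU
    for _ in range(warmup):            # warm up CUDA kernels before timing
        model(batch)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model(batch)
    torch.cuda.synchronize()           # wait for all queued GPU work to finish
    return iters / (time.perf_counter() - start)
```

A real MMDet detector typically takes image metadata in addition to the tensor, so the forward call above would need to be adapted to the actual model interface.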
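For the Experiment Setup row, the quoted hyperparameters (AdamW with weight decay 5e-5, gradient clipping at an L2 norm of 1.0, 36 epochs, learning rate 5e-5 divided by 10 at epochs 27 and 33) map directly onto standard PyTorch components. The sketch below is an illustrative reconstruction under those assumptions, not the released MMDet configuration; `model` and `train_loader` are placeholders, and the data augmentation pipeline is omitted.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

# Hyperparameters quoted from the paper's Sparse R-CNN-based training setup.
optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=5e-5)
scheduler = MultiStepLR(optimizer, milestones=[27, 33], gamma=0.1)  # epoch-based decay

for epoch in range(36):
    for images, targets in train_loader:  # effective batch size 16
        loss = model(images, targets)     # placeholder forward returning the total loss
        optimizer.zero_grad()
        loss.backward()
        # Gradient clipping with an L2 norm of 1.0, as described in the paper.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
    scheduler.step()
```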