Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation
Authors: Yiming Cui, Linjie Yang, Haichao Yu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the superior performance of our approach combined with a wide range of DETR-based models on MS COCO (Lin et al., 2014), Cityscapes (Cordts et al., 2016) and YouTube-VIS (Yang et al., 2019b) benchmarks with multiple tasks, including object detection, instance segmentation, and panoptic segmentation. |
| Researcher Affiliation | Collaboration | 1Department of Electrical and Computer Engineering, University of Florida, Gainesville, USA; 2ByteDance Inc., San Jose, USA. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper. |
| Open Datasets | Yes | For the object detection task, we use MS COCO benchmark (Lin et al., 2014) for evaluation, which contains 118,287 images for training and 5,000 for validation. |
| Dataset Splits | Yes | For the object detection task, we use MS COCO benchmark (Lin et al., 2014) for evaluation, which contains 118,287 images for training and 5,000 for validation. |
| Hardware Specification | Yes | The training time is based on 8 NVIDIA A100 GPUs and the inference FPS is tested on a single TITAN RTX GPU. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | The query ratio r used to generate the combination coefficients is set to 4 by default. β is set to 1. θ is implemented as a two-layer MLP with ReLU as nonlinear activations. The output size of its first layer is 512, and that of the second layer is the length of W_D in corresponding models. For detection models, we use 300 modulated queries and 1200 basic queries if not specified otherwise. (See the sketch below the table.) |
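
The Experiment Setup entry describes the coefficient-generating network θ concretely enough to sketch. Below is a minimal, hypothetical PyTorch sketch of such a two-layer MLP; the class name, the 256-dimensional input feature, and the choice of conditioning signal are assumptions not taken from the paper, while the 512-unit first layer, ReLU activation, query ratio r = 4, and the 300 modulated / 1200 basic query counts follow the quoted setup.

```python
import torch
import torch.nn as nn


class CoefficientMLP(nn.Module):
    """Hypothetical sketch of the theta network from the Experiment Setup row:
    a two-layer MLP with ReLU whose output length matches the number of basic
    queries (the length of W_D in the corresponding model). The input feature
    dimension and conditioning signal are assumptions, not the authors' code."""

    def __init__(self, in_dim: int, num_basic_queries: int = 1200, hidden_dim: int = 512):
        super().__init__()
        # First layer outputs 512 units; second layer outputs one combination
        # coefficient per basic query.
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_basic_queries),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)


# Illustrative shapes only: with the default query ratio r = 4,
# 300 modulated queries are combined from 1200 basic queries.
num_modulated, r = 300, 4
num_basic = num_modulated * r  # 1200
theta = CoefficientMLP(in_dim=256, num_basic_queries=num_basic)
coeffs = theta(torch.randn(num_modulated, 256))  # shape: (300, 1200)
```

Under this reading, each of the 300 modulated queries is a learned combination over the 1200 basic queries, which is why the second layer's width is tied to the length of W_D in the corresponding model.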