QueryProp: Object Query Propagation for High-Performance Video Object Detection

Authors: Fei He, Naiyu Gao, Jian Jia, Xin Zhao, Kaiqi Huang

AAAI 2022, pp. 834-842

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments on the ImageNet VID dataset. QueryProp achieves comparable accuracy with state-of-the-art methods and strikes a decent accuracy/speed trade-off."
Researcher Affiliation | Academia | "1 CRISE, Institute of Automation, Chinese Academy of Sciences; 2 School of Artificial Intelligence, University of Chinese Academy of Sciences; 3 CAS Center for Excellence in Brain Science and Intelligence Technology. {hefei2018, gaonaiyu2017, jiajian2018}@ia.ac.cn, {xzhao, kaiqi.huang}@nlpr.ia.ac.cn"
Pseudocode | No | The paper does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | "We conduct extensive experiments on the ImageNet VID dataset. Following the setting in (Zhu et al. 2017a), both ImageNet VID and ImageNet DET (Deng et al. 2009) are utilized to train our model. We use the parameters pre-trained on COCO (Lin et al. 2014) for model initialization."
Dataset Splits | Yes | "We report mean Average Precision (mAP) on the validation set as the evaluation metric. We evaluate our model on the ImageNet VID (Deng et al. 2009), which consists of 3862 training videos and 555 validation videos from 30 object categories."
Hardware Specification | Yes | "All methods are tested on a TITAN RTX GPU. Without special marking, the runtime is tested on a TITAN RTX GPU. X means TITAN X, and V means TITAN V."
Software Dependencies | Yes | "The proposed framework is implemented with PyTorch-1.7."
Experiment Setup | Yes | "QueryProp utilizes the AdamW (Loshchilov and Hutter 2018) optimizer with weight decay 0.0001. The whole framework is trained with 8 GPUs and each GPU holds one mini-batch. The training iteration is set to 90k and the initial learning rate is set to 2.5e-5, divided by 10 at iterations 65k and 80k, respectively. The initial learning rate is set to 10e-4 and the total training iteration is 16k, and the learning rate is dropped after iterations 8k and 12k. The number of queries and boxes in the detection heads is 100 by default."
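The first reported schedule (AdamW, weight decay 0.0001, initial learning rate 2.5e-5, decayed by 10x at iterations 65k and 80k over 90k iterations) can be sketched in PyTorch as below. This is a minimal illustration, not the authors' released code; the `torch.nn.Linear` module is a hypothetical stand-in for the QueryProp detector, and the training-step internals are elided.

```python
import torch

# Hypothetical stand-in for the QueryProp detector (not the authors' model).
model = torch.nn.Linear(256, 100)

# AdamW with weight decay 0.0001 and initial learning rate 2.5e-5, as reported.
optimizer = torch.optim.AdamW(model.parameters(), lr=2.5e-5, weight_decay=0.0001)

# The learning rate is divided by 10 at iterations 65k and 80k (90k total).
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[65_000, 80_000], gamma=0.1
)

for iteration in range(90_000):
    # ... forward pass, loss.backward(), gradient update would go here ...
    optimizer.step()
    scheduler.step()

print(optimizer.param_groups[0]["lr"])  # 2.5e-7 after both decay steps
```

The per-iteration `MultiStepLR` milestones mirror the paper's iteration-based (rather than epoch-based) schedule; the second 16k-iteration schedule quoted above would use the same pattern with milestones at 8k and 12k.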