Video Object Detection with Locally-Weighted Deformable Neighbors

Authors: Zhengkai Jiang, Peng Gao, Chaoxu Guo, Qian Zhang, Shiming Xiang, Chunhong Pan (pp. 8529–8536)

AAAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on VID dataset demonstrate that our method achieves superior performance in a speed and accuracy trade-off, i.e., 76.3% on the challenging VID dataset while maintaining 20fps in speed on Titan X GPU.
Researcher Affiliation | Collaboration | 1: National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 2: School of Artificial Intelligence, University of Chinese Academy of Sciences; 3: The Chinese University of Hong Kong; 4: Horizon Robotics. Emails: {zhengkai.jiang,chaoxu.guo,smxiang,chpan}@nlpr.ac.cn, penggao@ee.cuhk.edu.hk, qian01.zhang@hobot.ai
Pseudocode | Yes | Algorithm 1: Inference algorithm of Memory-Guided Propagation Networks
Open Source Code | No | No explicit statement or link providing access to open-source code for the described methodology was found.
Open Datasets | Yes | We evaluate the proposed method on the ImageNet VID dataset which has been treated as a benchmark for video object detection (Russakovsky et al. 2015). [...] Thus we follow previous approaches and train our model on an intersection of ImageNet VID and DET dataset.
Dataset Splits | Yes | The VID dataset is split into 3862 training videos and 555 validation videos.
Hardware Specification | Yes | Extensive experiments on VID dataset demonstrate that our method achieves superior performance in a speed and accuracy trade-off, i.e., 76.3% on the challenging VID dataset while maintaining 20fps in speed on Titan X GPU. [...] For training, 4 epochs with SGD optimization method are performed on 8 GPUs with each GPU holding one mini-batch.
Software Dependencies | No | No specific version numbers for software dependencies (e.g., libraries, frameworks) were mentioned.
Experiment Setup | Yes | For training, 4 epochs with SGD optimization method are performed on 8 GPUs with each GPU holding one mini-batch. Learning rate begins with 2.5e-4 and divides by 10 after 2.5 epochs. We also employ standard left-right flipping augmentation.
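The reported schedule (SGD, initial learning rate 2.5e-4, divided by 10 after 2.5 of 4 total epochs) can be sketched as a simple step function. This is a minimal illustration of the schedule as quoted above, not the authors' code; all names and the sampling granularity are assumptions.

```python
# Hedged sketch of the reported learning-rate schedule:
# SGD starting at 2.5e-4, divided by 10 after epoch 2.5 of 4.
# Constant and function names are illustrative, not from the paper's code.

BASE_LR = 2.5e-4
DECAY_EPOCH = 2.5   # point at which the LR is divided by 10
TOTAL_EPOCHS = 4

def learning_rate(epoch: float) -> float:
    """Return the learning rate at a (possibly fractional) epoch index."""
    return BASE_LR if epoch < DECAY_EPOCH else BASE_LR / 10

# Sample the schedule every half epoch across the 4-epoch run.
schedule = [(e / 2, learning_rate(e / 2)) for e in range(2 * TOTAL_EPOCHS + 1)]
```

Under this reading, the whole second half of training (epochs 2.5 through 4) runs at 2.5e-5, i.e., the schedule has exactly one decay step.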