Complementary-View Multiple Human Tracking

Authors: Ruize Han, Wei Feng, Jiewen Zhao, Zicheng Niu, Yujun Zhang, Liang Wan, Song Wang

AAAI 2020, pp. 10917-10924

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We collect a new dataset consisting of top- and horizontal-view video pairs for performance evaluation and the experimental results show the effectiveness of the proposed method.
Researcher Affiliation | Academia | 1) College of Intelligence and Computing, Tianjin University, Tianjin 300350, China; 2) Key Research Center for Surface Monitoring and Analysis of Cultural Relics, SACH, China; 3) Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA. Emails: {han_ruize, wfeng, zhaojw, niuzchina, yujunzhang, lwan}@tju.edu.cn, songwang@cec.sc.edu
Pseudocode | Yes | Algorithm 1: Complementary-View MOT
Open Source Code | Yes | https://github.com/HanRuize/CVMHT
Open Datasets | No | We do not find a publicly available dataset with temporally synchronized top-view and horizontal-view videos with ground-truth labeling for cross-view multiple object tracking. Therefore, we collect a new dataset by flying a drone with a camera to take top-view videos and mounting a GoPro over the head of a person to take the horizontal-view videos for performance evaluation. The paper's GitHub repository states, 'Dataset: We collected a new dataset for performance evaluation. Please contact hanruize2017@tju.edu.cn for the download link,' indicating the dataset is not directly publicly available.
Dataset Splits | No | The paper describes the collection and annotation of its new dataset but does not specify training, validation, or test splits (e.g., percentages or sample counts).
Hardware Specification | Yes | We implement the main program in Matlab and on a desktop computer with an Intel Core i5 3.4GHz CPU, and the Siamese network for cross-view appearance similarity measurement is implemented on GPU.
Software Dependencies | No | The paper mentions software like Matlab, YOLOv3, and CPLEX but does not provide specific version numbers for any of them.
Experiment Setup | Yes | The pre-specified parameters w1, w2 and c0 are set to 0.3, 0.5 and 0.3, respectively. We use the general YOLOv3 (Redmon et al. 2016) detector to detect subjects in the form of bounding boxes in both top- and horizontal-view videos. For top-view subject detection, we fine-tune the network using 600 top-view human images. For training the Siamese-based network, given a subject detected in the top-view frame, we use it paired with its corresponding subject in horizontal view as a positive sample, and paired with other subjects as a negative training sample.
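The pairing rule quoted in the Experiment Setup row lends itself to a short illustration. Below is a minimal, hypothetical Python sketch of how such cross-view positive/negative training pairs might be assembled from ID-labeled detections; the names `Detection`, `build_siamese_pairs`, and the constant identifiers are assumptions for illustration only and are not taken from the authors' code (whose main program, per the paper, is written in Matlab).

```python
from dataclasses import dataclass
from itertools import product
from typing import Any, List, Tuple

# Pre-specified parameters quoted in the paper's setup (values only; their exact
# roles in the association cost are defined in the paper, not reproduced here).
W1, W2, C0 = 0.3, 0.5, 0.3

@dataclass
class Detection:
    subject_id: int   # ground-truth identity, used only to label training pairs
    crop: Any         # image patch cut out by the YOLOv3 bounding box

def build_siamese_pairs(
    top_view: List[Detection],
    horizontal_view: List[Detection],
) -> List[Tuple[Detection, Detection, int]]:
    """Pair each top-view detection with every horizontal-view detection from the
    same synchronized frame: label 1 when the identities match (positive sample),
    0 otherwise (negative sample), mirroring the pairing rule quoted above."""
    return [
        (t, h, int(t.subject_id == h.subject_id))
        for t, h in product(top_view, horizontal_view)
    ]
```

The resulting (top-view crop, horizontal-view crop, label) triplets would then feed a Siamese network that scores cross-view appearance similarity, as described in the quoted setup.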