Crowd Scene Understanding with Coherent Recurrent Neural Networks

Authors: Hang Su, Yinpeng Dong, Jun Zhu, Haibin Ling, Bo Zhang

IJCAI 2016

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on hundreds of public crowd videos demonstrate that our method achieves state-of-the-art performance by exploring the coherent spatiotemporal structures in crowd behaviors. |
| Researcher Affiliation | Academia | Tsinghua National Lab for Information Science and Technology, State Key Lab of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing, China; Department of Computer and Information Sciences, Temple University, USA |
| Pseudocode | No | The paper describes the LSTM unit operations with mathematical equations and diagrams, but does not include pseudocode or an algorithm block (a minimal executable sketch of a standard LSTM step is given after this table). |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Evaluations are conducted on the CUHK Crowd Dataset [Shao et al., 2014], which includes crowd videos with different densities and perspective scales in many environments, e.g., streets, airports, etc. |
| Dataset Splits | No | The paper mentions training and testing splits, but does not explicitly state the use of a separate validation set or its proportion. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for experiments. |
| Software Dependencies | No | The paper mentions the use of an LSTM and the KLT tracker, but does not specify software versions for any libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | In each experiment, we construct a coherent LSTM with 128 hidden units, such that the input tracklets are mapped to 128-dimensional hidden features. When optimizing the parameters for predicting the future paths, we divide each tracklet into two segments and use the hidden features learned from the first segment (e.g., 2/3 of each tracklet) to predict the latter segment (e.g., the remaining 1/3); see the setup sketch after this table. |
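Since the paper specifies its LSTM unit only through equations and diagrams, the following is a minimal executable sketch of a standard LSTM step in NumPy. It is not the paper's coherent LSTM (which additionally couples neighboring tracklets); the gate ordering, weight shapes, and names here are our own assumptions.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell, not the paper's coherent variant.

    x:      input at the current time step, shape (d_in,)
    h_prev: previous hidden state, shape (d_h,)
    c_prev: previous cell state, shape (d_h,)
    W, U:   input/recurrent weights, shapes (4*d_h, d_in) and (4*d_h, d_h)
    b:      bias, shape (4*d_h,)
    """
    d_h = h_prev.shape[0]
    z = W @ x + U @ h_prev + b           # pre-activations for all four gates
    i = sigmoid(z[0*d_h:1*d_h])          # input gate
    f = sigmoid(z[1*d_h:2*d_h])          # forget gate
    o = sigmoid(z[2*d_h:3*d_h])          # output gate
    g = np.tanh(z[3*d_h:4*d_h])          # candidate cell update
    c = f * c_prev + i * g               # new cell state
    h = o * np.tanh(c)                   # new hidden state
    return h, c
```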
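To make the reported experiment setup concrete, below is a hedged reconstruction of the tracklet-prediction protocol: encode the first 2/3 of a tracklet into 128-dimensional hidden features, then predict the remaining 1/3. Because the coherent LSTM is not released, a plain PyTorch `nn.LSTM` encoder-decoder stands in for it; the 2-D point inputs, the decoder seeding, and the names `split_tracklet`, `predict_future`, and `readout` are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the paper's coherent LSTM, which is not public.
HIDDEN = 128  # "128 hidden units" from the paper's setup description

encoder = nn.LSTM(input_size=2, hidden_size=HIDDEN, batch_first=True)
decoder = nn.LSTM(input_size=2, hidden_size=HIDDEN, batch_first=True)
readout = nn.Linear(HIDDEN, 2)  # map hidden features back to (x, y) points

def split_tracklet(tracklet: torch.Tensor):
    """Divide a tracklet of shape (T, 2) into an observed segment
    (first 2/3) and a target segment (remaining 1/3)."""
    cut = (2 * tracklet.shape[0]) // 3
    return tracklet[:cut], tracklet[cut:]

def predict_future(tracklet: torch.Tensor) -> torch.Tensor:
    """Encode the observed segment into 128-d hidden features, then roll
    the decoder forward to predict the remaining positions."""
    obs, target = split_tracklet(tracklet)
    _, (h, c) = encoder(obs.unsqueeze(0))      # hidden features of 1st segment
    preds, inp = [], obs[-1].view(1, 1, 2)     # seed decoder with last point
    for _ in range(target.shape[0]):
        out, (h, c) = decoder(inp, (h, c))
        nxt = readout(out[:, -1])              # next predicted (x, y)
        preds.append(nxt)
        inp = nxt.view(1, 1, 2)
    return torch.cat(preds, dim=0)             # shape (T - cut, 2)

# Example: a 30-step tracklet -> observe 20 points, predict the last 10.
track = torch.randn(30, 2)
future = predict_future(track)
loss = nn.functional.mse_loss(future, split_tracklet(track)[1])
```

The 2/3 observed vs. 1/3 predicted split mirrors the segment proportions quoted in the Experiment Setup row; everything else about the architecture is illustrative.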