Crowd Scene Understanding with Coherent Recurrent Neural Networks
Authors: Hang Su, Yinpeng Dong, Jun Zhu, Haibin Ling, Bo Zhang
IJCAI 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on hundreds of public crowd videos demonstrate that our method achieves state-of-the-art performance by exploring the coherent spatiotemporal structures in crowd behaviors. |
| Researcher Affiliation | Academia | Tsinghua National Lab for Information Science and Technology; State Key Lab of Intelligent Technology and Systems; Department of Computer Science and Technology, Tsinghua University, Beijing, China; Department of Computer and Information Sciences, Temple University, USA |
| Pseudocode | No | The paper describes the LSTM unit operations with mathematical equations and diagrams, but does not include pseudocode or an algorithm block (the standard LSTM cell equations it builds on are reproduced after the table for reference). |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Evaluations are conducted on the CUHK Crowd Dataset [Shao et al., 2014], which includes crowd videos with different densities and perspective scales in many environments, e.g., streets and airports. |
| Dataset Splits | No | The paper mentions training and testing splits, but does not explicitly state the use of a separate validation set or its proportion. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for experiments. |
| Software Dependencies | No | The paper mentions the use of LSTM and KLT tracker, but does not specify software versions for any libraries, frameworks, or programming languages. |
| Experiment Setup | Yes | In each experiment, we construct a coherent LSTM with 128 hidden units, such that the input tracklets are mapped to 128-dimensional hidden features. When optimizing the parameters for predicting future paths, we divide each tracklet into two segments and use the hidden features learnt from the first segment (e.g., 2/3 of each tracklet) to predict the latter segment (e.g., the remaining 1/3 of each tracklet). A minimal code sketch of this setup follows the table. |
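
For reference, the standard LSTM cell equations are given below. This is the conventional formulation, not transcribed from the paper; the paper's coherent LSTM extends this cell, and the coherence-specific modifications are given in the paper's own equations.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &&\text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)
\end{aligned}
```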
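
Below is a minimal sketch of the experiment setup quoted in the last row, assuming PyTorch. It uses a plain LSTM as a stand-in for the paper's coherent LSTM (the coherence modeling among neighboring tracklets is omitted), and all names (`TrackletPredictor`, `readout`, etc.) are hypothetical.

```python
# Minimal sketch: encode the first 2/3 of a tracklet into a 128-d hidden
# feature, then predict the remaining 1/3 autoregressively. Plain LSTM
# stand-in for the paper's coherent LSTM; hyperparameters from the paper.
import torch
import torch.nn as nn

HIDDEN = 128  # hidden units, as stated in the paper


class TrackletPredictor(nn.Module):
    def __init__(self, input_dim: int = 2, hidden_dim: int = HIDDEN):
        super().__init__()
        self.encoder = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.decoder_cell = nn.LSTMCell(input_dim, hidden_dim)
        self.readout = nn.Linear(hidden_dim, input_dim)  # hidden -> (x, y)

    def forward(self, observed: torch.Tensor, n_future: int) -> torch.Tensor:
        # observed: (batch, T_obs, 2) tracklet coordinates
        _, (h, c) = self.encoder(observed)   # final 128-d hidden features
        h, c = h.squeeze(0), c.squeeze(0)
        prev = observed[:, -1, :]            # last observed point
        preds = []
        for _ in range(n_future):            # roll out the latter segment
            h, c = self.decoder_cell(prev, (h, c))
            prev = self.readout(h)
            preds.append(prev)
        return torch.stack(preds, dim=1)     # (batch, n_future, 2)


# Toy usage: split each 30-step tracklet into 2/3 observed and 1/3 target,
# then train with an MSE loss on the predicted future path.
tracklets = torch.randn(8, 30, 2)
obs, target = tracklets[:, :20], tracklets[:, 20:]
model = TrackletPredictor()
loss = nn.functional.mse_loss(model(obs, target.size(1)), target)
```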