Crowd Counting using Deep Recurrent Spatial-Aware Network
Authors: Lingbo Liu, Hongjun Wang, Guanbin Li, Wanli Ouyang, Liang Lin
IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four challenging benchmarks show the effectiveness of our approach. Specifically, comparing with the existing best-performing methods, we achieve an improvement of 12% on the largest dataset WorldExpo'10 and 22.8% on the most challenging dataset UCF_CC_50. |
| Researcher Affiliation | Academia | (1) School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China; (2) School of Electrical and Information Engineering, The University of Sydney, Sydney, Australia |
| Pseudocode | No | The paper provides architectural diagrams for its networks (Figure 2 and Figure 3) but does not include any formal pseudocode blocks or algorithms. |
| Open Source Code | No | The paper does not include any statements or links indicating that the source code for their methodology is open-sourced or publicly available. |
| Open Datasets | Yes | ShanghaiTech [Zhang et al., 2016]. This dataset contains 1,198 images of unconstrained scenes with a total of 330,165 annotated people. UCF_CC_50 [Idrees et al., 2013]. As an extremely challenging benchmark, this dataset contains 50 annotated images of diverse scenes collected from the Internet. MALL [Chen et al., 2012]. This dataset was captured by a publicly accessible surveillance camera in a shopping mall with more challenging lighting conditions and glass surface reflections. WorldExpo'10 [Zhang et al., 2015]. This dataset contains 1,132 video sequences captured by 108 surveillance cameras during the Shanghai World Expo in 2010. |
| Dataset Splits | Yes | Following the standard protocol discussed in [Idrees et al., 2013], we split the dataset into five subsets and perform a five-fold cross-validation. When training, we randomly crop some regions with a range of [0.5, 0.9] from the original images and resize them to 1024 × 768. The testing images are directly resized to the same resolution. ... Following the same setting as [Chen et al., 2012], we use the first 800 frames for training and the remaining 1,200 frames for evaluation. ... The training set consists of 3,380 annotated frames from 103 scenes, while the testing images are extracted from other five different scenes with 120 frames per scene. (See the preprocessing sketch below this table.) |
| Hardware Specification | No | The paper does not explicitly specify the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). It only mentions using 'TensorFlow' but not the underlying hardware. |
| Software Dependencies | Yes | We adopt the TensorFlow [Abadi et al., 2016] toolbox to implement our crowd counting network. ... We optimize our network's parameters with Adam optimization [Kingma and Ba, 2014] by minimizing the loss function Eq. (8). |
| Experiment Setup | Yes | The filter weights of all convolutional layers and fully-connected layers are initialized by a truncated normal distribution with a deviation equal to 0.01. The learning rate is set to 10^-4 initially and multiplied by 0.98 every 1K training iterations. The batch size is set to 1. We optimize our network's parameters with Adam optimization [Kingma and Ba, 2014] by minimizing the loss function Eq. (8). (See the training-configuration sketch below this table.) |
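
The dataset-splits row quotes a random crop covering [0.5, 0.9] of the original image followed by resizing to 1024 × 768, plus a fixed frame split for MALL. Below is a minimal sketch of that preprocessing, assuming current TensorFlow 2 image APIs rather than the original TensorFlow release cited by the authors; the function names `random_scaled_crop` and `split_mall_frames` are illustrative and not taken from the paper's code.

```python
import tensorflow as tf


def random_scaled_crop(image, min_scale=0.5, max_scale=0.9,
                       target_size=(768, 1024)):
    """Crop a random region covering [min_scale, max_scale] of the image,
    then resize it to target_size (height, width), i.e. 1024 x 768 (w x h).
    This is one plausible reading of the quoted protocol, not the authors' code."""
    height = tf.shape(image)[0]
    width = tf.shape(image)[1]
    scale = tf.random.uniform([], min_scale, max_scale)
    crop_h = tf.cast(tf.cast(height, tf.float32) * scale, tf.int32)
    crop_w = tf.cast(tf.cast(width, tf.float32) * scale, tf.int32)
    crop = tf.image.random_crop(image, size=[crop_h, crop_w, 3])
    return tf.image.resize(crop, target_size)


def split_mall_frames(frames):
    """MALL protocol quoted above: first 800 frames for training,
    the remaining frames for evaluation."""
    return frames[:800], frames[800:]
```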
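The experiment-setup row reports a truncated normal initializer with deviation 0.01, an initial learning rate of 10^-4 multiplied by 0.98 every 1K iterations, Adam optimization, and a batch size of 1. The following is a minimal sketch of those settings, assuming TensorFlow 2 Keras APIs (the paper used the original TensorFlow toolbox); the `Conv2D` layer is a placeholder and not part of the paper's architecture.

```python
import tensorflow as tf

# Truncated normal initializer with standard deviation 0.01, as reported.
initializer = tf.keras.initializers.TruncatedNormal(stddev=0.01)

# Learning rate starts at 1e-4 and is multiplied by 0.98 every 1,000 iterations.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,
    decay_steps=1000,
    decay_rate=0.98,
    staircase=True)

# Adam optimizer [Kingma and Ba, 2014] driven by the decayed learning rate.
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

# Illustrative placeholder layer showing how the initializer would be applied;
# the actual network layers are described in the paper's Figures 2 and 3.
conv = tf.keras.layers.Conv2D(
    filters=64, kernel_size=3, padding="same",
    kernel_initializer=initializer)

BATCH_SIZE = 1  # batch size reported in the paper
```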