Gramformer: Learning Crowd Counting via Graph-Modulated Transformer

Authors: Hui Lin, Zhiheng Ma, Xiaopeng Hong, Qinnan Shangguan, Deyu Meng

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on four challenging crowd counting datasets have validated the competitiveness of the proposed method.
Researcher Affiliation | Academia | 1 School of Mathematics and Statistics, Xi'an Jiaotong University; 2 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; 3 Harbin Institute of Technology; 4 Peng Cheng Laboratory
Pseudocode | No | The paper does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/LoraLinH/Gramformer.
Open Datasets | Yes | We evaluate our crowd counting method and compare it with other state-of-the-art methods on four of the largest crowd counting benchmarks. They are widely used in recent papers and are described as follows. ShanghaiTech A (Zhang et al. 2016)... UCF-QNRF (Idrees et al. 2018)... NWPU-CROWD (Wang et al. 2020b)... JHU-CROWD++ (Sindagi, Yasarla, and Patel 2020)...
Dataset Splits | Yes | NWPU-CROWD (Wang et al. 2020b) contains 5,109 images and 2.13 million annotated instances... 3,109 images are used in the training set; 500 images are in the validation set; and the remaining 1,500 images are in the test set. JHU-CROWD++ (Sindagi, Yasarla, and Patel 2020) has a more complex context... 2,272 images are chosen for the training set; 500 images are for the validation set; and the remaining 1,600 images are for the test set. (These splits are summarized in the first sketch after the table.)
Hardware Specification | Yes | All experiments are conducted with a single RTX 3080 GPU.
Software Dependencies | No | The paper mentions software components like 'VGG-19' and 'Adam algorithm' but does not specify version numbers for any programming languages, libraries, or frameworks.
Experiment Setup | Yes | We set the training batch size as 1 and crop images with a size of 512 × 512. As some images in ShanghaiTech A have smaller resolutions, the crop size for this dataset changes to 256 × 256. Random scaling of [0.75, 1.25] and horizontal flipping are also adopted to augment each training image. We use the Adam algorithm (Kingma and Ba 2014) with a learning rate of 10^-5 to optimize the parameters. We set the percentage of nearest neighbors q as 30%, and the maximum in-degree bound m as 18. The number of transformer layers L is 2 and the loss weight λ is 0.1. (These hyperparameters are collected in the configuration sketch after the table.)
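
The split sizes quoted in the Dataset Splits row can be collected into a short summary. The following is a minimal sketch; the dictionary name and layout are illustrative and are not taken from the Gramformer repository.

```python
# Hypothetical summary of the train/val/test splits quoted above
# (structure and names are illustrative, not from the official code).
DATASET_SPLITS = {
    "NWPU-CROWD": {"train": 3109, "val": 500, "test": 1500},   # 5,109 images in total
    "JHU-CROWD++": {"train": 2272, "val": 500, "test": 1600},  # 4,372 images in total
}

for name, split in DATASET_SPLITS.items():
    total = sum(split.values())
    print(f"{name}: {split} ({total} images in total)")
```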
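
The Experiment Setup row states the training hyperparameters in prose. The configuration sketch below gathers them in one place, assuming a PyTorch-style training pipeline; all key and function names here are hypothetical and do not come from the official repository.

```python
import torch

# Hyperparameters as reported in the paper's experiment setup.
# Key names are illustrative, not from the official Gramformer code.
CONFIG = {
    "batch_size": 1,
    "crop_size": 512,             # 256 for ShanghaiTech A because of its smaller images
    "scale_range": (0.75, 1.25),  # random rescaling for augmentation
    "horizontal_flip": True,
    "learning_rate": 1e-5,        # Adam learning rate
    "q_nearest_neighbors": 0.30,  # percentage of nearest neighbors q
    "max_in_degree": 18,          # maximum in-degree bound m
    "num_transformer_layers": 2,  # L
    "loss_weight": 0.1,           # lambda
}

def build_optimizer(model: torch.nn.Module) -> torch.optim.Optimizer:
    """Adam optimizer with the learning rate quoted above."""
    return torch.optim.Adam(model.parameters(), lr=CONFIG["learning_rate"])
```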