Gramformer: Learning Crowd Counting via Graph-Modulated Transformer
Authors: Hui Lin, Zhiheng Ma, Xiaopeng Hong, Qinnan Shangguan, Deyu Meng
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four challenging crowd counting datasets have validated the competitiveness of the proposed method. |
| Researcher Affiliation | Academia | 1. School of Mathematics and Statistics, Xi'an Jiaotong University; 2. Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; 3. Harbin Institute of Technology; 4. Peng Cheng Laboratory |
| Pseudocode | No | The paper does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/LoraLinH/Gramformer. |
| Open Datasets | Yes | We evaluate our crowd counting method and compare it with other state-of-the-art methods on four of the largest crowd counting benchmarks. They are widely used in recent papers and are described as follows. Shanghai Tech A (Zhang et al. 2016)... UCF-QNRF (Idrees et al. 2018)... NWPU-CROWD (Wang et al. 2020b)... JHU-CROWD++ (Sindagi, Yasarla, and Patel 2020)... |
| Dataset Splits | Yes | NWPU-CROWD (Wang et al. 2020b) contains 5,109 images and 2.13 million annotated instances... 3,109 images are used in the training set; 500 images are in the validation set; and the remaining 1,500 images are in the test set. JHU-CROWD++ (Sindagi, Yasarla, and Patel 2020) has a more complex context... 2,272 images are chosen for the training set; 500 images are for the validation set; and the remaining 1,600 images are for the test set. |
| Hardware Specification | Yes | All experiments are conducted with a single RTX 3080 GPU. |
| Software Dependencies | No | The paper mentions software components like 'VGG-19' and 'Adam algorithm' but does not specify version numbers for any programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | We set the training batch size as 1 and crop images with a size of 512 × 512. As some images in Shanghai Tech A have smaller resolutions, the crop size for this dataset changes to 256 × 256. Random scaling of [0.75, 1.25] and horizontal flipping are also adopted to augment each training image. We use the Adam algorithm (Kingma and Ba 2014) with a learning rate of 10⁻⁵ to optimize the parameters. We set the percentage of nearest neighbors q as 30%, and the maximum in-degree bound m as 18. The number of transformer layers L is 2 and the loss weight λ is 0.1. |
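
The experiment-setup row above lists enough hyperparameters to sketch the reported training configuration. Below is a minimal, illustrative sketch assuming a standard PyTorch/torchvision stack; the Gramformer network itself is not reproduced here (a plain VGG-19 backbone, the feature extractor the paper mentions, stands in), and all names such as `HYPERPARAMS` and `augment` are hypothetical rather than taken from the authors' repository.

```python
"""Sketch of the training setup reported in the paper (batch size 1, 512-pixel crops,
random scaling and flipping, Adam with lr 1e-5). Not the authors' implementation."""
import random

import torch
from torch import optim
from torchvision import models
from torchvision.transforms import functional as TF

# Hyperparameters quoted from the paper's experiment setup.
HYPERPARAMS = {
    "batch_size": 1,              # training batch size
    "crop_size": 512,             # 256 for Shanghai Tech A (smaller images)
    "scale_range": (0.75, 1.25),  # random scaling factors
    "lr": 1e-5,                   # Adam learning rate
    "q": 0.30,                    # percentage of nearest neighbors
    "m": 18,                      # maximum in-degree bound
    "num_layers": 2,              # transformer layers L
    "loss_weight": 0.1,           # loss weight lambda
}


def augment(image: torch.Tensor, crop: int, scale_range=(0.75, 1.25)) -> torch.Tensor:
    """Random scaling, horizontal flip, and fixed-size crop on a (C, H, W) tensor."""
    s = random.uniform(*scale_range)
    h, w = image.shape[-2:]
    # Keep both sides at least `crop` pixels so a crop is always possible.
    image = TF.resize(image, [max(crop, int(h * s)), max(crop, int(w * s))])
    if random.random() < 0.5:
        image = TF.hflip(image)
    top = random.randint(0, image.shape[-2] - crop)
    left = random.randint(0, image.shape[-1] - crop)
    return TF.crop(image, top, left, crop, crop)


if __name__ == "__main__":
    # VGG-19 convolutional features as a stand-in for the full Gramformer model.
    model = models.vgg19(weights=None).features
    optimizer = optim.Adam(model.parameters(), lr=HYPERPARAMS["lr"])

    # One synthetic image to exercise the augmentation and a forward pass.
    dummy = torch.rand(3, 768, 1024)
    batch = augment(dummy, HYPERPARAMS["crop_size"]).unsqueeze(0)  # batch size 1
    features = model(batch)
    print(batch.shape, features.shape)
```

The dataset-specific crop size (256 for Shanghai Tech A, 512 elsewhere) and the graph hyperparameters q, m, L, and λ are carried as plain constants here, since how they are consumed inside the model is specific to the authors' implementation.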