Learning to Count via Unbalanced Optimal Transport

Authors: Zhiheng Ma, Xing Wei, Xiaopeng Hong, Hui Lin, Yunfeng Qiu, Yihong Gong

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The quantitative and qualitative results illustrate that our method achieves state-of-the-art counting and localization performance. We conduct extensive experiments on the four largest crowd counting benchmarks to verify the effectiveness of the proposed method on both counting and localization tasks.
Researcher Affiliation | Academia | 1 College of Artificial Intelligence, Xi'an Jiaotong University; 2 School of Software Engineering, Xi'an Jiaotong University; 3 School of Cyber Science and Engineering, Xi'an Jiaotong University; 4 Research Center for Artificial Intelligence, Peng Cheng Laboratory
Pseudocode | Yes | Algorithm 1: Unbalanced Optimal Transport From Density Predictions to Point Annotations (a generic solver sketch follows this table)
Open Source Code | No | All the experiments are conducted on a single GPU card (Pascal Titan X), and our code is implemented with PyTorch.
Open Datasets | Yes | ShanghaiTech (Zhang et al. 2016), UCF-QNRF (Idrees et al. 2018), JHU-CROWD++ (Sindagi, Yasarla, and Patel 2020), and NWPU-CROWD (Wang et al. 2020b), which are currently the largest and most diverse datasets, are used throughout our experiments.
Dataset Splits | Yes | JHU-CROWD++ (Sindagi, Yasarla, and Patel 2020) consists of 4,372 images with 1.51 million annotations: 2,272 images for training, 500 for validation, and 1,600 for testing. NWPU-CROWD (Wang et al. 2020b) contains 5,109 images with 2.13 million annotations: 3,109 images for training, 500 for validation, and 1,500 for testing.
Hardware Specification | Yes | All the experiments are conducted on a single GPU card (Pascal Titan X), and our code is implemented with PyTorch.
Software Dependencies | No | Our code is implemented with PyTorch.
Experiment Setup | Yes | We adopt the same network structure (VGG-19 truncated at the last pooling layer) used in Bayesian loss (BL) (Ma et al. 2019). For optimization, we set the learning rates of L-BFGS and Adam to 1.0 and 10^-5, respectively, and ε to 0.01. The principle for setting ε is simple: a smaller ε gives a closer approximation to the original UOT but slows convergence, so ε should be as small as possible while the convergence rate stays acceptable. We find the convergence rate too slow when ε ≤ 0.001, so we choose a moderate value, 0.01. Random crop and random horizontal flip are applied to augment input images. (A schematic training setup is sketched after this table.)
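
Algorithm 1 in the paper maps density predictions to point annotations via entropy-regularized unbalanced optimal transport. The authors' solver is not reproduced here; the following is a minimal, generic Sinkhorn-style scaling sketch of entropy-regularized UOT in PyTorch, assuming squared-Euclidean costs between pixel locations and annotated points and a KL marginal-relaxation weight. The function and parameter names (uot_sinkhorn, tau, n_iters) are illustrative; only ε = 0.01 is taken from the reported setup.

```python
import torch

def uot_sinkhorn(pred_density, pred_coords, gt_points, eps=0.01, tau=1.0, n_iters=100):
    """Entropy-regularized unbalanced OT between a predicted density map and
    point annotations, solved with generic Sinkhorn-like scaling iterations.

    pred_density: (N,) nonnegative mass per pixel (flattened density map)
    pred_coords:  (N, 2) pixel coordinates
    gt_points:    (M, 2) annotated head positions (unit mass each)
    eps:          entropic regularization (0.01 in the reported setup)
    tau:          weight of the KL penalty relaxing the marginal constraints
    """
    a = pred_density.clamp_min(1e-10)                     # source marginal
    b = torch.ones(gt_points.shape[0], device=a.device)   # target marginal
    # Squared Euclidean cost; for real image sizes, normalize coordinates or
    # run the iterations in the log domain to avoid exp underflow.
    C = torch.cdist(pred_coords, gt_points) ** 2
    K = torch.exp(-C / eps)                               # Gibbs kernel
    u = torch.ones_like(a)
    v = torch.ones_like(b)
    power = tau / (tau + eps)                             # exponent from the KL relaxation
    for _ in range(n_iters):
        u = (a / (K @ v).clamp_min(1e-30)) ** power
        v = (b / (K.t() @ u).clamp_min(1e-30)) ** power
    return u[:, None] * K * v[None, :]                    # transport plan P
```

In a setup of this kind, the transport cost ⟨P, C⟩ (plus the relaxation terms) can serve as a training loss, and the columns of P indicate which pixels are matched to each annotated point, which is how a density-to-point matching can also support localization.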
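
The experiment-setup row can be made concrete with a schematic PyTorch training configuration. This is a sketch under assumptions, not the authors' released code: the regression head, the 512-pixel crop size, and the ImageNet-pretrained weights are assumptions, while the truncated VGG-19 backbone, the Adam learning rate of 10^-5, and the crop/flip augmentations come from the text above (the reported L-BFGS with learning rate 1.0 for the transport subproblem is not shown).

```python
import torch
from torch import nn, optim
from torchvision import models, transforms

class CrowdCounter(nn.Module):
    """VGG-19 features truncated at the last pooling layer, followed by a small
    regression head producing a single-channel density map (the head is an
    assumption; the paper only specifies the truncated VGG-19 backbone)."""
    def __init__(self):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
        self.backbone = nn.Sequential(*list(vgg.features.children())[:-1])  # drop final pooling
        self.head = nn.Sequential(
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, 1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.head(self.backbone(x))

model = CrowdCounter().cuda()                            # single GPU, as reported
optimizer = optim.Adam(model.parameters(), lr=1e-5)      # learning rate from the reported setup

# Random crop and random horizontal flip, as stated above (crop size assumed).
train_transform = transforms.Compose([
    transforms.RandomCrop(512),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```

Note that in an actual point-supervised pipeline the crop and flip must be applied jointly to the image and its point annotations, so the torchvision transforms above are only a stand-in for that joint augmentation step.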