Learning to Count via Unbalanced Optimal Transport

Authors: Zhiheng Ma, Xing Wei, Xiaopeng Hong, Hui Lin, Yunfeng Qiu, Yihong Gong

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The quantitative and qualitative results illustrate that our method achieves state-of-the-art counting and localization performance. We conduct extensive experiments on the four largest crowd counting benchmarks to verify the effectiveness of the proposed method on both counting and localization tasks.
Researcher Affiliation | Academia | 1 College of Artificial Intelligence, Xi'an Jiaotong University; 2 School of Software Engineering, Xi'an Jiaotong University; 3 School of Cyber Science and Engineering, Xi'an Jiaotong University; 4 Research Center for Artificial Intelligence, Peng Cheng Laboratory
Pseudocode | Yes | Algorithm 1: Unbalanced Optimal Transport From Density Predictions to Point Annotations (a generic solver sketch follows this table)
Open Source Code | No | All the experiments are conducted on a single GPU card (Pascal Titan X), and our code is implemented with PyTorch.
Open Datasets | Yes | ShanghaiTech (Zhang et al. 2016), UCF-QNRF (Idrees et al. 2018), JHU-CROWD++ (Sindagi, Yasarla, and Patel 2020), and NWPU-CROWD (Wang et al. 2020b), which are currently the largest and most diverse datasets, are used throughout our experiments.
Dataset Splits | Yes | JHU-CROWD++ (Sindagi, Yasarla, and Patel 2020) consists of 4,372 images with 1.51 million annotations: 2,272 images for training, 500 for validation, and 1,600 for testing. NWPU-CROWD (Wang et al. 2020b) contains 5,109 images with 2.13 million annotations: 3,109 images for training, 500 for validation, and 1,500 for testing.
Hardware Specification | Yes | All the experiments are conducted on a single GPU card (Pascal Titan X), and our code is implemented with PyTorch.
Software Dependencies | No | Our code is implemented with PyTorch.
Experiment Setup | Yes | We adopt the same network structure (VGG-19 truncated at the last pooling layer) used in Bayesian loss (BL) (Ma et al. 2019). For optimization, we set the learning rates of L-BFGS and Adam to 1.0 and 10^-5, respectively, and ε to 0.01. The principle for setting ε is simple: a smaller ε gives a closer approximation to the original UOT but slows convergence, so ε should be as small as possible while the convergence rate stays acceptable. We find the convergence rate too slow when ε ≤ 0.001, so we choose a moderate value, 0.01. Random crop and random horizontal flip are applied to augment input images. (A schematic training setup is sketched after this table.)
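
Algorithm 1 in the paper maps density predictions to point annotations via entropy-regularized unbalanced optimal transport. The authors' solver is not reproduced here; the following is a minimal, generic Sinkhorn-style scaling sketch of entropy-regularized UOT in PyTorch, assuming squared-Euclidean costs between pixel locations and annotated points and a KL marginal-relaxation weight. The function and parameter names (uot_sinkhorn, tau, n_iters) are illustrative; only ε = 0.01 is taken from the reported setup.

```python
import torch

def uot_sinkhorn(pred_density, pred_coords, gt_points, eps=0.01, tau=1.0, n_iters=100):
    """Entropy-regularized unbalanced OT between a predicted density map and
    point annotations, solved with generic Sinkhorn-like scaling iterations.

    pred_density: (N,) nonnegative mass per pixel (flattened density map)
    pred_coords:  (N, 2) pixel coordinates
    gt_points:    (M, 2) annotated head positions (unit mass each)
    eps:          entropic regularization (0.01 in the reported setup)
    tau:          weight of the KL penalty relaxing the marginal constraints
    """
    a = pred_density.clamp_min(1e-10)                     # source marginal
    b = torch.ones(gt_points.shape[0], device=a.device)   # target marginal
    # Squared Euclidean cost; for real image sizes, normalize coordinates or
    # run the iterations in the log domain to avoid exp underflow.
    C = torch.cdist(pred_coords, gt_points) ** 2
    K = torch.exp(-C / eps)                               # Gibbs kernel
    u = torch.ones_like(a)
    v = torch.ones_like(b)
    power = tau / (tau + eps)                             # exponent from the KL relaxation
    for _ in range(n_iters):
        u = (a / (K @ v).clamp_min(1e-30)) ** power
        v = (b / (K.t() @ u).clamp_min(1e-30)) ** power
    return u[:, None] * K * v[None, :]                    # transport plan P
```

In a setup of this kind, the transport cost ⟨P, C⟩ (plus the relaxation terms) can serve as a training loss, and the columns of P indicate which pixels are matched to each annotated point, which is how a density-to-point matching can also support localization.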
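
The experiment-setup row can be made concrete with a schematic PyTorch training configuration. This is a sketch under assumptions, not the authors' released code: the regression head, the 512-pixel crop size, and the ImageNet-pretrained weights are assumptions, while the truncated VGG-19 backbone, the Adam learning rate of 10^-5, and the crop/flip augmentations come from the text above (the reported L-BFGS with learning rate 1.0 for the transport subproblem is not shown).

```python
import torch
from torch import nn, optim
from torchvision import models, transforms

class CrowdCounter(nn.Module):
    """VGG-19 features truncated at the last pooling layer, followed by a small
    regression head producing a single-channel density map (the head is an
    assumption; the paper only specifies the truncated VGG-19 backbone)."""
    def __init__(self):
        super().__init__()
        vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
        self.backbone = nn.Sequential(*list(vgg.features.children())[:-1])  # drop final pooling
        self.head = nn.Sequential(
            nn.Conv2d(512, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, 1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.head(self.backbone(x))

model = CrowdCounter().cuda()                            # single GPU, as reported
optimizer = optim.Adam(model.parameters(), lr=1e-5)      # learning rate from the reported setup

# Random crop and random horizontal flip, as stated above (crop size assumed).
train_transform = transforms.Compose([
    transforms.RandomCrop(512),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```

Note that in an actual point-supervised pipeline the crop and flip must be applied jointly to the image and its point annotations, so the torchvision transforms above are only a stand-in for that joint augmentation step.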