Learning to Count via Unbalanced Optimal Transport
Authors: Zhiheng Ma, Xing Wei, Xiaopeng Hong, Hui Lin, Yunfeng Qiu, Yihong Gong | Pages 2319-2327
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The quantitative and qualitative results illustrate that our method achieves state-of-the-art counting and localization performance. We conduct extensive experiments on the four largest crowd counting benchmarks to verify the effectiveness of the proposed method on both counting and localization tasks. |
| Researcher Affiliation | Academia | (1) College of Artificial Intelligence, Xi'an Jiaotong University; (2) School of Software Engineering, Xi'an Jiaotong University; (3) School of Cyber Science and Engineering, Xi'an Jiaotong University; (4) Research Center for Artificial Intelligence, Peng Cheng Laboratory |
| Pseudocode | Yes | Algorithm 1: Unbalanced Optimal Transport From Density Predictions to Point Annotations |
| Open Source Code | No | All the experiments are conducted on a single GPU card (Pascal Titan X), and our code is implemented with Pytorch. |
| Open Datasets | Yes | Shanghai Tech (Zhang et al. 2016), UCF-QNRF (Idrees et al. 2018), JHU-CROWD++ (Sindagi, Yasarla, and Patel 2020), NWPU-CROWD (Wang et al. 2020b), which are currently the largest and most diverse datasets, are used throughout our experiments. |
| Dataset Splits | Yes | JHU-CROWD++ (Sindagi, Yasarla, and Patel 2020) consists of 4,372 images with 1.51 million annotations. There are 2,272 images for training, 500 images for validation, and 1,600 images for testing. NWPU-CROWD (Wang et al. 2020b) contains 5,109 images with 2.13 million annotations. There are 3,109 images for training, 500 images for validation, and 1,500 images for testing. |
| Hardware Specification | Yes | All the experiments are conducted on a single GPU card (Pascal Titan X), and our code is implemented with Pytorch. |
| Software Dependencies | No | Our code is implemented with Pytorch. |
| Experiment Setup | Yes | We adopt the same network structure (VGG-19 truncated at the last pooling layer) used in Bayesian loss (BL) (Ma et al. 2019). For optimization, we set the learning rate of L-BFGS and Adam to 1.0 and 10^-5, respectively, and ε to 0.01. The principle for setting ε is simple: a smaller ε gives a closer approximation to the original UOT, but convergence is slower, so ε should be as small as possible while the convergence rate remains acceptable. We find that convergence is too slow when ε ≤ 0.001, so we choose a moderate value, 0.01. Random crop and random horizontal flip are applied to augment input images. |
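The Pseudocode and Experiment Setup rows refer to Algorithm 1, which transports mass from density predictions to point annotations via entropy-regularized unbalanced optimal transport with ε = 0.01. The paper's exact algorithm is not reproduced in this report; the snippet below is only a minimal sketch of the generic generalized-Sinkhorn iteration for UOT with KL-relaxed marginals (in the style of Chizat et al.). The function name `sinkhorn_uot` and the marginal-relaxation weight `tau` are illustrative assumptions, not the authors' code; at the paper's small ε a log-domain implementation would be needed to avoid underflow in `exp(-C/eps)`.

```python
import numpy as np

def sinkhorn_uot(C, a, b, eps=0.01, tau=1.0, n_iters=500):
    """Generalized Sinkhorn iterations for entropy-regularized unbalanced OT.

    C:   (m, n) cost matrix between prediction bins and annotation points.
    a:   (m,) source masses (e.g. predicted density values).
    b:   (n,) target masses (e.g. unit mass per point annotation).
    eps: entropic regularization strength (paper uses 0.01).
    tau: weight of the KL penalty relaxing the marginal constraints.
    Returns the (m, n) transport plan P.
    """
    K = np.exp(-C / eps)          # Gibbs kernel; underflows for large C/eps
    u = np.ones_like(a, dtype=float)
    v = np.ones_like(b, dtype=float)
    fi = tau / (tau + eps)        # KL relaxation exponent; fi -> 1 recovers balanced OT
    for _ in range(n_iters):
        u = (a / (K @ v)) ** fi   # scale rows toward the source marginal
        v = (b / (K.T @ u)) ** fi # scale columns toward the target marginal
    return u[:, None] * K * v[None, :]
```

With a large `tau` the KL penalties effectively enforce the marginals, so the plan's row and column sums approach `a` and `b`; smaller `tau` lets mass be created or destroyed, which is what makes the formulation robust to density/annotation mismatch.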