Joint Semantic-Geometric Learning for Polygonal Building Segmentation

Authors: Weijia Li, Wenqian Zhao, Huaping Zhong, Conghui He, Dahua Lin

AAAI 2021

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Results on two popular building segmentation datasets demonstrate that our approach achieves significant improvements for both building instance segmentation (with 2% F1-score gain) and polygon vertex prediction (with 6% F1-score gain) compared with current state-of-the-art methods.
Researcher Affiliation | Collaboration | 1CUHK-SenseTime Joint Lab, The Chinese University of Hong Kong 2Shanghai SenseTime Intelligent Technology Co., Ltd. 3The Chinese University of Hong Kong 4SenseTime Group Limited 5Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences {wjli,dhlin}@ie.cuhk.edu.hk, wqzhao@cse.cuhk.edu.hk, {zhonghuaping,heconghui}@sensetime.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include an explicit statement about releasing source code for the methodology or a link to a code repository.
Open Datasets | Yes | Following previous polygonal building segmentation studies (Zhao et al. 2018; Li, Wegner, and Lucchi 2019), we evaluate our proposed method using two popular building datasets: (1) CrowdAI mapping challenge dataset (CrowdAI) (Mohanty 2018). (2) SpaceNet building footprint dataset (SpaceNet) (Van Etten, Lindenbaum, and Bacastow 2018).
Dataset Splits | Yes | The dataset of Las Vegas contains 3,851 images (in 650 × 650 pixels) and around 108,000 building instances, which are randomly divided into 3,081/385/385 images as the training/validation/test datasets.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies or libraries used in the experiments.
Experiment Setup | Yes | The weights of three tasks (λ1, λ2, λ3) are all set as 1. For the vertex generation module, the corner probability threshold Tcor is set as 0.5 and the orientation difference threshold Tori is set as 20°. For the polygon refinement network, each image cropped by the bounding box is resized to 224 × 224 pixels following (Ling et al. 2019). For the ResNet-based backbone, the size of the final feature map for vertex embedding is 112 × 112 × 256. For the GGNN propagation model, the dimension sizes of the two fully-connected layers and the output layer are 256, 256 and 15 × 15, indicating that the relative moving range of each vertex is [-7,+7] pixels.
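
The Dataset Splits row quotes a 3,081/385/385 random split of the 3,851 Las Vegas images (roughly 80/10/10). A minimal sketch of how such a split could be reproduced from the quoted counts; the function name and fixed seed are assumptions for illustration, not details given in the paper:

```python
import random

def split_las_vegas(image_ids, seed=0):
    """Randomly partition 3,851 image ids into 3,081/385/385
    train/val/test subsets, matching the counts quoted above.
    The seed is a hypothetical choice; the paper does not state one."""
    assert len(image_ids) == 3851
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    train, val, test = ids[:3081], ids[3081:3466], ids[3466:]
    return train, val, test
```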
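
The Experiment Setup row pins down two thresholds (Tcor = 0.5, Tori = 20°) and a 15 × 15 GGNN output layer encoding a per-vertex move in [-7, +7] pixels. A minimal sketch of how those numbers could translate into code; the keep/drop criterion and all function names are assumptions, since the paper's exact vertex-generation logic is not quoted here:

```python
import numpy as np

T_COR = 0.5   # corner probability threshold Tcor (from the paper)
T_ORI = 20.0  # orientation difference threshold Tori, in degrees

def keep_vertex(corner_prob, ori_diff_deg):
    """Hypothetical reading of the vertex generation thresholds:
    a candidate is kept when its corner probability exceeds T_COR
    and the local orientation difference exceeds T_ORI degrees."""
    return corner_prob > T_COR and ori_diff_deg > T_ORI

def decode_vertex_move(ggnn_scores):
    """Decode the 15 x 15 GGNN output layer into a relative (dx, dy)
    move in [-7, +7] pixels: the argmax cell indexes the displacement,
    with the center cell (7, 7) meaning 'no move'."""
    assert ggnn_scores.shape == (15, 15)
    row, col = np.unravel_index(np.argmax(ggnn_scores), ggnn_scores.shape)
    return col - 7, row - 7  # (dx, dy)
```

For example, an argmax at cell (9, 5) decodes to a relative move of (-2, +2) pixels, consistent with the quoted [-7,+7] moving range.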