Towards Cross-View Consistency in Semantic Segmentation While Varying View Direction

Authors: Xin Tong, Xianghua Ying, Yongjie Shi, He Zhao, Ruibin Wang

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct an ablation study on a widely adopted urban semantic segmentation dataset Cityscapes [Cordts et al., 2016] to verify the effectiveness of our method. [...] We evaluate the cross-view consistency using the VVD dataset and the segmentation performance using both the Cityscapes and VVD datasets. [...] The proposed method with all three modules achieves the best performance in the comparison. Meanwhile, it clearly reduces the gap between the segmentation performance on the VVD dataset and on Cityscapes. For visual comparison, some segmentation examples are shown in Fig. 4. The proposed method improves the segmentation performance on vehicles and traffic signs compared with the baseline method. We also visualize the cross-view consistency results in Fig. 5. We gather the statistics of improvement for each class in Cityscapes. We get improvement in all 19 classes and present the highest 9 classes for clear visualization, as shown in Fig. 6. (A per-class IoU sketch follows the table.)
Researcher Affiliation | Academia | Xin Tong, Xianghua Ying, Yongjie Shi, He Zhao and Ruibin Wang, Key Laboratory of Machine Perception (MOE), School of EECS, Peking University, {xin_tong, xhying, shiyongjie, zhaohe97, robin_wang}@pku.edu.cn
Pseudocode | No | The paper describes the proposed algorithms and modules in text and diagrams, but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We conduct an ablation study on a widely adopted urban semantic segmentation dataset Cityscapes [Cordts et al., 2016] to verify the effectiveness of our method. [...] We also evaluate our method on the CamVid dataset. The CamVid dataset contains 701 images and their pixel-level segmentation annotations with a size of 720×960. We use 468 images to train and 233 images to validate, following [Badrinarayanan et al., 2017].
Dataset Splits | Yes | The Cityscapes dataset contains 5000 high-quality, pixel-level finely annotated images, including 2975 images for training and 500 for validation. [...] The CamVid dataset contains 701 images and their pixel-level segmentation annotations with a size of 720×960. We use 468 images to train and 233 images to validate, following [Badrinarayanan et al., 2017]. (A loading sketch follows the table.)
Hardware Specification | No | The paper mentions "Due to GPU memory limitations" but does not specify any particular GPU models, CPU models, or other hardware components used for the experiments.
Software Dependencies | No | The paper states "Our training and evaluation is implemented in PyTorch." but does not provide a version number for PyTorch or any other software dependencies.
Experiment Setup | Yes | ResNet-50 with the dilated network strategy is used as our backbone. [...] For training, we use the SGD optimizer and employ the polynomial learning rate policy [Chen et al., 2017a; Liu et al., 2015], where the current learning rate equals the initial one multiplied by (1 − iter/max_iter)^power. The initial learning rate and the power are set to 0.01 and 0.9, while the momentum and weight decay are set to 0.9 and 0.0001, respectively. Due to GPU memory limitations, we use a batch size of 8 and a crop size of 776 during training. [...] The total iterations are set to 90K to guarantee the fairness of comparison. [...] The yaw, pitch and roll angles used in homography generation are set in the ranges of [−30, 30], [−15, 15] and [−3, 3], respectively. The focal length f is set to 2262, following [Godard et al., 2017]. (Learning-rate and homography sketches follow the table.)
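For context on the per-class results quoted under Research Type: segmentation quality on the 19 Cityscapes evaluation classes is conventionally scored as per-class intersection-over-union (IoU) accumulated from a confusion matrix. The sketch below is a minimal illustration of that standard metric, not the authors' evaluation code; the function and variable names are hypothetical.

```python
import numpy as np

NUM_CLASSES = 19  # the 19 Cityscapes evaluation classes

def confusion_matrix(pred, gt, num_classes=NUM_CLASSES, ignore_index=255):
    # Accumulate a num_classes x num_classes confusion matrix from
    # flattened prediction and ground-truth label maps.
    mask = gt != ignore_index
    idx = gt[mask] * num_classes + pred[mask]
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def per_class_iou(conf):
    # conf[i, j]: number of pixels with ground-truth class i predicted as class j.
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    return tp / np.maximum(tp + fp + fn, 1)  # guard against empty classes

# Per-class improvement over a baseline, as plotted in Fig. 6, would be:
# improvement = per_class_iou(conf_model) - per_class_iou(conf_baseline)
```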
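The train/val counts quoted under Dataset Splits match the standard fine-annotation splits of Cityscapes, which torchvision exposes directly. A minimal loading sketch, assuming the dataset has been downloaded to ./data/cityscapes (the path is a placeholder, and this is not the authors' data pipeline):

```python
from torchvision import datasets

root = "./data/cityscapes"  # assumed local path to the downloaded dataset

# Standard fine-annotation splits: 2975 training images, 500 validation images.
train_set = datasets.Cityscapes(root, split="train", mode="fine", target_type="semantic")
val_set = datasets.Cityscapes(root, split="val", mode="fine", target_type="semantic")

print(len(train_set), len(val_set))  # expected: 2975 500
```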
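The training schedule quoted under Experiment Setup is the common "poly" policy. Below is a minimal sketch with the reported hyperparameters (initial LR 0.01, power 0.9, momentum 0.9, weight decay 0.0001, 90K iterations); the model is a placeholder and poly_lr is a hypothetical helper, not the authors' code.

```python
import torch

def poly_lr(base_lr, cur_iter, max_iter, power=0.9):
    # Polynomial decay: lr = base_lr * (1 - iter / max_iter) ** power.
    return base_lr * (1.0 - cur_iter / max_iter) ** power

model = torch.nn.Conv2d(3, 19, 1)  # placeholder for the ResNet-50 backbone
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
max_iter = 90_000
for it in range(max_iter):
    for group in optimizer.param_groups:
        group["lr"] = poly_lr(0.01, it, max_iter)
    # ... forward pass, loss, backward pass, optimizer.step() ...
```

For the homography generation, the quoted angle ranges suggest random camera rotations applied through the intrinsics. A common construction for a pure-rotation homography is H = K R K^{-1}; the Euler-angle convention, image size, and principal point below are assumptions, since the paper's exact formulation is not quoted here.

```python
import numpy as np

def rotation(yaw, pitch, roll):
    # Z-Y-X Euler angles in degrees (convention assumed).
    a, b, c = np.deg2rad([yaw, pitch, roll])
    rz = np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])
    ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    rx = np.array([[1, 0, 0], [0, np.cos(c), -np.sin(c)], [0, np.sin(c), np.cos(c)]])
    return rz @ ry @ rx

f = 2262.0              # focal length reported in the paper
cx, cy = 1024.0, 512.0  # assumed principal point for a 2048x1024 Cityscapes image
K = np.array([[f, 0, cx], [0, f, cy], [0, 0, 1.0]])

# Sample angles from the reported ranges.
yaw = np.random.uniform(-30, 30)
pitch = np.random.uniform(-15, 15)
roll = np.random.uniform(-3, 3)

# Homography induced by a pure camera rotation: H = K R K^{-1}.
H = K @ rotation(yaw, pitch, roll) @ np.linalg.inv(K)
```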