Parsing All Adverse Scenes: Severity-Aware Semantic Segmentation with Mask-Enhanced Cross-Domain Consistency

Authors: Fuhao Li, Ziyang Gong, Yupeng Deng, Xianzheng Ma, Renrui Zhang, Zhenming Ji, Xiangwei Zhu, Hong Zhang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our unified framework, named PASS (Parsing All adverSe Scenes), achieves significant performance improvements over state-of-the-art methods on widely used benchmarks for all adverse scenes. Notably, the performance of PASS is superior to Semi-Unified models and even surpasses weather-specific models." Experiments, Datasets: "Cityscapes (Cordts et al. 2016): Cityscapes (CS) contains 2,975 training, 500 validation, and 1,525 test images of driving scenes." Comparison Experiment Settings, Implementation details: "The default implementation of our method is based on HRDA (Hoyer, Dai, and Van Gool 2022b) and follows the HRDA-based implementation of the teacher-student self-training framework of DAFormer (Hoyer, Dai, and Van Gool 2022a), which includes a feature distance loss, confidence-weighted pseudo-labels (τ = 0.968), rare class sampling, and ClassMix following DACS (Tranheden et al. 2021)." Performance Experiments Analysis: "We present the state-of-the-art performance of PASS in Table 1. The results in foggy scenes demonstrate that our PASS surpasses previous fog-specialized models and Semi-Unified models across three foggy benchmarks." Cross-domain Consistency Analysis, Ablation Study Analysis: "We conduct ablation studies comparing the original SA with our MSA, which comprises SA and Image Merging, as illustrated in Fig. 5." (Sketches of the pseudo-labeling and ClassMix steps mentioned here follow the table.)
Researcher Affiliation | Collaboration | 1. Wuhan University of Science and Technology; 2. Sun Yat-Sen University; 3. National University of Singapore; 4. University of Oxford; 5. Shanghai AI Lab
Pseudocode | No | The paper describes the training flow in Figure 2 and in the text, but it does not present a formal pseudocode or algorithm block.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor a link to a code repository.
Open Datasets | Yes | Cityscapes (Cordts et al. 2016): Cityscapes (CS) contains 2,975 training, 500 validation, and 1,525 test images of driving scenes captured in 50 urban areas. ACDC (Sakaridis, Dai, and Van Gool 2021): ACDC comprises four adverse scenes: fog, night, rain, and snow. Foggy Zurich (Sakaridis et al. 2018): Foggy Zurich (FZ) contains 1,522 images in light fog and 1,498 images in medium fog. Foggy Driving (Sakaridis et al. 2018): Foggy Driving (FD) comprises 101 real-world scenarios of foggy road conditions. Dark Zurich (Sakaridis, Dai, and Van Gool 2019): Dark Zurich (DZ) comprises 8,779 images captured during nighttime, twilight, and daytime. Nighttime Driving (Dai and Van Gool 2018): Nighttime Driving (ND) contains 50 nighttime images with coarsely annotated ground truth. BDD100K-Night (Yu et al. 2020): BDD100K-Night (BD) is a subset of the BDD100K segmentation dataset, consisting of 87 nighttime images with accurate segmentation labels.
Dataset Splits | Yes | Cityscapes (Cordts et al. 2016): Cityscapes (CS) contains 2,975 training, 500 validation, and 1,525 test images of driving scenes captured in 50 urban areas. ACDC (Sakaridis, Dai, and Van Gool 2021): ACDC comprises four adverse scenes: fog, night, rain, and snow. For every scene, there are 400 training images, 100 validation images (106 for night), and 500 test images.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as GPU models, CPU models, or memory.
Software Dependencies | No | The paper mentions software components such as AdamW, HRDA, DAFormer, and DACS, but does not provide version numbers for these or any other libraries.
Experiment Setup | Yes | Implementation details: "The default implementation of our method is based on HRDA (Hoyer, Dai, and Van Gool 2022b) and follows the HRDA-based implementation of the teacher-student self-training framework of DAFormer (Hoyer, Dai, and Van Gool 2022a), which includes a feature distance loss, confidence-weighted pseudo-labels (τ = 0.968), rare class sampling, and ClassMix following DACS (Tranheden et al. 2021). The optimizer is AdamW (Loshchilov and Hutter 2017), with a learning rate of 6×10⁻⁵ for the encoder and 6×10⁻⁴ for the decoder, and a linear learning-rate warm-up. For the resolution setup, we follow the default configuration and parameters of HRDA. Unless otherwise stated, the MSA parameters are set to ηs = 0.5 and ηc = 0.5. For the Mask Operation of SPM, we set the (min, max) ranges to (−1.6, 1.6) for brightness, (−15, 15) for hue, (0.8, 1.2) for contrast, and (0, 50) for noise. We also set Ni = 24 and v = 40 for Severity Perception." (Illustrative optimizer and mask-operation sketches also follow the table.)
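
The confidence-weighted pseudo-labeling quoted above (τ = 0.968) follows the DAFormer recipe, in which each target image receives a quality weight equal to the fraction of pixels whose teacher confidence clears the threshold. Below is a minimal PyTorch sketch, assuming (B, C, H, W) logits from an EMA teacher; the function and variable names are illustrative, not the authors' code.

```python
import torch

def pseudo_label_with_confidence(teacher_logits: torch.Tensor, tau: float = 0.968):
    """Hard pseudo-labels plus a per-image quality weight (DAFormer-style sketch).

    teacher_logits: (B, C, H, W) logits from the EMA teacher on target images.
    The weight is the fraction of pixels whose max softmax probability >= tau.
    """
    probs = torch.softmax(teacher_logits.detach(), dim=1)
    conf, labels = probs.max(dim=1)                   # both (B, H, W)
    weight = (conf >= tau).float().mean(dim=(1, 2))   # (B,) quality weights
    return labels, weight
```

In this style of self-training, the returned weight scales the unsupervised cross-entropy loss, so target images with few confident pixels contribute less to the gradient.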
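ClassMix, adopted from DACS, pastes the pixels of roughly half of the source image's classes onto the target image and mixes the labels the same way. A hedged sketch of the mixing step, with shapes and helper names as assumptions:

```python
import torch

def class_mix(src_img, src_lbl, tgt_img, tgt_pseudo):
    """DACS-style ClassMix sketch: img tensors are (3, H, W), labels are (H, W).

    Selects about half of the classes present in the source label map and
    copies their pixels (image and label) on top of the target sample.
    """
    classes = torch.unique(src_lbl)
    chosen = classes[torch.randperm(len(classes))[: max(1, len(classes) // 2)]]
    mask = torch.isin(src_lbl, chosen)                     # (H, W) bool paste mask
    mixed_img = torch.where(mask.unsqueeze(0), src_img, tgt_img)
    mixed_lbl = torch.where(mask, src_lbl, tgt_pseudo)     # labels follow pixels
    return mixed_img, mixed_lbl
```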
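The reported optimizer setup (AdamW with 6×10⁻⁵ for the encoder, 6×10⁻⁴ for the decoder, and a linear warm-up) maps directly onto PyTorch parameter groups. A minimal sketch; the stand-in model, the weight-decay value, and the 1,500-iteration warm-up length are assumptions not stated in the excerpt:

```python
import torch
import torch.nn as nn

# Stand-in model; the real network is HRDA's encoder/decoder.
model = nn.ModuleDict({
    "encoder": nn.Conv2d(3, 16, 3, padding=1),
    "decoder": nn.Conv2d(16, 19, 1),  # 19 Cityscapes classes
})

# One parameter group per submodule, each with its quoted learning rate.
optimizer = torch.optim.AdamW(
    [
        {"params": model["encoder"].parameters(), "lr": 6e-5},
        {"params": model["decoder"].parameters(), "lr": 6e-4},
    ],
    weight_decay=0.01,  # assumed value
)

warmup_iters = 1500  # assumed; the excerpt only says "linear warm-up"
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda step: min(1.0, (step + 1) / warmup_iters),  # ramp, then flat
)
```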
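The (min, max) ranges quoted for SPM's Mask Operation suggest that a perturbation strength is sampled per image for each photometric operation. Since the SPM implementation is not public, the following is only a hypothetical sketch of such sampling; the range table mirrors the quoted values, and everything else (including the noise-unit interpretation) is an assumption:

```python
import random

# (min, max) ranges for the Mask Operation of SPM, as quoted in the setup row.
MASK_RANGES = {
    "brightness": (-1.6, 1.6),
    "hue": (-15, 15),        # degrees
    "contrast": (0.8, 1.2),  # multiplicative factor
    "noise": (0, 50),        # e.g. std of additive noise in 8-bit intensity (assumed)
}

def sample_mask_params(rng=random):
    """Draw one perturbation strength per operation, uniformly within its range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in MASK_RANGES.items()}

print(sample_mask_params())  # e.g. {'brightness': 0.73, 'hue': -4.2, ...}
```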