Learning Content-Enhanced Mask Transformer for Domain Generalized Urban-Scene Segmentation

Authors: Qi Bi, Shaodi You, Theo Gevers

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments conducted on various domain-generalized urban-scene segmentation datasets demonstrate that the proposed CMFormer significantly outperforms existing CNN-based methods by up to 14.0% mIoU and the contemporary HGFormer by up to 1.7% mIoU.
Researcher Affiliation | Academia | Qi Bi, Shaodi You, Theo Gevers, Computer Vision Research Group, University of Amsterdam, Netherlands, {q.bi, s.you, th.gevers}@uva.nl
Pseudocode | No | The paper describes its mechanisms using mathematical formulas and textual descriptions but does not present a formal 'Pseudocode' or 'Algorithm' block/figure.
Open Source Code | Yes | The source code is publicly available at https://github.com/BiQiWHU/CMFormer.
Open Datasets | Yes | Building upon prior research in domain-generalized USSS, our experiments utilize five different semantic segmentation datasets. Specifically, Cityscapes (Cordts et al. 2016) provides 2,975 and 500 well-annotated samples for training and validation, respectively. BDD100K (Yu et al. 2018)... Mapillary (Neuhold et al. 2017)... SYNTHIA (Ros et al. 2016)... GTA5 (Richter et al. 2016)...
Dataset Splits | Yes | Specifically, Cityscapes (Cordts et al. 2016) provides 2,975 and 500 well-annotated samples for training and validation, respectively. BDD100K (Yu et al. 2018) also provides diverse urban driving scenes with a resolution of 1280 × 720. 7,000 and 1,000 fine-annotated samples are provided for training and validation of semantic segmentation, respectively. (A summary of these split sizes is sketched in code below.)
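The split sizes quoted above can be collected into a small lookup structure. The following Python dict is purely illustrative bookkeeping, not something the paper or its repository provides; entries for Mapillary, SYNTHIA, and GTA5 are omitted because the quote elides their numbers.

```python
# Illustrative summary of the train/val split sizes quoted above.
# Mapillary, SYNTHIA, and GTA5 are omitted: their sizes are elided in the quote.
DATASET_SPLITS = {
    "Cityscapes": {"train": 2975, "val": 500},
    "BDD100K": {"train": 7000, "val": 1000, "resolution": (1280, 720)},
}

if __name__ == "__main__":
    for name, info in DATASET_SPLITS.items():
        print(f"{name}: {info['train']} train / {info['val']} val")
```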
Hardware Specification | No | The paper mentions using a Swin Transformer backbone but does not specify any particular GPU model, CPU, or other specific hardware used for training or inference in its experiments.
Software Dependencies | No | The paper mentions using MaskFormer and Mask2Former as foundational frameworks and an Adam optimizer, but it does not specify version numbers for these or other software dependencies like PyTorch or Python.
Experiment Setup | Yes | Following the default setting of MaskFormer (Cheng, Schwing, and Kirillov 2021) and Mask2Former (Cheng et al. 2022), the final loss function L is a linear combination of the binary cross-entropy loss L_ce, the dice loss L_dice, and the classification loss L_cls: L = λ_ce·L_ce + λ_dice·L_dice + λ_cls·L_cls (Eq. 9), with hyper-parameters λ_ce = λ_dice = 5.0 and λ_cls = 2.0, the Mask2Former defaults, without any tuning. The Adam optimizer is used with an initial learning rate of 1 × 10^-4 and a weight decay of 0.05. Training terminates after 50 epochs.
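For concreteness, the quoted training configuration maps onto a few lines of PyTorch. The sketch below is a minimal illustration under the stated hyper-parameters, not the authors' code: the placeholder nn.Linear stands in for the CMFormer network, and the three loss tensors would in practice come from Mask2Former's Hungarian-matched mask, dice, and classification losses.

```python
# A minimal sketch, assuming PyTorch, of the training setup quoted above.
import torch
import torch.nn as nn

# Mask2Former default loss weights, used without tuning (Eq. 9).
LAMBDA_CE, LAMBDA_DICE, LAMBDA_CLS = 5.0, 5.0, 2.0

def total_loss(l_ce: torch.Tensor, l_dice: torch.Tensor, l_cls: torch.Tensor) -> torch.Tensor:
    """L = lambda_ce * L_ce + lambda_dice * L_dice + lambda_cls * L_cls."""
    return LAMBDA_CE * l_ce + LAMBDA_DICE * l_dice + LAMBDA_CLS * l_cls

model = nn.Linear(8, 8)  # placeholder; the paper trains CMFormer (Mask2Former-based)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=0.05)
NUM_EPOCHS = 50  # "Training terminates after 50 epochs."
```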