DIRL: Domain-Invariant Representation Learning for Generalizable Semantic Segmentation
Authors: Qi Xu, Liang Yao, Zhengkai Jiang, Guannan Jiang, Wenqing Chu, Wenhui Han, Wei Zhang, Chengjie Wang, Ying Tai
AAAI 2022, pp. 2884–2892 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple domain-generalizable segmentation tasks show the superiority of our approach over other methods. We implement our method in PyTorch (Paszke et al. 2019). We evaluate the proposed algorithm on two challenging and important unsupervised domain generalization tasks: GTAV → {Cityscapes, BDD, Mapillary} and Cityscapes → {BDD, Synthia, GTAV}, which involve two synthetic datasets and three real datasets. First, it is observed that the baseline model does not perform well due to the huge domain bias. Then, directly adopting the feature sensitivity or adding the PGAM to re-calibrate the features brings a certain performance improvement, but this is limited by the unlearnable feature sensitivity and the absence of feature guidance. The addition of the Sensitivity Guidance loss provides learnable attention weights for feature re-calibration, which brings a significant improvement in the model's generalization performance, especially on the GTAV → Cityscapes task. Next, we further compare two feature whitening losses in the last two columns. As we can see, the Guided Feature Whitening loss obtains better performance because the features' sensitivity to the domain-specific style is closely related to the feature covariance's sensitivity to the domain-specific style, which demonstrates the importance of introducing the feature sensitivity as the guidance for domain generalization. |
| Researcher Affiliation | Collaboration | Qi Xu1, Liang Yao2, Zhengkai Jiang2, Guannan Jiang3, Wenqing Chu2, Wenhui Han4, Wei Zhang3, Chengjie Wang2, Ying Tai2* 1Shanghai Jiao Tong University, Shanghai, China 2Tencent Youtu Lab, Shanghai, China 3Contemporary Amperex Technology Co., Limited, Shanghai, China 4Fudan University, Shanghai, China |
| Pseudocode | No | The paper does not contain a pseudocode block or algorithm formally labeled as such. |
| Open Source Code | No | The paper does not provide a link to open-source code or explicitly state that the code will be made publicly available. |
| Open Datasets | Yes | Synthetic Dataset. GTAV (Richter et al. 2016) is a large-scale dataset containing 24,966 high-resolution synthetic images... It has 19 object categories compatible with Cityscapes. Synthia (Ros et al. 2016) consists of 9,400 synthetic images with a resolution of 960 × 720, which shares 16 classes with the three target datasets. Real Dataset. Cityscapes (Cordts et al. 2016) is a large semantic segmentation dataset... BDD (Yu et al. 2020) is another real-world dataset... The last real-world dataset we adopt is Mapillary (Neuhold et al. 2017) consisting of 25,000 high-resolution images... |
| Dataset Splits | Yes | Cityscapes (Cordts et al. 2016) is a large semantic segmentation dataset, which is split into the training, validation, and testing parts with 2,975, 500 and 1,525 images respectively. GTAV (Richter et al. 2016)... It contains 12,403, 6,382, and 6,181 images of size 1914 × 1052 for training, validating, and testing respectively. BDD (Yu et al. 2020)... provides 7,000 images for training and 1,000 images for validating. |
| Hardware Specification | Yes | The testing is performed with an image size of 2048 × 1024 on an NVIDIA V100 GPU. |
| Software Dependencies | No | The paper states: 'We implement our method in Pytorch (Paszke et al. 2019).' However, it only mentions PyTorch without a specific version number, nor does it list any other software dependencies with version numbers. |
| Experiment Setup | Yes | The optimizer is SGD with an initial learning rate of 0.01 and momentum of 0.9. Besides, we adopt polynomial learning rate scheduling (Liu, Rabinovich, and Berg 2015) with a power of 0.9. We train all the models for 40K iterations with a batch size of 8. We adopt color and positional augmentations such as color jittering, Gaussian blur, random cropping, random horizontal flipping, and random scaling in the range [0.5, 2.0] to avoid overfitting. For the photometric transformation in SAPM, we apply color jittering and Gaussian blurring. As shown in IBN-Net, earlier layers tend to encode the style information; therefore we add the instance normalization layer and PGAM after the first three convolution groups. We empirically set the weights α, λ1, λ2 to 0.3, 0.8, and 0.6 to achieve the best performance. |
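The polynomial learning-rate schedule quoted in the setup row can be sketched in plain Python. This is a minimal illustration of the standard decay rule lr = base_lr · (1 − iter/max_iter)^power (Liu, Rabinovich, and Berg 2015) with the hyperparameters reported above, not the authors' code; the `poly_lr` helper name is hypothetical.

```python
def poly_lr(base_lr: float, cur_iter: int, max_iter: int, power: float = 0.9) -> float:
    """Polynomial decay: lr = base_lr * (1 - cur_iter / max_iter) ** power."""
    return base_lr * (1.0 - cur_iter / max_iter) ** power

# Hyperparameters reported in the paper: initial LR 0.01, power 0.9, 40K iterations.
BASE_LR, MAX_ITER, POWER = 0.01, 40_000, 0.9

start = poly_lr(BASE_LR, 0, MAX_ITER, POWER)       # → 0.01 (initial learning rate)
mid = poly_lr(BASE_LR, 20_000, MAX_ITER, POWER)    # halfway: 0.01 * 0.5 ** 0.9
late = poly_lr(BASE_LR, 39_999, MAX_ITER, POWER)   # decays toward zero at 40K
```

In a PyTorch training loop this rule is typically applied per iteration (e.g. via `torch.optim.lr_scheduler.LambdaLR`), so the learning rate falls smoothly from 0.01 to ~0 over the 40K iterations.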