Self-Ensembling Attention Networks: Addressing Domain Shift for Semantic Segmentation

Authors: Yonghao Xu, Bo Du, Lefei Zhang, Qian Zhang, Guoli Wang, Liangpei Zhang (pp. 5581-5588)

AAAI 2019

Reproducibility Variable Result LLM Response
Research Type Experimental Experiments on two benchmark datasets demonstrate that the proposed framework can yield competitive performance compared with the state-of-the-art methods.
Researcher Affiliation Collaboration 1School of Computer Science, Wuhan University, Wuhan 430072, P. R. China. 2State Key Laboratory of Information Engineering in Surveying, Mapping, and Remote Sensing, Wuhan University, Wuhan 430079, P. R. China. 3Horizon Robotics, Inc., Beijing 100190, P. R. China.
Pseudocode No The paper describes the proposed model and its components in prose and with a diagram (Figure 2), but does not include any pseudocode or algorithm blocks.
Open Source Code No The paper does not provide any explicit statement or link for the open-sourcing of the code for the described methodology.
Open Datasets Yes We use the CITYSCAPES (Cordts et al. 2016) as our target-domain data in the experiments. For the source domain, two challenging synthetic datasets including SYNTHIA (Ros et al. 2016) and GTA-5 (Richter et al. 2016) are utilized.
Dataset Splits Yes CITYSCAPES is a real-world vehicle-egocentric image dataset collected from 50 cities in Germany and the countries around. It provides three disjoint subsets: 2975 training images, 500 validation images, and 1525 test images. In the test phase, we evaluate on the CITYSCAPES validation set with 500 images.
Hardware Specification Yes The experiments in this paper are implemented in PyTorch with a single NVIDIA GTX TITAN X GPU.
Software Dependencies No The paper mentions implementing the experiments in 'PyTorch' but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup Yes The Adam optimizer (Kingma and Ba 2015) with a learning rate of 1e-5 and weight decay of 5e-5 is utilized to train the proposed networks. Each mini-batch consists of 1 source-domain image and 1 target-domain image. We resize all the images to the size of 1024×512. The smoothing coefficient α in the exponential moving average is empirically set as 0.99 in our experiments.
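The exponential-moving-average teacher update with α = 0.99 described in the setup can be sketched in plain Python, independent of any framework. This is a minimal illustration of the self-ensembling update rule only; the dict-of-floats "parameters" and the function name are illustrative assumptions, not the paper's actual model weights or code.

```python
def ema_update(teacher, student, alpha=0.99):
    """Self-ensembling update: teacher <- alpha * teacher + (1 - alpha) * student.

    With alpha = 0.99 (the paper's setting), the teacher changes slowly,
    smoothing the student's parameters over training iterations.
    """
    return {name: alpha * teacher[name] + (1 - alpha) * student[name]
            for name in teacher}

# Toy parameters (illustrative only).
teacher = {"w": 1.0, "b": 2.0}
student = {"w": 0.0, "b": 0.0}

teacher = ema_update(teacher, student)
# w becomes 0.99 * 1.0 + 0.01 * 0.0 = 0.99
# b becomes 0.99 * 2.0 + 0.01 * 0.0 = 1.98
```

In a PyTorch implementation the same rule would typically be applied per parameter tensor after each student optimizer step, with the teacher's gradients disabled.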