Exploit Domain-Robust Optical Flow in Domain Adaptive Video Semantic Segmentation
Authors: Yuan Gao, Zilei Wang, Jiafan Zhuang, Yixin Zhang, Junjie Li
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The extensive experiments on two challenging benchmarks demonstrate the effectiveness of our method, and it outperforms previous state-of-the-art methods with considerable performance improvement. Our code is available at https://github.com/EdenHazardan/SFC. |
| Researcher Affiliation | Academia | Yuan Gao1, Zilei Wang*1, Jiafan Zhuang2, Yixin Zhang1,3, Junjie Li1 1 University of Science and Technology of China 2 Shantou University 3 Institute of Artificial Intelligence, Hefei Comprehensive National Science Center |
| Pseudocode | No | The paper describes methods via text and architectural diagrams (e.g., Figure 1, Figure 4) but does not provide formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/EdenHazardan/SFC. |
| Open Datasets | Yes | Following DA-VSN (Guan et al. 2021) and TPS (Xing et al. 2022), our experiments involve two challenging synthetic-to-real benchmarks: VIPER → Cityscapes-Seq and SYNTHIA-Seq → Cityscapes-Seq. Cityscapes-Seq (Cordts et al. 2016) is a representative dataset in semantic segmentation and autonomous driving domain. We use it as the target domain dataset without using any annotations during training. VIPER (Richter, Hayder, and Koltun 2017) is a synthetic dataset... SYNTHIA-Seq (Ros et al. 2016) is also a synthetic dataset... |
| Dataset Splits | Yes | The training and validation subsets contain 2,975 and 500 videos, respectively, and each video contains 30 frames at a resolution of 1024 × 2048. |
| Hardware Specification | No | We acknowledge the support of GPU cluster built by MCC Lab of Information Science and Technology Institution, USTC. |
| Software Dependencies | No | We adopt Accel (Jain, Wang, and Gonzalez 2019) throughout experiments. It consists of two segmentation branches, an optical flow network, and a score fusion layer. Two segmentation branches are used to generate semantic predictions on consecutive frames using Deeplab (Chen et al. 2017), whose backbones are both ResNet-101 (He et al. 2016) pretrained on ImageNet (Deng et al. 2009). FlowNet (Dosovitskiy et al. 2015) is adopted as an optical flow network to propagate prediction from the previous frame, which is pretrained on the FlyingChairs dataset (Dosovitskiy et al. 2015). |
| Experiment Setup | Yes | Implementation details As in DA-VSN (Guan et al. 2021) and TPS (Xing et al. 2022), we adopt Accel (Jain, Wang, and Gonzalez 2019) throughout experiments... uses an SGD optimizer with a momentum of 0.9 and a weight decay of 5 × 10⁻⁴. The learning rate is set at 2.5 × 10⁻⁴ for backbone parameters and 2.5 × 10⁻³ for others, which is annealed following the poly learning rate policy... We set λf as 0.005 and 0.001 in two training stages respectively, while set λs = 100 in both stages. |
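The optimizer settings quoted above can be sketched in a few lines. This is a minimal illustration of the poly learning-rate policy with the paper's reported base rates (2.5e-4 for the backbone, 2.5e-3 for other parameters); the exponent `power = 0.9` is the common default for the poly policy and is an assumption here, as the paper excerpt does not state it.

```python
# Sketch of the annealing schedule described in the experiment setup:
# SGD with a poly learning-rate policy and separate base rates for
# backbone vs. other parameters. power=0.9 is an assumed default.

BASE_LR_BACKBONE = 2.5e-4
BASE_LR_OTHERS = 2.5e-3
POWER = 0.9  # assumption: standard poly-policy exponent

def poly_lr(base_lr: float, cur_iter: int, max_iter: int,
            power: float = POWER) -> float:
    """Poly policy: lr = base_lr * (1 - cur_iter / max_iter) ** power."""
    return base_lr * (1.0 - cur_iter / max_iter) ** power

# At iteration 0 the rate equals the base rate; it decays smoothly to 0
# at max_iter. In a PyTorch setup this would typically drive two
# parameter groups (backbone and others) in the same SGD optimizer.
```

Note that in practice the two base rates are usually passed as two parameter groups to a single optimizer, so one schedule function scales both consistently.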