Mask Propagation for Efficient Video Semantic Segmentation

Authors: Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang, Bohan Zhuang

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on VSPW and Cityscapes demonstrate that our mask propagation framework achieves SOTA accuracy and efficiency trade-offs.
Researcher Affiliation | Collaboration | 1 ZIP Lab, Monash University; 2 Baidu Inc.; 3 ReLER, AAII, UTS; 4 Data61, CSIRO; 5 Mohamed bin Zayed University of AI
Pseudocode | No | The paper describes the method using textual descriptions and equations, but does not provide a formal pseudocode block or algorithm.
Open Source Code | Yes | Code is available at https://github.com/ziplab/MPVSS.
Open Datasets | Yes | We evaluate our method on two benchmark datasets: VSPW [42] and Cityscapes [9].
Dataset Splits | Yes | VSPW is the largest video semantic segmentation benchmark, consisting of 2,806 training clips (198,244 frames), 343 validation clips (24,502 frames), and 387 test clips (28,887 frames).
Hardware Specification | Yes | Frames-per-second (FPS) is measured on a single NVIDIA V100 GPU with 3 repeated runs.
Software Dependencies | No | The paper mentions software components such as Mask2Former, FlowNet, and the AdamW optimizer, but does not specify their versions or those of the underlying programming languages and libraries.
Experiment Setup | Yes | By default, all experiments are trained with a batch size of 16 on 8 NVIDIA GPUs. All the models are trained with the AdamW optimizer [41] for a maximum of 90k iterations and the polynomial learning rate decay schedule [4] with an initial learning rate of 5e-5. For our proposed models, we use 5 as the default key frame interval for comparison.
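
The Experiment Setup row above pins down the optimization recipe (AdamW, 90k iterations, polynomial decay, initial LR 5e-5, batch size 16, key frame interval 5). Below is a minimal PyTorch sketch of that schedule, assuming a standard iteration-based training loop; the stand-in model, the random data, and the polynomial-decay exponent are placeholders and assumptions, not the authors' implementation, which is in the linked repository.

```python
# Hedged sketch of the training schedule reported in the Experiment Setup row.
# The real model and data pipeline are in https://github.com/ziplab/MPVSS.
import torch

MAX_ITERS = 90_000           # maximum training iterations
GLOBAL_BATCH_SIZE = 16       # batch size of 16 (spread over 8 GPUs in the paper)
BASE_LR = 5e-5               # initial learning rate
POLY_POWER = 0.9             # assumed exponent for the polynomial decay (not stated in the row)
KEY_FRAME_INTERVAL = 5       # default key frame interval; consumed by the (omitted) propagation logic
NUM_CLASSES = 124            # number of semantic classes in VSPW

model = torch.nn.Conv2d(3, NUM_CLASSES, kernel_size=1)   # stand-in for the segmentation model
optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR)

# Polynomial learning-rate decay: lr(t) = base_lr * (1 - t / max_iters) ** power
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda it: (1.0 - it / MAX_ITERS) ** POLY_POWER
)

for it in range(MAX_ITERS):
    frames = torch.randn(GLOBAL_BATCH_SIZE, 3, 64, 64)             # placeholder video frames
    labels = torch.randint(0, NUM_CLASSES, (GLOBAL_BATCH_SIZE, 64, 64))  # placeholder labels
    loss = torch.nn.functional.cross_entropy(model(frames), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                                               # decay LR once per iteration
```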
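
The Hardware Specification row reports FPS averaged over 3 repeated runs on a single NVIDIA V100 GPU. The sketch below shows one plausible way such a measurement is taken; the `measure_fps` helper and its signature are assumptions for illustration, not the authors' benchmarking code.

```python
# Hedged sketch of per-frame inference timing averaged over repeated runs.
import time
import torch

@torch.no_grad()
def measure_fps(model, frames, device="cuda", repeats=3):
    """Average frames-per-second over `repeats` timed inference runs."""
    model = model.to(device).eval()
    frames = [f.to(device) for f in frames]
    runs = []
    for _ in range(repeats):
        if device.startswith("cuda"):
            torch.cuda.synchronize()        # finish queued GPU work before starting the clock
        start = time.perf_counter()
        for f in frames:
            model(f.unsqueeze(0))           # one frame per forward pass
        if device.startswith("cuda"):
            torch.cuda.synchronize()        # wait for the last kernel before stopping the clock
        runs.append(len(frames) / (time.perf_counter() - start))
    return sum(runs) / len(runs)
```

For example, calling `measure_fps(model, frames)` with `frames` being a list of preprocessed validation-frame tensors would reproduce the "3 repeated runs" protocol described above.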