Mask Propagation for Efficient Video Semantic Segmentation
Authors: Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang, Bohan Zhuang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on VSPW and Cityscapes demonstrate that our mask propagation framework achieves SOTA accuracy and efficiency trade-offs. |
| Researcher Affiliation | Collaboration | 1) ZIP Lab, Monash University; 2) Baidu Inc.; 3) ReLER, AAII, UTS; 4) Data61, CSIRO; 5) Mohamed bin Zayed University of AI |
| Pseudocode | No | The paper describes the method using textual descriptions and equations, but does not provide a formal pseudocode block or algorithm. |
| Open Source Code | Yes | Code is available at https://github.com/ziplab/MPVSS. |
| Open Datasets | Yes | We evaluate our method on two benchmark datasets: VSPW [42] and Cityscapes [9]. |
| Dataset Splits | Yes | VSPW is the largest video semantic segmentation benchmark, consisting of 2,806 training clips (198,244 frames), 343 validation clips (24,502 frames), and 387 test clips (28,887 frames). |
| Hardware Specification | Yes | Frames-per-second (FPS) is measured on a single NVIDIA V100 GPU with 3 repeated runs (see the timing sketch after the table). |
| Software Dependencies | No | The paper mentions software components such as Mask2Former, FlowNet, and the AdamW optimizer, but does not specify version numbers for them or for the underlying programming languages and libraries. |
| Experiment Setup | Yes | By default, all experiments are trained with a batch size of 16 on 8 NVIDIA GPUs. All models are trained with the AdamW optimizer [41] for a maximum of 90k iterations using a polynomial learning rate decay schedule [4] with an initial learning rate of 5e-5. For the proposed models, a default key frame interval of 5 is used for comparison (see the configuration sketch after the table). |
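
The FPS protocol quoted in the Hardware Specification row (3 repeated runs on a single V100) could be reproduced with a loop like the one below. This is a minimal sketch, not the authors' benchmarking script: the `measure_fps` helper, the warm-up count, and the per-frame call signature are all assumptions.

```python
import time
import torch


@torch.no_grad()
def measure_fps(model, frames, n_runs=3, warmup=10):
    """Average FPS over `n_runs` timed passes through a list of frame tensors."""
    model.eval()
    for _ in range(warmup):            # warm-up pass to exclude CUDA init costs (assumed)
        model(frames[0])
    fps_per_run = []
    for _ in range(n_runs):            # 3 repeated runs, as stated in the paper
        torch.cuda.synchronize()       # ensure pending GPU work doesn't leak into the timer
        start = time.perf_counter()
        for frame in frames:
            model(frame)
        torch.cuda.synchronize()       # wait for all launched kernels to finish
        fps_per_run.append(len(frames) / (time.perf_counter() - start))
    return sum(fps_per_run) / n_runs
```

The explicit `torch.cuda.synchronize()` calls matter because CUDA kernels launch asynchronously; without them the timer would measure launch overhead rather than actual compute.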
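
The Experiment Setup row pins down enough hyperparameters (AdamW, initial LR 5e-5, 90k iterations, polynomial decay) to sketch the optimizer configuration in PyTorch. The sketch below is an illustration under assumptions, not the released training code: the placeholder `model`, the decay `power=0.9` (a common choice in semantic segmentation, not stated in the excerpt), and the bare loop structure are all hypothetical.

```python
import torch

MAX_ITERS = 90_000  # maximum training iterations, per the paper

model = torch.nn.Linear(8, 8)  # placeholder for the actual MPVSS model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Polynomial decay: lr(t) = lr0 * (1 - t / MAX_ITERS) ** power,
# with power=0.9 assumed here.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda it: (1.0 - it / MAX_ITERS) ** 0.9
)

for it in range(MAX_ITERS):
    # ... forward pass, loss computation, loss.backward(), optimizer.step() ...
    optimizer.zero_grad()
    scheduler.step()  # advance the polynomial LR schedule once per iteration
```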