Reliable Propagation-Correction Modulation for Video Object Segmentation

Authors: Xiaohao Xu, Jinglu Wang, Xiao Li, Yan Lu (pp. 2946-2954)

AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our model achieves state-of-the-art performance on YouTube-VOS18/19 and DAVIS17-Val/Test benchmarks. Extensive experiments demonstrate that the correction mechanism provides considerable performance gain by fully utilizing reliable guidance." "Datasets. We evaluate our model mainly on two widely used VOS benchmarks with multiple objects, YouTube-VOS (Xu et al. 2018a) and DAVIS17 (Pont-Tuset et al. 2017), and a small-scale single-object VOS benchmark DAVIS16 (Perazzi et al. 2016)."
Researcher Affiliation | Collaboration | Xiaohao Xu,1,2 Jinglu Wang,2 Xiao Li,2 Yan Lu2; 1 Huazhong University of Science and Technology, 2 Microsoft Research Asia; xxh11102019@outlook.com, {jinglwa, xili11, yanlu}@microsoft.com
Pseudocode | Yes | "Algorithm 1: Reliable object proxy augmentation"
Open Source Code | No | "Our code is implemented with PyTorch 1.4.1 and is partly leveraged from (Yang, Wei, and Yang 2020)."
Open Datasets | Yes | "We evaluate our model mainly on two widely used VOS benchmarks with multiple objects, YouTube-VOS (Xu et al. 2018a) and DAVIS17 (Pont-Tuset et al. 2017), and a small-scale single-object VOS benchmark DAVIS16 (Perazzi et al. 2016)."
Dataset Splits | Yes | "We evaluate our model mainly on two widely used VOS benchmarks with multiple objects, YouTube-VOS (Xu et al. 2018a) and DAVIS17 (Pont-Tuset et al. 2017), and a small-scale single-object VOS benchmark DAVIS16 (Perazzi et al. 2016)." Table 1 reports quantitative comparisons on the YouTube-VOS 2018 and 2019 validation splits (subscripts s and u denote seen and unseen categories; superscripts MS and F denote multi-scale and flip testing). Table 2 reports quantitative comparisons on DAVIS16 Validation (single object, easy), DAVIS17 Validation (multi-object, medium), and DAVIS17 Test-dev (multi-object, hard); superscript FR denotes full-resolution testing, otherwise methods are tested on 480p.
Hardware Specification | Yes | "All the experiments are performed on an NVIDIA DGX-1 Linux workstation (OS: Ubuntu 16.04.4 LTS, GPU: 8 Tesla V100)."
Software Dependencies | Yes | "Our code is implemented with PyTorch 1.4.1 and is partly leveraged from (Yang, Wei, and Yang 2020)."
Experiment Setup | Yes | "The training is conducted with an SGD optimizer with a momentum of 0.9 using the cross-entropy loss. For YouTube-VOS experiments, we only use YouTube-VOS without any external datasets. We first use a learning rate of 0.02 for 200k steps with a batch size of 8, then change to a learning rate of 0.01 for another 200k steps. During inference, we restrict the long edge of each frame to no more than 1040 pixels and apply a scale set of [1.0, 1.3, 1.5] for multi-scale testing on YouTube-VOS."
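The quoted setup above pins down two concrete policies: a two-phase learning-rate schedule (0.02 for 200k steps, then 0.01 for another 200k) and an inference-time resizing rule (long edge capped at 1040 pixels, with a scale set of [1.0, 1.3, 1.5] for multi-scale testing). A minimal sketch of both, assuming the cap is applied before the test scales; the function names are illustrative and not from the authors' code:

```python
# Sketch of the reported training schedule and inference-time resizing policy.
# Assumptions (not stated by the authors): the 1040-pixel cap is applied
# before the multi-scale factors, and rounding is to the nearest pixel.

MAX_LONG_EDGE = 1040
TEST_SCALES = [1.0, 1.3, 1.5]


def learning_rate(step: int) -> float:
    """Two-phase schedule: 0.02 for the first 200k steps, then 0.01."""
    return 0.02 if step < 200_000 else 0.01


def capped_size(height: int, width: int, max_long_edge: int = MAX_LONG_EDGE):
    """Shrink (height, width) so the long edge is at most max_long_edge."""
    long_edge = max(height, width)
    if long_edge <= max_long_edge:
        return height, width
    ratio = max_long_edge / long_edge
    return int(round(height * ratio)), int(round(width * ratio))


def multiscale_sizes(height: int, width: int, scales=TEST_SCALES):
    """Target sizes for multi-scale testing: cap the long edge, then scale."""
    h, w = capped_size(height, width)
    return [(int(round(h * s)), int(round(w * s))) for s in scales]
```

For example, a 720x1280 frame is first capped to 585x1040 (ratio 1040/1280 = 0.8125) and then resized at each of the three test scales.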