Reliable Propagation-Correction Modulation for Video Object Segmentation
Authors: Xiaohao Xu, Jinglu Wang, Xiao Li, Yan Lu
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model achieves state-of-the-art performance on YouTube-VOS 2018/2019 and DAVIS17-Val/Test benchmarks. Extensive experiments demonstrate that the correction mechanism provides considerable performance gain by fully utilizing reliable guidance. Experiment Datasets. We evaluate our model mainly on two widely used VOS benchmarks with multiple objects, YouTube-VOS (Xu et al. 2018a) and DAVIS17 (Pont-Tuset et al. 2017), and a small-scale single object VOS benchmark DAVIS16 (Perazzi et al. 2016). |
| Researcher Affiliation | Collaboration | Xiaohao Xu,1,2* Jinglu Wang,2 Xiao Li,2 Yan Lu2; 1 Huazhong University of Science and Technology, 2 Microsoft Research Asia. xxh11102019@outlook.com, {jinglwa, xili11, yanlu}@microsoft.com |
| Pseudocode | Yes | Algorithm 1: Reliable object proxy augmentation |
| Open Source Code | No | Our code is implemented with PyTorch 1.4.1 and is partly leveraged from (Yang, Wei, and Yang 2020). |
| Open Datasets | Yes | We evaluate our model mainly on two widely used VOS benchmarks with multiple objects, YouTube-VOS (Xu et al. 2018a) and DAVIS17 (Pont-Tuset et al. 2017), and a small-scale single object VOS benchmark DAVIS16 (Perazzi et al. 2016). |
| Dataset Splits | Yes | We evaluate our model mainly on two widely used VOS benchmarks with multiple objects, YouTube-VOS (Xu et al. 2018a) and DAVIS17 (Pont-Tuset et al. 2017), and a small-scale single object VOS benchmark DAVIS16 (Perazzi et al. 2016). Table 1: Quantitative comparisons on YouTube-VOS. Subscripts s and u denote scores in seen and unseen categories. Superscripts MS and F denote using multi-scale and flip testing in evaluation, respectively. Methods are evaluated on the YouTube-VOS 2018 Validation and YouTube-VOS 2019 Validation splits. Table 2: Quantitative comparisons on DAVIS. Superscript FR denotes full-resolution testing; otherwise, methods are all tested on 480p. Splits: DAVIS16 Validation (single object, easy), DAVIS17 Validation (multi-object, medium), DAVIS17 Test-dev (multi-object, hard). |
| Hardware Specification | Yes | All the experiments are performed on an NVIDIA DGX1 Linux workstation (OS: Ubuntu 16.04.4 LTS, GPU: 8 Tesla V100). |
| Software Dependencies | Yes | Our code is implemented with PyTorch 1.4.1 and is partly leveraged from (Yang, Wei, and Yang 2020). |
| Experiment Setup | Yes | The training is conducted with an SGD optimizer with a momentum of 0.9 using the cross-entropy loss. For YouTube-VOS experiments, we only use YouTube-VOS without any external datasets. We first use a learning rate of 0.02 for 200k steps with a batch size of 8, then change to a learning rate of 0.01 for another 200k steps. During inference, we restrict the long-edge of each frame to no more than 1040 pixels and apply a scale set of [1.0, 1.3, 1.5] for multi-scale testing on YouTube-VOS. |
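The reported setup implies two reproducible components: a piecewise-constant learning-rate schedule (0.02 for 200k steps, then 0.01 for another 200k) and multi-scale test-time resizing with a 1040-pixel long-edge cap. The sketch below is a minimal, hedged reconstruction of those two rules; the helper names are illustrative and not taken from the authors' code, and the rounding behavior at the long-edge cap is an assumption.

```python
def learning_rate(step: int) -> float:
    """Piecewise-constant LR schedule from the reported setup:
    0.02 for the first 200k steps, then 0.01 for the next 200k."""
    return 0.02 if step < 200_000 else 0.01


def multiscale_sizes(height: int, width: int,
                     scales=(1.0, 1.3, 1.5),
                     max_long_edge: int = 1040):
    """Compute test-time frame sizes for each scale, capping the long
    edge at `max_long_edge` pixels as described for YouTube-VOS
    inference. Rounding to integer pixels is an assumption."""
    sizes = []
    for s in scales:
        sh, sw = height * s, width * s
        long_edge = max(sh, sw)
        if long_edge > max_long_edge:
            ratio = max_long_edge / long_edge  # shrink to respect the cap
            sh, sw = sh * ratio, sw * ratio
        sizes.append((round(sh), round(sw)))
    return sizes
```

For a 480p frame (480x910), the 1.0 scale is kept as-is, while the 1.3 and 1.5 scales hit the 1040-pixel cap and are shrunk back, so all three test scales respect the long-edge limit.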