Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation

Authors: Yuxi Li, Ning Xu, Jinlong Peng, John See, Weiyao Lin

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct comprehensive experiments on challenging benchmarks of DAVIS17 and Youtube-VOS, demonstrating that the cyclic mechanism is beneficial to segmentation quality."
Researcher Affiliation | Collaboration | Yuxi Li (Shanghai Jiao Tong University, Shanghai, China; lyxok1@sjtu.edu.cn); Ning Xu (Adobe Research, San Jose, CA; nxu@adobe.com); Jinlong Peng (Tencent Youtu Lab, Shanghai, China; jeromepeng@tencent.com); John See (Multimedia University, Selangor, Malaysia; johnsee@mmu.edu.my); Weiyao Lin (Shanghai Jiao Tong University, Shanghai, China; wylin@sjtu.edu.cn)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper neither states unambiguously that the authors release the source code for their method nor provides a direct link to a code repository.
Open Datasets | Yes | "Datasets. We train and evaluate our method on two widely used benchmarks for semi-supervised video object segmentation, DAVIS17 [10] and Youtube-VOS [11]."
Dataset Splits | Yes | "DAVIS17 contains 120 video sequences in total with at most 10 objects in a video. The dataset is split into 60 sequences for training, 30 for validation and the other 30 for test. The Youtube-VOS is larger in scale and contains more object categories. There are a total of 3,471 video sequences for training and 474 videos for validation in this dataset with at most 12 objects in a video." (See the split summary after this table.)
Hardware Specification | Yes | "The training and inference procedures are deployed on an NVIDIA TITAN Xp GPU."
Software Dependencies | No | The paper mentions software components such as a ResNet-50 backbone, ImageNet pretraining, and the Adam optimizer, but does not provide version numbers for these or for any other libraries/frameworks. (See the backbone sketch after this table.)
Experiment Setup | Yes | "We set the hyperparameters as γ = 1.0, N = 10, K = 5, and M = 50. The network is trained with a batch size of 4 for 240 epochs in total and is optimized by the Adam optimizer [22] of learning rate 10^-5 and β1 = 0.9, β2 = 0.999. In both training and inference stages, the input frames are resized to the resolution of 240 × 427." (See the training-setup sketch below.)
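
For quick reference, the reported split sizes can be restated in a small Python snippet. This is a convenience summary of the numbers quoted in the Dataset Splits row, not part of the paper's code:

```python
# Dataset split sizes as reported in the paper (DAVIS17: 60/30/30 of 120
# sequences; Youtube-VOS: 3,471 training and 474 validation sequences).
SPLITS = {
    "DAVIS17": {"train": 60, "val": 30, "test": 30},
    "Youtube-VOS": {"train": 3471, "val": 474},
}

# Sanity check: the DAVIS17 splits account for all 120 sequences.
assert sum(SPLITS["DAVIS17"].values()) == 120
```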
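
To make the Software Dependencies note concrete, here is a minimal sketch of loading the kind of backbone the paper mentions: a ResNet-50 pretrained on ImageNet. PyTorch/torchvision is an assumption on our part; the paper names neither a framework nor versions:

```python
# Minimal backbone-loading sketch. PyTorch/torchvision is assumed here;
# the paper does not state which framework or library versions were used.
import torchvision.models as models

# ResNet-50 with ImageNet-pretrained weights, as mentioned in the paper.
backbone = models.resnet50(pretrained=True)
backbone.eval()  # switch to inference mode for feature extraction
```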
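
The Experiment Setup row translates directly into optimizer and input-size configuration. The following is a minimal sketch assuming a PyTorch implementation; `model` is a hypothetical placeholder, since the authors release no code:

```python
# Training-setup sketch from the reported hyperparameters (PyTorch assumed;
# `model` is a hypothetical stand-in, as the authors release no code).
import torch
from torch import nn, optim

model = nn.Conv2d(3, 2, kernel_size=3, padding=1)  # stand-in for the VOS network

optimizer = optim.Adam(
    model.parameters(),
    lr=1e-5,             # learning rate 10^-5, as reported
    betas=(0.9, 0.999),  # β1 = 0.9, β2 = 0.999
)

BATCH_SIZE = 4           # reported batch size
NUM_EPOCHS = 240         # reported number of training epochs
INPUT_SIZE = (240, 427)  # frames resized to 240 × 427 in training and inference
```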