Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation

Authors: Fanchao Lin, Hongtao Xie, Yan Li, Yongdong Zhang (pp. 2038-2046)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results on three benchmarks show that our method achieves the state-of-the-art performance in WVOS (e.g., an overall score of 84.7% on the DAVIS 2016 validation set).
Researcher Affiliation Collaboration 1 School of Information Science and Technology, University of Science and Technology of China, Hefei, China 2 Beijing Kuaishou Technology Co., Ltd., Beijing, China
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code No The paper does not provide concrete access to source code for the methodology described.
Open Datasets Yes We conduct experiments on three public datasets: the single-object DAVIS 2016 (Perazzi et al. 2016) dataset and the multi-object DAVIS 2017 (Pont-Tuset et al. 2017) and YouTube-VOS (Xu et al. 2018) datasets. For the pre-training on static data, we generate pairs of simulative video frames from the salient object segmentation datasets, i.e., DUTS (Wang et al. 2017), HKU-IS (Li and Yu 2015), MSRA (Cheng et al. 2014), and SOC (Fan et al. 2018).
Dataset Splits Yes DAVIS 2016 contains 50 videos, which are divided into a training set (30 videos) and a validation set (20 videos). DAVIS 2017 is an extended dataset of DAVIS 2016, which has 60 videos for training and 30 videos for validation with multiple targets per video. YouTube-VOS is a large-scale dataset consisting of 3471 training videos and 474 validation videos.
Hardware Specification Yes We train our network using the Adam algorithm with a fixed learning rate of 1e-5 on four GTX 1080Ti GPUs, and the batch size is 16. During the inference, only the initial bounding box label is given and the prediction is made in a propagation-like way. Our method is evaluated on a computer with a single V100 GPU.
Software Dependencies No The paper mentions deep learning frameworks and optimizers but does not provide specific version numbers for ancillary software dependencies.
Experiment Setup Yes We train our network using the Adam algorithm with a fixed learning rate of 1e-5 on four GTX 1080Ti GPUs, and the batch size is 16. For the encoders in our framework, both the query encoder fq and the memory encoder fm use ResNet50 (He et al. 2016) up to the 4th stage as the backbone, but fm adds extra filters in the input layer so that it can take 4 channels (RGB frame and a bounding box map) as input. The loss function of the whole framework is: L = l(B_q, B_l) + l(S_q, S_l) + l(S^r_q, S_l), where l is a combination of the dice loss and the cross-entropy loss, and the weight of dice loss is set to 0.1 by experience.
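The per-term loss l described above (cross-entropy plus dice, with the dice term weighted 0.1) can be sketched as follows. This is a hedged numpy illustration of the stated weighting, not the authors' implementation; the function names and the binary (single-map) formulation are assumptions.

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    # Soft Dice loss between a predicted probability map and a binary target.
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def ce_loss(pred, target, eps=1e-6):
    # Binary cross-entropy between a predicted probability map and a binary target.
    pred = np.clip(pred, eps, 1.0 - eps)
    return -np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))

def combined_loss(pred, target, dice_weight=0.1):
    # l(., .) = cross-entropy + 0.1 * dice, per the weighting stated in the paper.
    return ce_loss(pred, target) + dice_weight * dice_loss(pred, target)
```

Under this sketch, the total loss L would sum `combined_loss` over the three prediction/label pairs (B_q vs. B_l, S_q vs. S_l, and S^r_q vs. S_l).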