E3SN: Efficient End-to-End Siamese Network for Video Object Segmentation

Authors: Meng Lan, Yipeng Zhang, Qinning Xu, Lefei Zhang

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the DAVIS-2016 and DAVIS-2017 datasets show that the proposed approach outperforms SiamMask in accuracy at a similar FPS. Moreover, the approach also achieves a good accuracy-speed trade-off compared with other state-of-the-art VOS algorithms.
Researcher Affiliation | Academia | Meng Lan, Yipeng Zhang, Qinning Xu and Lefei Zhang. National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence and School of Computer Science, Wuhan University, Wuhan, China. {menglan, zyp91, qinning.xu, zhanglefei}@whu.edu.cn
Pseudocode | No | The paper describes the proposed method in prose and diagrams but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include an unambiguous statement about releasing source code or provide a link to a code repository.
Open Datasets | Yes | The proposed model is trained on ImageNet-VID [Russakovsky et al., 2015], COCO [Lin et al., 2014], and YouTube-VOS [Xu et al., 2018]. However, only COCO and YouTube-VOS, which have mask labels, are useful for training the mask branch.
Dataset Splits | Yes | The performance of E3SN is evaluated on the DAVIS-2016 [Perazzi et al., 2016] and DAVIS-2017 [Pont-Tuset et al., 2017] validation sets for single- and multi-object segmentation, respectively. The DAVIS-2016 validation set comprises 20 videos, and each video sequence is annotated with a single pixel-wise object mask. The DAVIS-2017 validation set extends the DAVIS-2016 validation set to 30 videos with multiple object annotations.
Hardware Specification | Yes | NVIDIA TITAN RTX is the GPU used for training and evaluation.
Software Dependencies | No | The paper does not provide specific version numbers for software components or libraries used in the experiments.
Experiment Setup | Yes | The ImageNet pre-trained model is loaded as the initial parameters of the backbone network, and the network is trained with SGD using an initial learning rate of 2×10^-3 that logarithmically decreases to 2×10^-4 over 20 epochs. In particular, the learning rate of the mask branch is multiplied by 0.1. The two thresholds of the sampling strategy are set to 0.6 and 0.3. (...) L = λ1·L_cls + λ2·L_reg + λ3·L_mask (Eq. 3), where λ1 = λ2 = 1 and λ3 = 32 are set similarly to SiamMask.
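
For anyone attempting to reproduce this training setup, the sketch below shows one way to wire up the reported schedule in PyTorch: SGD with a learning rate decaying logarithmically from 2×10^-3 to 2×10^-4 over 20 epochs, a 0.1 learning-rate multiplier on the mask branch, and the weighted loss with λ1 = λ2 = 1 and λ3 = 32. This is a minimal sketch under stated assumptions, not the authors' implementation: the module names backbone_head and mask_branch, the momentum value, and the LambdaLR wiring are illustrative placeholders not given in the paper.

import torch

def build_optimizer(model, base_lr=2e-3, momentum=0.9):
    # Mask-branch learning rate is multiplied by 0.1, as reported in the paper.
    # Momentum is an assumed default, not a value stated in the paper.
    param_groups = [
        {"params": model.backbone_head.parameters(), "lr": base_lr},      # placeholder module name
        {"params": model.mask_branch.parameters(), "lr": base_lr * 0.1},  # placeholder module name
    ]
    return torch.optim.SGD(param_groups, lr=base_lr, momentum=momentum)

def log_decay(epoch, num_epochs=20, lr_start=2e-3, lr_end=2e-4):
    # Logarithmic (geometric) decay factor: 1.0 at epoch 0 and
    # lr_end / lr_start at the final epoch. LambdaLR multiplies each
    # group's initial lr by this factor, so the mask branch's 0.1
    # scaling is preserved throughout training.
    return (lr_end / lr_start) ** (epoch / max(num_epochs - 1, 1))

def total_loss(l_cls, l_reg, l_mask, lam1=1.0, lam2=1.0, lam3=32.0):
    # Eq. (3): L = λ1·L_cls + λ2·L_reg + λ3·L_mask with λ1 = λ2 = 1, λ3 = 32.
    return lam1 * l_cls + lam2 * l_reg + lam3 * l_mask

# Usage sketch:
# optimizer = build_optimizer(model)
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=log_decay)
# ...train one epoch, then call scheduler.step()

A geometric per-epoch decay is one common reading of "logarithmically decreases"; a reproduction could equally apply the factor per iteration, which the paper does not specify.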