E3SN: Efficient End-to-End Siamese Network for Video Object Segmentation
Authors: Meng Lan, Yipeng Zhang, Qinning Xu, Lefei Zhang
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the DAVIS 2016 and DAVIS 2017 datasets show that the proposed approach outperforms SiamMask in accuracy at a similar speed (FPS). Moreover, it achieves a good accuracy-speed trade-off compared with other state-of-the-art VOS algorithms. |
| Researcher Affiliation | Academia | Meng Lan, Yipeng Zhang, Qinning Xu and Lefei Zhang; National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence and School of Computer Science, Wuhan University, Wuhan, China; {menglan, zyp91, qinning.xu, zhanglefei}@whu.edu.cn |
| Pseudocode | No | The paper describes the proposed method in prose and diagrams but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include an unambiguous statement about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | The proposed model is trained on ImageNet-VID [Russakovsky et al., 2015], COCO [Lin et al., 2014], and YouTube-VOS [Xu et al., 2018]. Only COCO and YouTube-VOS, which provide mask labels, can be used to train the mask branch. |
| Dataset Splits | Yes | The performance of E3SN is evaluated on the DAVIS 2016 [Perazzi et al., 2016] and DAVIS 2017 [Pont-Tuset et al., 2017] validation sets for single- and multi-object segmentation, respectively. The DAVIS 2016 validation set comprises 20 videos, and each video sequence is annotated with a single pixel-wise object mask. The DAVIS 2017 validation set extends the DAVIS 2016 validation set to 30 videos with multiple object annotations. |
| Hardware Specification | Yes | NVIDIA TITAN RTX is the GPU used for training and evaluation. |
| Software Dependencies | No | The paper does not provide specific version numbers for software components or libraries used in the experiments. |
| Experiment Setup | Yes | The ImageNet pre-trained model is loaded as the initial parameters of the backbone network, and SGD is used with an initial learning rate of $2 \times 10^{-3}$, which decreases logarithmically to $2 \times 10^{-4}$ over 20 epochs. The learning rate of the mask branch is additionally multiplied by 0.1. The two thresholds of the sampling strategy are set to 0.6 and 0.3. (...) $L = \lambda_1 L_{cls} + \lambda_2 L_{reg} + \lambda_3 L_{mask}$ (3), where $\lambda_1 = \lambda_2 = 1$ and $\lambda_3 = 32$, following the settings of SiamMask. |
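
The training setup summarized above can be sketched in plain Python as follows. Only the numeric values (learning rate from 2e-3 down to 2e-4 over 20 epochs, the 0.1 multiplier for the mask branch, λ1 = λ2 = 1, λ3 = 32, and the thresholds 0.6 and 0.3) come from the paper excerpt. Everything else is a hypothetical reading, not the authors' code: the function names, the interpretation of "logarithmically decreases" as log-spaced per-epoch values whose first and last epochs hit the endpoints, the interpretation of the two thresholds as positive/negative IoU cut-offs for anchor sampling, and the zeroing of the mask term for samples that lack mask labels (e.g., ImageNet-VID).

```python
import math

# Numeric values reported in the paper's experiment setup.
BASE_LR = 2e-3
FINAL_LR = 2e-4
NUM_EPOCHS = 20
MASK_LR_MULT = 0.1
LAMBDA_CLS, LAMBDA_REG, LAMBDA_MASK = 1.0, 1.0, 32.0
POS_THRESH, NEG_THRESH = 0.6, 0.3


def log_lr(epoch: int, start: float = BASE_LR, end: float = FINAL_LR,
           epochs: int = NUM_EPOCHS) -> float:
    """Learning rate for `epoch`, evenly spaced in log10 between start and end.

    Assumption: epoch 0 uses `start` and the last epoch uses `end`
    (the same spacing numpy.logspace would produce).
    """
    if not 0 <= epoch < epochs:
        raise ValueError("epoch out of range")
    t = epoch / (epochs - 1)
    return 10 ** ((1 - t) * math.log10(start) + t * math.log10(end))


def param_group_lrs(epoch: int) -> dict:
    """Hypothetical helper: shared LR plus the 0.1x LR for the mask branch."""
    lr = log_lr(epoch)
    return {"default": lr, "mask_branch": lr * MASK_LR_MULT}


def total_loss(l_cls: float, l_reg: float, l_mask: float,
               has_mask: bool = True) -> float:
    """Eq. (3): L = λ1*Lcls + λ2*Lreg + λ3*Lmask.

    Assumption: the mask term is dropped for samples without mask labels
    (e.g. ImageNet-VID), since only COCO and YouTube-VOS train the mask branch.
    """
    mask_term = LAMBDA_MASK * l_mask if has_mask else 0.0
    return LAMBDA_CLS * l_cls + LAMBDA_REG * l_reg + mask_term


def anchor_label(iou: float) -> int:
    """Hypothetical use of the two sampling thresholds (SiamRPN-style):
    1 = positive, 0 = negative, -1 = ignored."""
    if iou > POS_THRESH:
        return 1
    if iou < NEG_THRESH:
        return 0
    return -1
```

In a PyTorch training loop, the two values returned by `param_group_lrs` would typically be written into two separate optimizer parameter groups at the start of each epoch; that wiring is likewise an assumption rather than something the paper specifies.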