Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
E3SN: Efficient End-to-End Siamese Network for Video Object Segmentation
Authors: Meng Lan, Yipeng Zhang, Qinning Xu, Lefei Zhang
IJCAI 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on DAVIS2016 and DAVIS2017 datasets show that the proposed approach outperforms the SiamMask in accuracy with similar FPS. Moreover, this approach also achieves good accuracy-speed trade-off compared with that of other state-of-the-art VOS algorithms. |
| Researcher Affiliation | Academia | Meng Lan, Yipeng Zhang, Qinning Xu and Lefei Zhang; National Engineering Research Center for Multimedia Software, Institute of Artificial Intelligence and School of Computer Science, Wuhan University, Wuhan, China; {menglan, zyp91, qinning.xu, zhanglefei}@whu.edu.cn |
| Pseudocode | No | The paper describes the proposed method in prose and diagrams but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include an unambiguous statement about releasing source code or provide a link to a code repository. |
| Open Datasets | Yes | The proposed model is trained on ImageNet-VID [Russakovsky et al., 2015], COCO [Lin et al., 2014], and YouTube-VOS [Xu et al., 2018]. By contrast, only COCO and YouTube-VOS, which have mask labels, are useful for training the mask branch. NVIDIA TITAN RTX is the GPU used for training and evaluation. |
| Dataset Splits | Yes | The performance of E3SN is evaluated on DAVIS-2016 [Perazzi et al., 2016] and DAVIS-2017 [Pont-Tuset et al., 2017] validation sets for single- and multi-object segmentations, respectively. The DAVIS 2016 validation set comprises 20 videos, and each video sequence is annotated with a single pixel-wise object mask. The DAVIS 2017 validation set extends the DAVIS 2016 validation set to 30 videos with multiple object annotations. |
| Hardware Specification | Yes | NVIDIA TITAN RTX is the GPU used for training and evaluation. |
| Software Dependencies | No | The paper does not provide specific version numbers for software components or libraries used in the experiments. |
| Experiment Setup | Yes | The ImageNet pre-trained model is loaded as the initial parameters of the backbone network, and SGD with an initial learning rate of $2 \times 10^{-3}$ which logarithmically decreases to $2 \times 10^{-4}$ in 20 epochs. Particularly, the learning rate of the mask branch is multiplied by 0.1. The two thresholds of sampling strategy are set to 0.6 and 0.3. (...) $L = \lambda_1 L_{cls} + \lambda_2 L_{reg} + \lambda_3 L_{mask}$ (3) where $\lambda_1 = \lambda_2 = 1$ and $\lambda_3 = 32$ are set similar to that in SiamMask. |
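
The Experiment Setup row quotes a learning-rate schedule and a weighted loss. The sketch below illustrates one possible reading of those numbers and is not the authors' code. It assumes that "logarithmically decreases" means one learning rate per epoch, evenly spaced in log space between the two quoted endpoints. The names `log_lr_schedule`, `total_loss`, and `MASK_BRANCH_LR_SCALE` are hypothetical and chosen only for illustration.

```python
import math

# Quoted in the table: the mask branch learning rate is multiplied by 0.1.
MASK_BRANCH_LR_SCALE = 0.1


def log_lr_schedule(lr_start=2e-3, lr_end=2e-4, epochs=20):
    """Return one learning rate per epoch, evenly spaced in log space.

    Assumption: this is how the "logarithmic decrease" is realized;
    the quoted text does not give the exact schedule.
    """
    if epochs < 1:
        raise ValueError("epochs must be at least 1")
    if epochs == 1:
        return [lr_start]
    log_start = math.log10(lr_start)
    log_end = math.log10(lr_end)
    step = (log_end - log_start) / (epochs - 1)
    return [10 ** (log_start + i * step) for i in range(epochs)]


def total_loss(l_cls, l_reg, l_mask, lam1=1.0, lam2=1.0, lam3=32.0):
    """Weighted sum from Eq. (3): lam1 * L_cls + lam2 * L_reg + lam3 * L_mask."""
    return lam1 * l_cls + lam2 * l_reg + lam3 * l_mask


if __name__ == "__main__":
    for epoch, lr in enumerate(log_lr_schedule(), start=1):
        # Backbone and other heads use lr; the mask branch uses the scaled value.
        print(f"epoch {epoch:2d}: lr={lr:.6f}, mask lr={lr * MASK_BRANCH_LR_SCALE:.7f}")
```

With the quoted weights, the mask term dominates the sum whenever the three component losses are of similar magnitude, since it is scaled by 32 while the other two are scaled by 1.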