Siamese Network with Interactive Transformer for Video Object Segmentation

Authors: Meng Lan, Jing Zhang, Fengxiang He, Lefei Zhang (pp. 1228-1236)

AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on three challenging benchmarks validate the superiority of SITVOS over state-of-the-art methods.
Researcher Affiliation | Collaboration | Meng Lan (1), Jing Zhang (2), Fengxiang He (3), Lefei Zhang (1,4)*; (1) Wuhan University, (2) The University of Sydney, (3) JD Explore Academy, China, (4) Hubei Luojia Laboratory
Pseudocode | No | The paper describes its algorithms and architectures in prose and figures, but does not include any pseudocode blocks or algorithm listings.
Open Source Code | Yes | Code: https://github.com/LANMNG/SITVOS
Open Datasets | Yes | MS-COCO dataset (Lin et al. 2014) ... DAVIS 2017 (Pont-Tuset et al. 2017) and YouTube-VOS (Xu et al. 2018).
Dataset Splits | Yes | DAVIS 2016-Val for single-object segmentation; DAVIS 2017-Val and the YouTube-VOS validation set for multi-object segmentation.
Hardware Specification | Yes | SITVOS is implemented in PyTorch and trained using an RTX 2080Ti GPU.
Software Dependencies | No | SITVOS is implemented in PyTorch, but the specific version of PyTorch or of any other software dependency is not provided.
Experiment Setup | Yes | The input image size is 384 x 384 and the batch size is 4 for both training stages. We minimize the cross-entropy loss using the Adam optimizer with a learning rate starting at 1e-5. The learning rate is adjusted with polynomial scheduling using the power of 0.9. All batch normalization layers in the backbone are fixed at their ImageNet pre-trained parameters during training. (A code sketch of this configuration follows the table.)
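The experiment-setup details reported above translate directly into a PyTorch training configuration. The following is a minimal sketch, not the authors' released code (see https://github.com/LANMNG/SITVOS for that): the stand-in model and the `max_iters` horizon are assumptions added for illustration, while the Adam optimizer at 1e-5, polynomial decay with power 0.9, cross-entropy loss, 384 x 384 inputs with batch size 4, and frozen backbone BatchNorm layers come from the paper's quoted setup.

```python
import torch
import torch.nn as nn
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

# Stand-in for the SITVOS network (assumption for illustration only);
# it maps a 3-channel frame to 2-class (foreground/background) logits.
model = nn.Conv2d(3, 2, kernel_size=3, padding=1)

# Freeze all BatchNorm layers at their ImageNet pre-trained statistics,
# as described in the experiment setup.
for m in model.modules():
    if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
        m.eval()
        for p in m.parameters():
            p.requires_grad = False

criterion = nn.CrossEntropyLoss()
optimizer = Adam([p for p in model.parameters() if p.requires_grad], lr=1e-5)

# Polynomial learning-rate decay with power 0.9; the total iteration budget
# `max_iters` is an assumed value, not reported in the section above.
max_iters = 100_000
scheduler = LambdaLR(optimizer, lambda it: (1.0 - it / max_iters) ** 0.9)

# One illustrative step on a dummy batch: size 4, 384 x 384 inputs.
images = torch.randn(4, 3, 384, 384)
targets = torch.randint(0, 2, (4, 384, 384))  # per-pixel class labels

optimizer.zero_grad()
logits = model(images)            # (4, 2, 384, 384)
loss = criterion(logits, targets)
loss.backward()
optimizer.step()
scheduler.step()
```

Note that freezing BatchNorm via `m.eval()` must be reapplied after any subsequent `model.train()` call; the authors' implementation may handle this detail differently.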