Siamese Network with Interactive Transformer for Video Object Segmentation
Authors: Meng Lan, Jing Zhang, Fengxiang He, Lefei Zhang (pp. 1228-1236)
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three challenging benchmarks validate the superiority of SITVOS over state-of-the-art methods. |
| Researcher Affiliation | Collaboration | Meng Lan¹, Jing Zhang², Fengxiang He³, Lefei Zhang¹,⁴*; ¹Wuhan University, ²The University of Sydney, ³JD Explore Academy, China, ⁴Hubei Luojia Laboratory |
| Pseudocode | No | The paper describes algorithms and architectures in prose and figures, but does not include any specific pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Code: https://github.com/LANMNG/SITVOS. |
| Open Datasets | Yes | MS-COCO dataset (Lin et al. 2014). ... DAVIS 2017 (Pont-Tuset et al. 2017) and YouTube-VOS (Xu et al. 2018). |
| Dataset Splits | Yes | DAVIS 2016-Val for single-object segmentation, DAVIS 2017-Val and YouTube-VOS validation sets for multi-object segmentation. |
| Hardware Specification | Yes | SITVOS is implemented in PyTorch and trained using RTX 2080Ti GPU. |
| Software Dependencies | No | SITVOS is implemented in PyTorch. No version number is given for PyTorch or for any other software dependency. |
| Experiment Setup | Yes | The input image size is 384 × 384 and the batch size is 4 for both training stages. We minimize the cross-entropy loss using the Adam optimizer with a learning rate starting at 1e-5. The learning rate is adjusted with polynomial scheduling using the power of 0.9. All batch normalization layers in the backbone are fixed as their ImageNet pre-trained parameters during training. |
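
The learning-rate schedule in the Experiment Setup row can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes the common "poly" form `lr = base_lr * (1 - step / total_steps) ** power`, which the table does not spell out. The function name `poly_lr` and the value of `TOTAL_STEPS` are hypothetical; only the starting rate of 1e-5 and the power of 0.9 come from the table.

```python
def poly_lr(base_lr: float, step: int, total_steps: int, power: float = 0.9) -> float:
    """Return the polynomially decayed learning rate for a given step.

    Assumed form (not confirmed by the source):
        lr = base_lr * (1 - step / total_steps) ** power
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp the step so values outside [0, total_steps] never produce
    # a negative base under the fractional exponent.
    progress = min(max(step, 0), total_steps) / total_steps
    return base_lr * (1.0 - progress) ** power


BASE_LR = 1e-5      # starting learning rate, from the table
POWER = 0.9         # polynomial power, from the table
TOTAL_STEPS = 1000  # hypothetical: the training length is not stated in the table

# Learning rate at every step, from the first update to the last.
schedule = [poly_lr(BASE_LR, s, TOTAL_STEPS, POWER) for s in range(TOTAL_STEPS + 1)]
```

Under these assumptions the rate starts at 1e-5 and decays smoothly to zero at the final step. In PyTorch, the same multiplier could plausibly be passed to `torch.optim.lr_scheduler.LambdaLR` around an Adam optimizer, though the table does not say how the authors implemented it.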