Pyramid Constrained Self-Attention Network for Fast Video Salient Object Detection

Authors: Yuchao Gu, Lijuan Wang, Ziqin Wang, Yun Liu, Ming-Ming Cheng, Shao-Ping Lu. Pages 10869-10876.

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experimental results demonstrate that our method outperforms previous state-of-the-art methods in both accuracy and speed (110 FPS on a single Titan Xp) on five challenge datasets. We carry out experiments on six challenging VSOD datasets and achieve new state-of-the-art results.
Researcher Affiliation | Academia | Yuchao Gu (1*), Lijuan Wang (1), Ziqin Wang (2), Yun Liu (1), Ming-Ming Cheng (1), Shao-Ping Lu (1); 1 TKLNDST, CS, Nankai University; 2 The University of Sydney
Pseudocode | No | The paper describes its proposed method using text and mathematical equations, but it does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/guyuchao/PyramidCSA.
Open Datasets | Yes | We remove the PCSA module and pretrain our backbone with the training set of an image dataset, i.e. DUTS (Wang et al. 2017), and two video datasets, i.e. DAVIS (Perazzi et al. 2016) and DAVSOD (Fan et al. 2019). We benchmark our method on six public VSOD datasets, i.e. FBMS (Ochs, Malik, and Brox 2013), DAVIS (Perazzi et al. 2016), DAVSOD (Fan et al. 2019), SegTrack-V2 (Li et al. 2013), VOS (Li, Xia, and Chen 2017) and ViSal (Wang, Shen, and Shao 2015).
Dataset Splits | Yes | We remove the PCSA module and pretrain our backbone with the training set of an image dataset, i.e. DUTS (Wang et al. 2017), and two video datasets, i.e. DAVIS (Perazzi et al. 2016) and DAVSOD (Fan et al. 2019).
Hardware Specification | Yes | Extensive experimental results demonstrate that our method outperforms previous state-of-the-art methods in both accuracy and speed (110 FPS on a single Titan Xp) on five challenge datasets. The total training procedure takes 15 hours on 4 RTX 2080Ti GPUs. Speed is tested on an Intel(R) Core(TM) i7-4790K CPU and a single Titan Xp GPU.
Software Dependencies | No | Our model is built based on the PyTorch (Paszke et al. 2019) repository. (This mentions PyTorch but does not provide a version number, nor does it list other software dependencies with versions.)
Experiment Setup | Yes | We use the Adam optimizer with an initial learning rate of 2e-4 and a batch size of 36. The learning rate decays with a poly scheduler (decay rate = 0.9). We resize input images to 256×448. The data augmentation methods include random flip, random crop, and multi-scale training. We use five scales {0.5, 0.75, 1, 1.25, 1.75} during training. We set T = 5 and batch size 12 in our experiments. The initial learning rates of the PCSA module and the backbone are set to 1e-4 and 1e-6, respectively.
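The "poly" learning-rate schedule quoted in the setup row is a standard polynomial decay. A minimal sketch follows; the total iteration count (`max_steps`) is an assumption, since the excerpt does not state it, and real training would apply this per parameter group inside a PyTorch scheduler:

```python
def poly_lr(base_lr, step, max_steps, power=0.9):
    """Polynomial ("poly") decay: lr = base_lr * (1 - step / max_steps) ** power."""
    return base_lr * (1.0 - step / max_steps) ** power

# Example with the paper's reported initial rates: 1e-4 for the PCSA
# module and 1e-6 for the backbone. max_steps is a hypothetical value.
max_steps = 1000
for name, base in {"pcsa": 1e-4, "backbone": 1e-6}.items():
    lrs = [poly_lr(base, s, max_steps) for s in range(max_steps + 1)]
    # The rate starts at base_lr, decreases monotonically, and reaches 0.
    assert lrs[0] == base
    assert all(a >= b for a, b in zip(lrs, lrs[1:]))
    assert lrs[-1] == 0.0
```

With power = 0.9 (the decay rate quoted above), the curve is close to linear but decays slightly faster near the end of training.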