Pyramid Constrained Self-Attention Network for Fast Video Salient Object Detection
Authors: Yuchao Gu, Lijuan Wang, Ziqin Wang, Yun Liu, Ming-Ming Cheng, Shao-Ping Lu
AAAI 2020, pp. 10869-10876
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that our method outperforms previous state-of-the-art methods in both accuracy and speed (110 FPS on a single Titan Xp) on five challenging datasets. We carry out experiments on six challenging VSOD datasets and achieve new state-of-the-art results. |
| Researcher Affiliation | Academia | Yuchao Gu,1* Lijuan Wang,1 Ziqin Wang,2 Yun Liu,1 Ming-Ming Cheng,1 Shao-Ping Lu1 1TKLNDST, CS, Nankai University 2The University of Sydney |
| Pseudocode | No | The paper describes its proposed method using text and mathematical equations, but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/guyuchao/PyramidCSA. |
| Open Datasets | Yes | We remove the PCSA module and pretrain our backbone with the training set of an image dataset, i.e. DUTS (Wang et al. 2017), and two video datasets, i.e. DAVIS (Perazzi et al. 2016) and DAVSOD (Fan et al. 2019). We benchmark our method on six public VSOD datasets, i.e. FBMS (Ochs, Malik, and Brox 2013), DAVIS (Perazzi et al. 2016), DAVSOD (Fan et al. 2019), SegTrackV2 (Li et al. 2013), VOS (Li, Xia, and Chen 2017) and ViSal (Wang, Shen, and Shao 2015). |
| Dataset Splits | Yes | We remove the PCSA module and pretrain our backbone with the training set of an image dataset, i.e. DUTS (Wang et al. 2017) and two video datasets, i.e. DAVIS (Perazzi et al. 2016) and DAVSOD (Fan et al. 2019). |
| Hardware Specification | Yes | Extensive experimental results demonstrate that our method outperforms previous state-of-the-art methods in both accuracy and speed (110 FPS on a single Titan Xp) on five challenging datasets. The total training procedure takes 15 hours on 4 RTX 2080 Ti GPUs. Speed is tested on an Intel(R) Core(TM) i7-4790K CPU and a single Titan Xp GPU. (A hedged timing sketch appears after the table.) |
| Software Dependencies | No | Our model is built based on the PyTorch (Paszke et al. 2019) repository. (This mentions PyTorch but does not provide a version number, nor does it list other software dependencies with versions.) |
| Experiment Setup | Yes | We use the Adam optimizer with initial learning rate 2e-4 and batch size 36. The learning rate decays with a poly scheduler (decay rate = 0.9). We resize input images to 256×448. The data augmentation methods include random flipping, random cropping, and multi-scale training. We use five scales {0.5, 0.75, 1, 1.25, 1.75} when training. We set T = 5 and batch size 12 in our experiments. The initial learning rates of PCSA and the backbone are set to 1e-4 and 1e-6, respectively. (A hedged configuration sketch appears after the table.) |
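
The experiment-setup row packs several hyperparameters into one quote. The sketch below is a minimal PyTorch rendering of how they fit together, not the authors' training script: the Adam optimizer with per-module learning rates (1e-4 for PCSA, 1e-6 for the pretrained backbone), poly decay with power 0.9, and multi-scale training around the 256×448 base resolution. `DummyNet`, `MAX_ITER`, and the helper names are hypothetical; the iteration budget is not reported in the excerpt.

```python
import random
import torch
import torch.nn as nn

class DummyNet(nn.Module):
    # Hypothetical stand-in for the real network; only the split into two
    # parameter groups (backbone vs. PCSA) matters for this sketch.
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 16, 3, padding=1)
        self.pcsa = nn.Conv2d(16, 1, 1)

    def forward(self, x):
        return self.pcsa(self.backbone(x))

model = DummyNet()

# Per-module initial learning rates from the paper:
# 1e-6 for the (pretrained) backbone, 1e-4 for PCSA.
optimizer = torch.optim.Adam([
    {"params": model.backbone.parameters(), "lr": 1e-6},
    {"params": model.pcsa.parameters(), "lr": 1e-4},
])

MAX_ITER = 20_000  # assumed; the excerpt does not report the iteration budget

def apply_poly_lr(optimizer, iteration, power=0.9):
    # Poly schedule as reported: lr = base_lr * (1 - iter / max_iter) ** power.
    for group in optimizer.param_groups:
        base_lr = group.setdefault("initial_lr", group["lr"])
        group["lr"] = base_lr * (1 - iteration / MAX_ITER) ** power

def random_scale(base_hw=(256, 448), scales=(0.5, 0.75, 1.0, 1.25, 1.75)):
    # Multi-scale training: sample one of the five reported scales
    # around the 256x448 base resolution.
    s = random.choice(scales)
    return int(base_hw[0] * s), int(base_hw[1] * s)
```

Calling `apply_poly_lr(optimizer, it)` once per iteration keeps the two groups' learning rates in their reported 100:1 ratio throughout decay, which is the usual reason for per-group base rates over a single global one.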
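For the speed claim (110 FPS on a single Titan Xp), the paper does not publish its timing script. The following is a generic single-GPU throughput measurement, not the authors' benchmark: the clip length T = 5 and the 256×448 resolution come from the quotes above, while the stand-in model, warm-up count, and run count are assumptions.

```python
import time
import torch
import torch.nn as nn

# Stand-in network (the real model is not reconstructed here);
# requires a CUDA-capable GPU.
model = nn.Conv2d(3, 1, 3, padding=1).cuda().eval()

# The paper feeds clips of T = 5 frames at 256x448 resolution.
clip = torch.randn(5, 3, 256, 448, device="cuda")

with torch.no_grad():
    for _ in range(10):           # warm-up runs (count assumed)
        model(clip)
    torch.cuda.synchronize()      # flush queued kernels before timing
    start = time.perf_counter()
    n_runs = 100
    for _ in range(n_runs):
        model(clip)
    torch.cuda.synchronize()      # and again before reading the clock
    elapsed = time.perf_counter() - start

print(f"{n_runs * clip.shape[0] / elapsed:.1f} FPS")
```

The two `torch.cuda.synchronize()` calls matter: CUDA launches are asynchronous, so timing without them measures kernel submission rather than execution and inflates the FPS figure.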