Global Context-Aware Progressive Aggregation Network for Salient Object Detection

Authors: Zuyao Chen, Qianqian Xu, Runmin Cong, Qingming Huang

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on six benchmark datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively."
Researcher Affiliation | Academia | (1) University of Chinese Academy of Sciences, Beijing, China; (2) Key Lab. of Intelligent Information Processing, ICT, CAS, Beijing, China; (3) Institute of Information Science, Beijing Jiaotong University, Beijing, China; (4) Key Lab. of Big Data Mining and Knowledge Management, CAS, Beijing, China; (5) Peng Cheng Laboratory, Shenzhen, Guangdong, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "The code is now available." https://github.com/JosephChenHub/GCPANet.git
Open Datasets | Yes | "We conduct experiments on six public saliency detection benchmark datasets...": ECSSD (Yan et al. 2013), PASCAL-S (Li et al. 2014), HKU-IS (Li and Yu 2015), DUT-OMRON (Yang et al. 2013), SOD (Movahedi and Elder 2010), DUTS (Wang et al. 2017). "As with other works in salient object detection (Qin et al. 2019; Liu et al. 2019), we employ DUTS-TR as our training dataset and evaluate our model on other datasets."
Dataset Splits | No | The paper specifies DUTS-TR as the training set but does not describe a separate validation split or how hyperparameter tuning and model selection were performed.
Hardware Specification | Yes | "The inference of a 320 × 320 image takes about 0.02 s (over 50 fps) with the acceleration of one NVIDIA Titan-Xp GPU card." (See the timing sketch after this table.)
Software Dependencies | No | The paper states "We use Pytorch (Paszke et al. 2017) to implement our model." but gives no version number for PyTorch or any other software dependency.
Experiment Setup | Yes | "In the training stage, we resize each image to 320 × 320 with random horizontal flipping, then randomly crop a patch with the size of 288 × 288 for training. During the inference stage, images are simply resized to 320 × 320 then fed into the network to obtain prediction without any other post-processing (e.g., CRF). We use Pytorch (Paszke et al. 2017) to implement our model. Mini-batch stochastic gradient descent (SGD) is used to optimize the whole network with the batch size of 32, the momentum of 0.9, and the weight decay of 5e-4. We use the warm-up and linear decay strategies with the maximum learning rate 5e-3 for the backbone and 0.05 for other parts to train our model and stop training after 30 epochs." (See the training-setup sketch after this table.)
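The quoted training recipe maps directly onto a few lines of PyTorch. Below is a minimal sketch, not the authors' implementation: it assumes the backbone parameters carry a `backbone` name prefix and a warm-up length of 10% of total steps, neither of which the paper specifies; the augmentation sizes, learning rates, momentum, weight decay, and linear decay follow the quoted setup.

```python
import torchvision.transforms as T
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

# Image-side augmentation from the quoted setup: resize to 320x320,
# random horizontal flip, random 288x288 crop. (For saliency training
# the same flip/crop must be applied jointly to image and mask; this
# sketch shows only the image pipeline.)
train_tf = T.Compose([
    T.Resize((320, 320)),
    T.RandomHorizontalFlip(),
    T.RandomCrop(288),
    T.ToTensor(),
])

def build_optimizer(model, total_steps, warmup_frac=0.1):
    """SGD with the two learning-rate groups reported in the paper."""
    backbone, head = [], []
    for name, param in model.named_parameters():
        # Assumption: backbone parameters are named "backbone.*".
        (backbone if name.startswith("backbone") else head).append(param)

    optimizer = SGD(
        [{"params": backbone, "lr": 5e-3},   # backbone peak lr
         {"params": head, "lr": 0.05}],      # "other parts" peak lr
        momentum=0.9,
        weight_decay=5e-4,
    )

    # Assumed warm-up length; the paper does not state it.
    warmup_steps = max(1, int(warmup_frac * total_steps))

    def lr_factor(step):
        if step < warmup_steps:              # linear warm-up to the peak lr
            return (step + 1) / warmup_steps
        # linear decay from the peak lr down to zero at total_steps
        return max(0.0, (total_steps - step) / (total_steps - warmup_steps))

    # LambdaLR scales each group's base lr by lr_factor at every step()
    return optimizer, LambdaLR(optimizer, lr_lambda=lr_factor)
```

In a training loop, `scheduler.step()` would be called once per mini-batch, so the warm-up and linear decay operate at iteration granularity across the 30 epochs.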
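The reported 0.02 s / over-50-fps figure can likewise be sanity-checked with a standard CUDA timing loop. The ResNet-50 below is only a stand-in for GCPANet (the real network is in the released repository); the essential details are the warm-up iterations and `torch.cuda.synchronize()` before reading the clock, without which GPU latency is under-measured.

```python
import time
import torch
import torchvision

# Stand-in network for timing; swap in GCPANet from the released repo.
model = torchvision.models.resnet50().cuda().eval()
x = torch.randn(1, 3, 320, 320, device="cuda")  # one 320x320 image

with torch.no_grad():
    for _ in range(10):   # warm-up: CUDA init and cuDNN autotuning
        model(x)
    torch.cuda.synchronize()

    n_runs = 100
    start = time.perf_counter()
    for _ in range(n_runs):
        model(x)
    torch.cuda.synchronize()  # flush all GPU work before stopping the clock

elapsed = (time.perf_counter() - start) / n_runs
print(f"{elapsed * 1e3:.1f} ms per image ({1.0 / elapsed:.0f} fps)")
```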