RGB-D Salient Object Detection via 3D Convolutional Neural Networks

Authors: Qian Chen, Ze Liu, Yi Zhang, Keren Fu, Qijun Zhao, Hongwei Du

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on six widely used benchmark datasets demonstrate that RD3D performs favorably against 14 state-of-the-art RGB-D SOD approaches in terms of four key evaluation metrics. Our code will be made publicly available: https://github.com/PPOLYpubki/RD3D.
Researcher Affiliation | Academia | (1) School of Information Science and Technology, University of Science and Technology of China; (2) Institut National des Sciences Appliquées de Rennes; (3) College of Computer Science, Sichuan University; (4) National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University
Pseudocode | No | The paper describes the methodology using text, block diagrams (Figure 2), and mathematical formulations, but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code will be made publicly available: https://github.com/PPOLYpubki/RD3D.
Open Datasets | Yes | We evaluate our RD3D on six popular public datasets with paired RGB and depth images: NJU2K (1,985 pairs) (Ju et al. 2014), NLPR (1,000 pairs) (Peng et al. 2014), STERE (1,000 pairs) (Niu et al. 2012), DES (135 pairs, also called RGBD135 in some previous works) (Cheng et al. 2014), SIP (929 pairs) (Fan et al. 2020a), and DUTLF-D (1,200 pairs) (Piao et al. 2019).
Dataset Splits | Yes | Following (Chen and Li 2018; Chen, Li, and Su 2019; Han et al. 2017), we use the same 1,485 pairs from NJU2K and 700 pairs from NLPR for training; the remaining pairs are used for testing. Specifically, on the latest DUTLF-D dataset, we follow (Piao et al. 2019; Zhao et al. 2020; Piao et al. 2020; Li et al. 2020; Ji et al. 2020) and add an additional 800 pairs from DUTLF-D for training, testing on the remaining 400 pairs. (A sketch of this split assembly appears after the table.)
Hardware Specification | Yes | Our framework is implemented based on PyTorch (Paszke et al. 2019) on a workstation with 4 NVIDIA 1080Ti GPUs.
Software Dependencies | No | Our framework is implemented based on PyTorch (Paszke et al. 2019). The paper mentions PyTorch but does not specify a version number or other software dependencies with their versions.
Experiment Setup | Yes | During training, we adopt the Adam optimizer with an initial learning rate of 0.0001, decayed by a cosine learning rate scheduler. The weight decay is set to 0.001. The data is first resized to 352×352 and then augmented by random horizontal flipping and multi-scale transformation with scales of {256, 352, 416}. We train for 100 epochs on 4 GPUs with a batch size of 10 per GPU; the total training time is about 6 hours. The model after the last epoch is used for inference. For supervision, we compute the standard binary cross-entropy loss. During testing, an image of arbitrary size is first resized to 352×352 and the predicted saliency map is resized back to its original size. (A training-loop sketch based on this description appears after the table.)
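
To make the split composition in the Dataset Splits row concrete, here is a minimal Python sketch assembling the training and testing sets. The directory layout, file extensions, and prefix-slicing are illustrative assumptions: the community uses fixed, published index lists for these splits rather than a simple sorted order, so only the pair counts below come from the text.

```python
# Sketch of the RGB-D SOD train/test split described above.
# Directory layout and slicing order are assumptions for illustration.
from pathlib import Path

def list_pairs(root):
    """Pair each RGB image with its same-named depth map (assumed layout)."""
    rgb_dir, depth_dir = Path(root) / "RGB", Path(root) / "depth"
    return sorted((p, depth_dir / (p.stem + ".png"))
                  for p in rgb_dir.glob("*.jpg"))

nju2k = list_pairs("NJU2K")    # 1,985 pairs total
nlpr = list_pairs("NLPR")      # 1,000 pairs total
dutlfd = list_pairs("DUTLF-D") # 1,200 pairs total

# Training set: 1,485 NJU2K + 700 NLPR + 800 DUTLF-D = 2,985 pairs.
train = nju2k[:1485] + nlpr[:700] + dutlfd[:800]

# Testing: the remaining pairs of each split dataset; STERE, DES, and SIP
# are used in full for testing only.
test = {"NJU2K": nju2k[1485:], "NLPR": nlpr[700:], "DUTLF-D": dutlfd[800:]}
```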
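
The Experiment Setup row maps almost one-to-one onto standard PyTorch components. Below is a minimal training-loop sketch under that reading; `RD3D` and `train_loader` are hypothetical placeholders for the model and data pipeline in the authors' repository, and applying `binary_cross_entropy_with_logits` to raw logits is an assumption, since the paper only states that binary cross-entropy is used.

```python
# Minimal training-loop sketch of the quoted setup (not the authors' code).
import random

import torch
import torch.nn.functional as F
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

EPOCHS, SCALES = 100, (256, 352, 416)

model = torch.nn.DataParallel(RD3D().cuda())  # hypothetical model; 4 GPUs,
optimizer = Adam(model.parameters(),          # batch size 10 per GPU
                 lr=1e-4, weight_decay=1e-3)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)

for epoch in range(EPOCHS):
    # train_loader (hypothetical) yields 352x352 tensors that have already
    # been randomly horizontally flipped.
    for rgb, depth, gt in train_loader:
        # Multi-scale transformation: rescale the whole batch to one of
        # {256, 352, 416} before the forward pass.
        s = random.choice(SCALES)
        rgb = F.interpolate(rgb.cuda(), size=(s, s), mode="bilinear",
                            align_corners=False)
        depth = F.interpolate(depth.cuda(), size=(s, s), mode="bilinear",
                              align_corners=False)
        gt = F.interpolate(gt.cuda(), size=(s, s), mode="nearest")

        pred = model(rgb, depth)              # saliency logits (assumption)
        loss = F.binary_cross_entropy_with_logits(pred, gt)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```

At test time the same 352×352 resize is applied to the input, and the predicted saliency map is resized back to the image's original resolution.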