Energy-Based Generative Cooperative Saliency Prediction

Authors: Jing Zhang, Jianwen Xie, Zilong Zheng, Nick Barnes (pp. 3280-3290)

AAAI 2022

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. LLM response: "Experimental results show that our model can produce a set of diverse and plausible saliency maps of an image, and obtain state-of-the-art performance in both fully supervised and weakly supervised saliency prediction tasks." "We conduct a series of experiments to test the performances of the proposed generative cooperative frameworks for saliency prediction."
Researcher Affiliation: Collaboration. Jing Zhang (1), Jianwen Xie (2), Zilong Zheng (3), Nick Barnes (1); (1) The Australian National University, (2) Cognitive Computing Lab, Baidu Research, (3) University of California, Los Angeles.
Pseudocode: Yes. LLM response: "Algorithm 1: Training the Cooperative Saliency Predictor." "Algorithm 2: Cooperative learning while recovering."
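The cooperative training idea behind Algorithm 1 can be sketched in miniature: a generator proposes an initial prediction, Langevin dynamics driven by an energy function revise it, and the generator is then nudged toward the revised sample. The scalar toy energy, the one-parameter "generator", and the update rule below are illustrative assumptions for exposition, not the paper's actual networks; only the Langevin form and the reported K = 5 steps with step size 0.4 come from the source.

```python
import random

random.seed(0)

# Toy "energy": E(y | x) = (y - x)^2 / 2, lowest when the prediction y
# matches the target x. In the paper the energy is a learned network;
# this quadratic stand-in is an illustrative assumption.
def energy(y, x):
    return 0.5 * (y - x) ** 2

def grad_energy(y, x):
    return y - x

def langevin_revise(y0, x, steps=5, step_size=0.4):
    """K Langevin steps (the paper reports K = 5, EBM step size 0.4):
    y <- y - (s^2 / 2) * dE/dy + s * noise."""
    y = y0
    for _ in range(steps):
        noise = random.gauss(0.0, 1.0)
        y = y - 0.5 * step_size ** 2 * grad_energy(y, x) + step_size * noise
    return y

def cooperative_step(g, x, lr=0.1):
    """One simplified cooperative update: the generator (here just the
    scalar g) proposes, the energy-driven Langevin chain revises, and the
    generator moves toward the revised sample."""
    proposal = g
    revised = langevin_revise(proposal, x)
    g = g + lr * (revised - g)
    return g, revised
```

Repeated cooperative steps drag the generator's proposal toward low-energy (accurate) predictions, which is the intuition the full algorithm scales up to image-sized saliency maps.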
Open Source Code: No. LLM response: The paper states "To demonstrate this idea, we select BASN (Qin et al. 2019) and SCRN (Wu, Su, and Huang 2019b) as base models due to the accessibility of their codes and predictions.", referring to other models' code, but it does not provide any statement or link about the availability of the authors' own code for the described methodology.
Open Datasets: Yes. LLM response: "We use the DUTS dataset (Wang et al. 2017) to train the fully supervised model, and S-DUTS (Zhang et al. 2020b) dataset with scribble annotations to train the weakly supervised model."
Dataset Splits: No. LLM response: The paper mentions "training images" and "training dataset" but does not specify exact training/validation/test splits (e.g., percentages, absolute counts, or predefined splits with citations).
Hardware Specification: Yes. LLM response: "It takes 20 hours to train the model with a batch size of 7 using a single NVIDIA GeForce RTX 2080Ti GPU."
Software Dependencies: No. LLM response: The paper mentions software components such as the Adam optimizer and architectures such as ResNet50 and the MiDaS decoder, but does not provide version numbers for key software dependencies (e.g., Python, PyTorch/TensorFlow, CUDA).
Experiment Setup: Yes. LLM response: "The number of Langevin steps is K = 5 and the Langevin step sizes for EBM and LVM are 0.4 and 0.1. The learning rates of the LVM and EBM are initialized to 5 × 10⁻⁵ and 10⁻³ respectively. We use Adam optimizer with momentum 0.9 and decrease the learning rates by 10% after every 20 epochs. It takes 20 hours to train the model with a batch size of 7 using a single NVIDIA GeForce RTX 2080Ti GPU."
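The quoted setup can be collected into a small configuration sketch with a helper that applies the stated schedule (decrease the learning rates by 10% every 20 epochs). Only the numbers come from the paper; the key names and the `lr_at_epoch` helper are illustrative assumptions.

```python
# Hyperparameters quoted in the paper's experiment setup. The dict keys and
# the schedule helper below are assumed names for illustration only.
CONFIG = {
    "langevin_steps": 5,            # K = 5
    "langevin_step_size_ebm": 0.4,
    "langevin_step_size_lvm": 0.1,
    "lr_lvm": 5e-5,                 # initial LVM learning rate
    "lr_ebm": 1e-3,                 # initial EBM learning rate
    "adam_momentum": 0.9,
    "lr_decay": 0.10,               # "decrease the learning rates by 10%"
    "decay_every_epochs": 20,
    "batch_size": 7,
}

def lr_at_epoch(base_lr, epoch, cfg=CONFIG):
    """Learning rate after decaying by 10% once per completed 20-epoch block."""
    decays = epoch // cfg["decay_every_epochs"]
    return base_lr * (1.0 - cfg["lr_decay"]) ** decays
```

For example, the EBM rate of 1e-3 becomes 9e-4 at epoch 20 and 8.1e-4 at epoch 40 under this reading of the schedule.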