Energy-Based Generative Cooperative Saliency Prediction
Authors: Jing Zhang, Jianwen Xie, Zilong Zheng, Nick Barnes
AAAI 2022, pp. 3280-3290
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results show that our model can produce a set of diverse and plausible saliency maps of an image, and obtain state-of-the-art performance in both fully supervised and weakly supervised saliency prediction tasks." "We conduct a series of experiments to test the performances of the proposed generative cooperative frameworks for saliency prediction." |
| Researcher Affiliation | Collaboration | Jing Zhang1, Jianwen Xie2, Zilong Zheng3, Nick Barnes1 1 The Australian National University 2 Cognitive Computing Lab, Baidu Research 3 University of California, Los Angeles |
| Pseudocode | Yes | Algorithm 1: Training the Cooperative Saliency Predictor. Algorithm 2: Cooperative learning while recovering. |
| Open Source Code | No | The paper states 'To demonstrate this idea, we select BASN (Qin et al. 2019) and SCRN (Wu, Su, and Huang 2019b) as base models due to the accessibility of their codes and predictions.', referring to other models' code, but does not provide any statement or link about the availability of their own open-source code for the described methodology. |
| Open Datasets | Yes | We use the DUTS dataset (Wang et al. 2017) to train the fully supervised model, and S-DUTS (Zhang et al. 2020b) dataset with scribble annotations to train the weakly supervised model. |
| Dataset Splits | No | The paper mentions 'training images' and 'training dataset' but does not specify the exact training/validation/test dataset splits (e.g., percentages, absolute counts, or predefined splits with citations). |
| Hardware Specification | Yes | It takes 20 hours to train the model with a batch size of 7 using a single NVIDIA GeForce RTX 2080Ti GPU. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and uses architectures like 'ResNet50' and 'MiDaS decoder', but does not provide specific version numbers for key software dependencies (e.g., Python, PyTorch/TensorFlow, CUDA versions). |
| Experiment Setup | Yes | The number of Langevin steps is K = 5, and the Langevin step sizes for the EBM and LVM are 0.4 and 0.1 respectively. The learning rates of the LVM and EBM are initialized to 5 × 10⁻⁵ and 10⁻³ respectively. We use the Adam optimizer with momentum 0.9 and decrease the learning rates by 10% after every 20 epochs. It takes 20 hours to train the model with a batch size of 7 using a single NVIDIA GeForce RTX 2080Ti GPU. |
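The Langevin sampling hyperparameters reported in the Experiment Setup row (K = 5 steps, step sizes 0.4 for the EBM and 0.1 for the LVM) can be illustrated with a minimal sketch of the standard Langevin update rule. The energy function below is a hypothetical stand-in (a simple quadratic), not the paper's saliency EBM, and the function names are assumptions for illustration only:

```python
import numpy as np

def langevin_step(z, grad_energy, step_size, rng):
    """One Langevin update: a gradient step on the energy plus Gaussian noise."""
    noise = rng.standard_normal(z.shape)
    return z - 0.5 * step_size**2 * grad_energy(z) + step_size * noise

def langevin_sample(z0, grad_energy, step_size, num_steps, rng):
    """Run num_steps Langevin updates starting from z0."""
    z = z0
    for _ in range(num_steps):
        z = langevin_step(z, grad_energy, step_size, rng)
    return z

# Toy quadratic energy E(z) = 0.5 * ||z||^2, so grad E(z) = z.
rng = np.random.default_rng(0)
z0 = rng.standard_normal(8)

# K = 5 steps with step size 0.4, mirroring the EBM settings reported above;
# the LVM side would use step size 0.1 with the same update rule.
z = langevin_sample(z0, grad_energy=lambda z: z, step_size=0.4, num_steps=5, rng=rng)
```

The same loop with `step_size=0.1` covers the LVM side; in the paper these updates are taken with respect to the models' learned energies rather than this toy quadratic.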