GASP: Gated Attention for Saliency Prediction
Authors: Fares Abawi, Tom Weber, Stefan Wermter
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments indicate that fusion approaches achieve better results for static integration methods, whereas non-fusion approaches for which the influence of each modality is unknown result in better outcomes when coupled with recurrent models for dynamic saliency prediction. We show that gaze direction and affective representations contribute a prediction-to-ground-truth correspondence improvement of at least 5% compared to dynamic saliency models without social cues. |
| Researcher Affiliation | Academia | Fares Abawi, Tom Weber and Stefan Wermter, University of Hamburg {abawi, tomweber, wermter}@informatik.uni-hamburg.de |
| Pseudocode | Yes | Algorithm 1 SCD sampling and generation. Input: video and audio frames sampled from ds = AVE dataset. Parameters: window sizes W_SP = 15, W_GE = 7, W_GF = 5, W_FER = 0; output steps T_SP = 15, T_GE = 4, T_GF = 0, T_FER = 0. Output: modality windows mdl_win; output buffers buf_mdl. 1: for vid in ds do ... |
| Open Source Code | Yes | Code: http://software.knowledge-technology.info#gasp |
| Open Datasets | Yes | We train our GASP model on the social event subset of AVE [Tavakoli et al., 2020]. |
| Dataset Splits | No | We train our GASP model on the social event subset of AVE [Tavakoli et al., 2020]. ... The models are evaluated on the test subset of social event videos in AVE. |
| Hardware Specification | Yes | An NVIDIA RTX 2080 Ti GPU with 11 GB VRAM and 128 GB RAM is used for training all static and sequential models. To extract spatiotemporal maps in the first stage (SCD), we employ an NVIDIA TITAN RTX GPU with 24 GB VRAM and 64 GB RAM to accommodate all social cue detectors simultaneously. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, but it does not specify software dependencies like programming language versions (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow) with their specific version numbers, or other libraries. |
| Experiment Setup | Yes | We employ the loss functions introduced by Tsiami et al. [2020], assigning the loss weights λ1 = 0.1, λ2 = 2, and λ3 = 1 to cross-entropy, CC, and NSS losses respectively. The model is trained using the Adam optimizer, having a learning rate of 0.001, with β1 = 0.9 and β2 = 0.999. All models are trained for 10k iterations with a batch size of 4. ... The loss LDAM with a weight λDAM = 0.5 is computed for optimizing the inverted stream parameters. |
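The quoted experiment setup combines three loss terms with fixed weights (λ1 = 0.1 for cross-entropy, λ2 = 2 for CC, λ3 = 1 for NSS). A minimal sketch of that weighted combination is below, using standard definitions of the CC and NSS saliency metrics (written as losses by negation). This is an illustration only: the paper does not specify its framework (see the Software Dependencies row), so NumPy is an assumption here, and the exact cross-entropy formulation and numerical details are guesses rather than the authors' implementation.

```python
import numpy as np

EPS = 1e-8
# Loss weights as reported in the paper: lambda1 (CE), lambda2 (CC), lambda3 (NSS)
L1, L2, L3 = 0.1, 2.0, 1.0

def ce_loss(pred, gt):
    """Pixel-wise binary cross-entropy between saliency maps in [0, 1]."""
    pred = np.clip(pred, EPS, 1 - EPS)
    return float(-(gt * np.log(pred) + (1 - gt) * np.log(1 - pred)).mean())

def cc_loss(pred, gt):
    """Negative Pearson correlation between predicted and ground-truth maps."""
    p, g = pred - pred.mean(), gt - gt.mean()
    return -float((p * g).sum() / (np.sqrt((p ** 2).sum() * (g ** 2).sum()) + EPS))

def nss_loss(pred, fix):
    """Negative Normalized Scanpath Saliency over binary fixation locations."""
    p = (pred - pred.mean()) / (pred.std() + EPS)
    return -float((p * fix).sum() / (fix.sum() + EPS))

def combined_loss(pred, gt, fix):
    """lambda1 * CE + lambda2 * CC + lambda3 * NSS, weighted as in the paper."""
    return L1 * ce_loss(pred, gt) + L2 * cc_loss(pred, gt) + L3 * nss_loss(pred, fix)
```

Per the setup row, such a loss would be minimized with Adam (learning rate 0.001, β1 = 0.9, β2 = 0.999) for 10k iterations at batch size 4; those optimizer settings are stated in the paper, while the loss internals above are a sketch.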