Gaze Target Detection by Merging Human Attention and Activity Cues
Authors: Yaokun Yang, Yihan Yin, Feng Lu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach attains state-of-the-art performance on both the GazeFollow benchmark and the VideoAttentionTarget benchmark. |
| Researcher Affiliation | Academia | State Key Laboratory of VR Technology and Systems, School of CSE, Beihang University {yangyaokun, yyhppx, lufeng}@buaa.edu.cn |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | This paper employs two well-established datasets for gaze target detection, namely GazeFollow (Recasens et al. 2015) and VideoAttentionTarget (Chong et al. 2020). |
| Dataset Splits | No | After partitioning, 4,782 annotated individuals are designated for testing, with the remainder allocated for training. |
| Hardware Specification | Yes | During training, we employ a mini-batch size of 32 on a single NVIDIA Titan Xp GPU, initializing with a learning rate of 0.0001. [...] In order to measure their computational complexity, we also select recent high-speed implementations for them, and compare their inference speed on a single NVIDIA Titan Xp GPU. *(A timing sketch follows the table.)* |
| Software Dependencies | No | Our implementation is carried out using the PyTorch framework. The paper does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | Our implementation is carried out using the PyTorch framework. We utilize ResNet-50 (He et al. 2016) as our scene feature extractor. All input scene images are resized to dimensions of 224×224, while our input face image is resized to 64×64. During training, we employ a mini-batch size of 32 on a single NVIDIA Titan Xp GPU, initializing with a learning rate of 0.0001. Our training regimen spans 90 epochs on the GazeFollow dataset, with learning rate adjustments at the 80th and 90th epochs, involving a multiplication by 0.1. Our entire training process takes approximately 18 hours. As our optimizer, we rely on the Adam algorithm (Kingma and Ba 2014), with an Adam weight decay set at 0.0001 and an Adam momentum of 0.9. *(A configuration sketch follows the table.)* |
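
For the Hardware Specification row, the quoted inference-speed comparison can be reproduced with a standard GPU timing loop. The sketch below is a minimal, hypothetical setup, not the paper's benchmarking code: it assumes CUDA is available and uses a plain ResNet-50 as a stand-in for the methods being compared; warm-up iterations and CUDA events avoid timing one-off initialization costs.

```python
# Generic GPU inference-timing sketch (not the paper's benchmarking code).
import torch
from torchvision import models

device = torch.device("cuda")  # the paper times on a single NVIDIA Titan Xp
model = models.resnet50(weights=None).to(device).eval()  # stand-in model
x = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    for _ in range(10):          # warm-up so CUDA initialization isn't timed
        model(x)
    torch.cuda.synchronize()

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(100):         # timed inference passes
        model(x)
    end.record()
    torch.cuda.synchronize()     # wait for all kernels before reading the clock

print(f"mean latency: {start.elapsed_time(end) / 100:.2f} ms")
```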
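
For the Experiment Setup row, the quoted hyperparameters translate into a fairly standard PyTorch configuration. The sketch below is a hypothetical reconstruction under stated assumptions, not the authors' released code: the head architecture, face branch, and loss are placeholders, and the Adam betas of (0.9, 0.999) are an assumption (the paper only states a "momentum of 0.9" and a weight decay of 0.0001).

```python
# Minimal sketch of the training configuration quoted in the table.
# Assumptions (not from the paper's code): the head architecture, the
# face branch, the loss, and Adam betas=(0.9, 0.999) for "momentum of 0.9".
import torch
from torch import nn, optim
from torchvision import models

class GazeNetSketch(nn.Module):
    """Placeholder model: ResNet-50 scene branch plus a tiny face branch."""
    def __init__(self):
        super().__init__()
        scene = models.resnet50(weights=None)            # scene feature extractor
        self.scene = nn.Sequential(*list(scene.children())[:-1])
        self.face = nn.Sequential(                       # hypothetical face branch
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(2048 + 32, 2)              # e.g. a 2-D gaze point

    def forward(self, scene_img, face_img):
        s = self.scene(scene_img).flatten(1)             # scene input: 224x224
        f = self.face(face_img).flatten(1)               # face input: 64x64
        return self.head(torch.cat([s, f], dim=1))

model = GazeNetSketch()
optimizer = optim.Adam(model.parameters(),
                       lr=1e-4,                          # initial LR 0.0001
                       betas=(0.9, 0.999),               # assumed betas
                       weight_decay=1e-4)                # weight decay 0.0001
# Multiply the learning rate by 0.1 at the 80th and 90th epochs of a
# 90-epoch schedule, as the quote describes.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 90], gamma=0.1)

# One dummy step with the batch size and input sizes from the quote.
scene_batch = torch.randn(32, 3, 224, 224)
face_batch = torch.randn(32, 3, 64, 64)
target = torch.randn(32, 2)
loss = nn.functional.mse_loss(model(scene_batch, face_batch), target)
loss.backward()
optimizer.step()
scheduler.step()  # called once per epoch in a real training loop
```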