Gaze Target Detection by Merging Human Attention and Activity Cues

Authors: Yaokun Yang, Yihan Yin, Feng Lu

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our approach attains state-of-the-art performance on both the GazeFollow benchmark and the VideoAttentionTarget benchmark. |
| Researcher Affiliation | Academia | State Key Laboratory of VR Technology and Systems, School of CSE, Beihang University {yangyaokun, yyhppx, lufeng}@buaa.edu.cn |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | This paper employs two well-established datasets for gaze target detection, namely GazeFollow (Recasens et al. 2015) and VideoAttentionTarget (Chong et al. 2020). |
| Dataset Splits | No | After partitioning, 4,782 annotated individuals are designated for testing, with the remainder allocated for training. |
| Hardware Specification | Yes | During training, we employ a mini-batch size of 32 on a single NVIDIA Titan Xp GPU, initializing with a learning rate of 0.0001. [...] In order to measure their computation complexity, we also select recent high-speed implementations for them, and compare their inference speed on a single NVIDIA Titan Xp GPU. |
| Software Dependencies | No | Our implementation is carried out using the PyTorch framework. The paper does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | Our implementation is carried out using the PyTorch framework. We utilize ResNet-50 (He et al. 2016) as our scene feature extractor. All input scene images are resized to dimensions of 224×224, while our input face image is resized to 64×64. During training, we employ a mini-batch size of 32 on a single NVIDIA Titan Xp GPU, initializing with a learning rate of 0.0001. Our training regimen spans 90 epochs on the GazeFollow dataset, with learning rate adjustments at the 80th and 90th epochs, involving a multiplication by 0.1. Our entire training process takes approximately 18 hours. As our optimizer, we rely on the Adam algorithm (Kingma and Ba 2014), with an Adam weight decay set at 0.0001 and an Adam momentum of 0.9. |
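A minimal PyTorch sketch of the training configuration quoted in the Experiment Setup row follows. Only the hyperparameters, input sizes, ResNet-50 backbone, optimizer, and learning-rate schedule come from the quote; the `GazeTargetNet` wrapper and its forward pass are hypothetical placeholders, since the authors' architecture is not released.

```python
import torch
from torch import nn, optim
from torchvision import models, transforms

# Hyperparameters quoted in the Experiment Setup row.
BATCH_SIZE = 32           # mini-batch size on a single NVIDIA Titan Xp
LR = 1e-4                 # initial learning rate
WEIGHT_DECAY = 1e-4       # Adam weight decay
EPOCHS = 90               # training epochs on GazeFollow
LR_MILESTONES = [80, 90]  # multiply the learning rate by 0.1 at these epochs

# Input preprocessing: scene images resized to 224x224, face crops to 64x64.
scene_tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
face_tf = transforms.Compose([transforms.Resize((64, 64)), transforms.ToTensor()])

class GazeTargetNet(nn.Module):
    """Hypothetical wrapper; only the ResNet-50 scene encoder is taken from the paper."""
    def __init__(self):
        super().__init__()
        self.scene_backbone = models.resnet50()  # scene feature extractor
        # ... attention/activity fusion and heatmap head omitted in this sketch ...

    def forward(self, scene, face):
        raise NotImplementedError("fusion modules are not described in enough detail to sketch")

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GazeTargetNet().to(device)
# Adam with beta1 = 0.9 (the quoted "Adam momentum") and weight decay 1e-4.
optimizer = optim.Adam(model.parameters(), lr=LR, betas=(0.9, 0.999), weight_decay=WEIGHT_DECAY)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=LR_MILESTONES, gamma=0.1)
```

`MultiStepLR` with milestones [80, 90] and gamma 0.1 reproduces the quoted schedule of multiplying the learning rate by 0.1 at the 80th and 90th epochs.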
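The Hardware Specification row also quotes an inference-speed comparison on a single Titan Xp. The snippet below is a generic GPU latency-measurement sketch, not the authors' protocol; the `measure_latency` helper and its warm-up and iteration counts are assumptions.

```python
import time
import torch

@torch.no_grad()
def measure_latency(model, scene, face, warmup=20, iters=100):
    """Average per-forward-pass latency in milliseconds on the current GPU."""
    model.eval()
    for _ in range(warmup):        # warm-up passes to stabilize clocks and caches
        model(scene, face)
    torch.cuda.synchronize()       # wait for queued kernels before starting the timer
    start = time.perf_counter()
    for _ in range(iters):
        model(scene, face)
    torch.cuda.synchronize()       # wait again so the timed window covers all work
    return (time.perf_counter() - start) / iters * 1e3
```

The `torch.cuda.synchronize()` calls matter because CUDA kernels are launched asynchronously; without them the timer would mostly measure kernel-launch overhead rather than actual inference time.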