Gaze Target Detection by Merging Human Attention and Activity Cues

Authors: Yaokun Yang, Yihan Yin, Feng Lu

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our approach attains state-of-the-art performance on both the GazeFollow benchmark and the VideoAttentionTarget benchmark. |
| Researcher Affiliation | Academia | State Key Laboratory of VR Technology and Systems, School of CSE, Beihang University {yangyaokun, yyhppx, lufeng}@buaa.edu.cn |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | Yes | This paper employs two well-established datasets for gaze target detection, namely GazeFollow (Recasens et al. 2015) and VideoAttentionTarget (Chong et al. 2020). |
| Dataset Splits | No | After partitioning, 4,782 annotated individuals are designated for testing, with the remainder allocated for training. |
| Hardware Specification | Yes | During training, we employ a mini-batch size of 32 on a single NVIDIA Titan Xp GPU, initializing with a learning rate of 0.0001. [...] In order to measure their computation complexity, we also select recent high-speed implementations for them, and compare their inference speed on a single NVIDIA Titan Xp GPU. |
| Software Dependencies | No | Our implementation is carried out using the PyTorch framework. The paper does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | Our implementation is carried out using the PyTorch framework. We utilize ResNet-50 (He et al. 2016) as our scene feature extractor. All input scene images are resized to dimensions of 224×224, while our input face image is resized to 64×64. During training, we employ a mini-batch size of 32 on a single NVIDIA Titan Xp GPU, initializing with a learning rate of 0.0001. Our training regimen spans 90 epochs on the GazeFollow dataset, with learning rate adjustments at the 80th and 90th epochs, involving a multiplication by 0.1. Our entire training process takes approximately 18 hours. As our optimizer, we rely on the Adam algorithm (Kingma and Ba 2014), with an Adam weight decay set at 0.0001 and an Adam momentum of 0.9. |
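A minimal PyTorch sketch of the training configuration quoted in the Experiment Setup row follows. Only the hyperparameters, input sizes, ResNet-50 backbone, optimizer, and learning-rate schedule come from the quote; the `GazeTargetNet` wrapper and its forward pass are hypothetical placeholders, since the authors' architecture is not released.

```python
import torch
from torch import nn, optim
from torchvision import models, transforms

# Hyperparameters quoted in the Experiment Setup row.
BATCH_SIZE = 32           # mini-batch size on a single NVIDIA Titan Xp
LR = 1e-4                 # initial learning rate
WEIGHT_DECAY = 1e-4       # Adam weight decay
EPOCHS = 90               # training epochs on GazeFollow
LR_MILESTONES = [80, 90]  # multiply the learning rate by 0.1 at these epochs

# Input preprocessing: scene images resized to 224x224, face crops to 64x64.
scene_tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
face_tf = transforms.Compose([transforms.Resize((64, 64)), transforms.ToTensor()])

class GazeTargetNet(nn.Module):
    """Hypothetical wrapper; only the ResNet-50 scene encoder is taken from the paper."""
    def __init__(self):
        super().__init__()
        self.scene_backbone = models.resnet50()  # scene feature extractor
        # ... attention/activity fusion and heatmap head omitted in this sketch ...

    def forward(self, scene, face):
        raise NotImplementedError("fusion modules are not described in enough detail to sketch")

device = "cuda" if torch.cuda.is_available() else "cpu"
model = GazeTargetNet().to(device)
# Adam with beta1 = 0.9 (the quoted "Adam momentum") and weight decay 1e-4.
optimizer = optim.Adam(model.parameters(), lr=LR, betas=(0.9, 0.999), weight_decay=WEIGHT_DECAY)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=LR_MILESTONES, gamma=0.1)
```

`MultiStepLR` with milestones [80, 90] and gamma 0.1 reproduces the quoted schedule of multiplying the learning rate by 0.1 at the 80th and 90th epochs.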
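The Hardware Specification row also quotes an inference-speed comparison on a single Titan Xp. The snippet below is a generic GPU latency-measurement sketch, not the authors' protocol; the `measure_latency` helper and its warm-up and iteration counts are assumptions.

```python
import time
import torch

@torch.no_grad()
def measure_latency(model, scene, face, warmup=20, iters=100):
    """Average per-forward-pass latency in milliseconds on the current GPU."""
    model.eval()
    for _ in range(warmup):        # warm-up passes to stabilize clocks and caches
        model(scene, face)
    torch.cuda.synchronize()       # wait for queued kernels before starting the timer
    start = time.perf_counter()
    for _ in range(iters):
        model(scene, face)
    torch.cuda.synchronize()       # wait again so the timed window covers all work
    return (time.perf_counter() - start) / iters * 1e3
```

The `torch.cuda.synchronize()` calls matter because CUDA kernels are launched asynchronously; without them the timer would mostly measure kernel-launch overhead rather than actual inference time.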