RGBD Based Gaze Estimation via Multi-Task CNN

Authors: Dongze Lian, Ziheng Zhang, Weixin Luo, Lina Hu, Minye Wu, Zechao Li, Jingyi Yu, Shenghua Gao

AAAI 2019, pp. 2488-2495

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Comprehensive experiments demonstrate that our method outperforms existing methods by a large margin on both our dataset and the EYEDIAP dataset." "We evaluate our method on both our RGBD gaze dataset and EYEDIAP. The experimental results of different methods are listed in Table 2 and Table 3." The paper also includes an "Ablation Studies" section.
Researcher Affiliation | Academia | ShanghaiTech University; Nanjing University of Science and Technology
Pseudocode | No | The paper describes network architectures with diagrams but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper states, "Our dataset will facilitate the study of data-driven-based approaches for gaze tracking, and we will release our dataset to the community in the future," but it neither mentions nor links a source-code release for the described method.
Open Datasets | Yes | "Mora et al. (Mora, Monay, and Odobez 2014) have built a dataset based on RGBD by using a Kinect camera, but the participants were too few (only 16 participants). Krafka et al. (Krafka et al. 2016) showed that more participants would boost the performance of person-independent gaze tracking. Thus we build a large RGBD gaze dataset with 218 participants and 165,231 images. Our dataset will facilitate the study of data-driven-based approaches for gaze tracking, and we will release our dataset to the community in the future." "We evaluate our method on both our RGBD gaze dataset and EYEDIAP. We follow the same strategy as (Zhang et al. 2017) to choose frame images and gaze points. After that, we divide the 14 participants into 5 groups and perform cross-validation."
Dataset Splits | Yes | "We further use the images corresponding to 159 participants (119,318 RGB/depth image pairs) as training data and use the data corresponding to the remaining 59 participants as test data (45,913 RGB/depth image pairs)." For gaze direction estimation on the EYEDIAP dataset, "we divide the 14 participants into 5 groups and perform cross-validation." (A participant-level split sketch follows the table.)
Hardware Specification | Yes | "We use 8 NVIDIA Tesla k40m GPUs to train our network."
Software Dependencies | No | The paper states, "We implement our method with the PyTorch (Paszke et al. 2017) framework," but gives no version number for PyTorch or for any other key software component. (A version-logging sketch follows the table.)
Experiment Setup | Yes | "The batch size for all of our experiments is 100 for training and 200 for testing." "We use 8 NVIDIA Tesla k40m GPUs to train our network." "Stochastic Gradient Descent (SGD) optimization algorithm is adopted to train our network." "We first separately pretrain the GAN and gaze estimation networks, and then we finetune the overall network to get optimal estimation performance with multi-task learning." (A training-recipe sketch follows the table.)
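
The splits in the "Dataset Splits" row are participant-level rather than image-level. Below is a minimal sketch of that protocol; the participant ID scheme, the random seed, and the rule for forming the 159/59 partition and the five EYEDIAP groups are assumptions, since the paper does not specify how the assignments were made.

```python
# Hypothetical sketch of participant-level splitting; ID scheme, seed, and
# grouping rule are assumptions, not taken from the paper.
import random

def split_by_participant(participant_ids, n_train=159, seed=0):
    """Hold out whole participants: 159 for training, the remaining 59 for testing."""
    ids = sorted(participant_ids)
    random.Random(seed).shuffle(ids)
    return ids[:n_train], ids[n_train:]

def make_cv_groups(participant_ids, n_groups=5):
    """Divide participants into n_groups folds (EYEDIAP: 14 participants -> 5 groups)."""
    ids = sorted(participant_ids)
    return [ids[i::n_groups] for i in range(n_groups)]

if __name__ == "__main__":
    train_ids, test_ids = split_by_participant(range(218))
    folds = make_cv_groups(range(14))
    print(len(train_ids), len(test_ids), [len(f) for f in folds])  # 159 59 [3, 3, 3, 3, 2]
```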
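
Since the "Software Dependencies" row notes that the framework is named but not versioned, here is an illustrative sketch of how the missing version information could be recorded; the output file name and fields are arbitrary choices, not something the paper provides.

```python
# Illustrative only: record the Python/PyTorch/CUDA versions that the paper
# leaves unspecified. The output file name is an arbitrary choice.
import sys
import torch

with open("environment.txt", "w") as f:
    f.write(f"python {sys.version.split()[0]}\n")
    f.write(f"torch  {torch.__version__}\n")
    f.write(f"cuda   {torch.version.cuda}\n")  # None if PyTorch was built without CUDA
```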
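
The "Experiment Setup" row describes a two-stage recipe: pretrain the GAN and gaze estimation networks separately, then finetune the whole network jointly with SGD and a training batch size of 100. The sketch below follows that schedule under stated assumptions: the module definitions, loss, learning rate, and data are placeholders, and the paper's adversarial pretraining objective for the GAN branch is not reproduced here.

```python
# Sketch of the reported two-stage schedule: pretrain sub-networks, then
# finetune everything jointly with SGD (batch size 100 for training).
# All architectures, losses, and hyperparameters below are assumptions.
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

depth_branch = nn.Sequential(nn.Flatten(), nn.Linear(36 * 60, 128), nn.ReLU())  # stand-in for the GAN branch
gaze_branch = nn.Linear(128, 2)                                                 # stand-in gaze regressor (yaw, pitch)

dummy = TensorDataset(torch.randn(1000, 1, 36, 60), torch.randn(1000, 2))
loader = DataLoader(dummy, batch_size=100, shuffle=True)  # training batch size 100, as reported

def run_stage(trainable_modules, epochs=1, lr=0.01):
    """Optimize only the given modules with SGD; the forward pass always uses both branches."""
    params = [p for m in trainable_modules for p in m.parameters()]
    opt = optim.SGD(params, lr=lr, momentum=0.9)  # SGD as reported; lr and momentum are assumed
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for images, gaze in loader:
            opt.zero_grad()
            loss = loss_fn(gaze_branch(depth_branch(images)), gaze)
            loss.backward()
            opt.step()

# Stage 1: pretrain each branch separately (the paper's adversarial objective
# for the GAN branch is replaced by the same regression loss for brevity).
run_stage([depth_branch])
run_stage([gaze_branch])
# Stage 2: finetune the overall network jointly (multi-task finetuning).
run_stage([depth_branch, gaze_branch])
```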