TransGOP: Transformer-Based Gaze Object Prediction
Authors: Binglu Wang, Chenxi Guo, Yang Jin, Haisheng Xia, Nian Liu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the GOO-Synth and GOO-Real datasets demonstrate that our TransGOP achieves state-of-the-art performance on all tracks, i.e., object detection, gaze estimation, and gaze object prediction. |
| Researcher Affiliation | Academia | 1 Xi'an University of Architecture and Technology, 2 Beijing Institute of Technology, 3 University of Science and Technology of China, 4 Mohamed bin Zayed University of Artificial Intelligence; {wbl921129, guochenxix, jin91999}@gmail.com, hsxia@ustc.edu.cn, liunian228@gmail.com |
| Pseudocode | No | The paper provides architectural diagrams and descriptions of its components but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code will be available at https://github.com/chenxiGuo/TransGOP.git. |
| Open Datasets | Yes | All experiments were conducted on the GOO-Synth and GOO-Real datasets (Tomas et al. 2021). ... Tomas et al. (2021) ... introduced the first dataset, the GOO dataset... |
| Dataset Splits | No | The paper trains on the GOO-Synth and GOO-Real datasets and discusses evaluation metrics, but it never gives explicit percentages or counts for training, validation, or test splits; it implies the standard splits of the cited datasets without detailing them. |
| Hardware Specification | Yes | All experiments are implemented based on PyTorch and one GeForce RTX 3090 Ti GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | TransGOP is trained for 50 epochs with an initial learning rate of 1e-4, and the learning rate is multiplied by 0.94 every 5 epochs. We use AdamW as our optimizer. For the gaze autoencoder, we set the hidden size to 256 and employ 200 decoder queries. In Eq. 1, the loss weights are α = 1000 and β = 10. The input image size is set to 224×224 and the predicted heatmap size is 64×64. |
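
The experiment-setup row above maps onto standard PyTorch components. The snippet below is a minimal sketch of that configuration, not the authors' implementation (see their repository linked above): the stand-in model, the decoder depth and head count, and the loss decomposition are assumptions; only AdamW, the 1e-4 initial learning rate, the ×0.94 decay every 5 epochs, the 50 epochs, the 256-dim / 200-query decoder settings, and α = 1000, β = 10 come from the quoted text.

```python
import torch
from torch import nn

# Stand-in model: a single linear layer in place of the full TransGOP
# network (the real architecture lives in the authors' repo); it exists
# only to demonstrate the optimizer/scheduler wiring.
model = nn.Linear(224 * 224 * 3, 64 * 64)  # 224x224 RGB in, 64x64 heatmap out

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
# "multiplied by 0.94 every 5 epochs" maps onto StepLR with gamma=0.94.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.94)

# Gaze-autoencoder decoder per the quoted setup: hidden size 256 and 200
# decoder queries. nhead=8 and num_layers=6 are assumptions, not quoted.
decoder_layer = nn.TransformerDecoderLayer(d_model=256, nhead=8, batch_first=True)
gaze_decoder = nn.TransformerDecoder(decoder_layer, num_layers=6)
queries = nn.Parameter(torch.zeros(200, 256))  # 200 learned decoder queries

ALPHA, BETA = 1000.0, 10.0  # loss weights from Eq. 1 of the paper

for epoch in range(50):
    # Per-batch training would go here. The objective is a weighted sum;
    # exactly which terms alpha and beta scale in Eq. 1 is an assumption:
    #   loss = l_base + ALPHA * l_alpha_term + BETA * l_beta_term
    scheduler.step()
    print(f"epoch {epoch + 1:2d}: lr = {optimizer.param_groups[0]['lr']:.3e}")
```

Running the loop prints the decayed learning rate, confirming the schedule: it holds at 1e-4 for epochs 1 through 5, drops to 9.4e-5 at epoch 6, and so on every 5 epochs.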