Test-Time Personalization with Meta Prompt for Gaze Estimation

Authors: Huan Liu, Julia Qi, Zhenhao Li, Mohammad Hassanpour, Yang Wang, Konstantinos N. Plataniotis, Yuanhao Yu

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show the high efficiency of the prompt-tuning approach: the proposed method can be 10 times faster in adaptation speed than the compared methods. Our experiments show that the meta-learned prompt can be effectively adapted even with a simple symmetry loss (a hedged sketch of such a loss follows the table). In addition, we experiment on four cross-dataset validations to show the remarkable advantages of the proposed method. (Section 6, Experiments: 6.1 Dataset; 6.2 Implementation Details; 6.3 Comparison with the SOTA; 6.4 Ablation Study; 6.5 Additional Analysis)
Researcher Affiliation | Collaboration | Huan Liu (1), Julia Qi (1,2)*, Zhenhao Li (1)*, Mohammad Hassanpour (1), Yang Wang (3), Konstantinos Plataniotis (4), Yuanhao Yu (1). 1: Noah's Ark Lab, Huawei, Canada; 2: University of Waterloo, Canada; 3: Department of Computer Science and Software Engineering, Concordia University, Canada; 4: The Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto, Canada. Emails: {huan.liu127, zhenhao.li1, mohammad.hassanpour, yuanhao.yu}@huawei.com, j6qi@waterloo.ca, yang.wang@concordia.ca, kostas@ece.utoronto.ca
Pseudocode | Yes | Algorithm 1: Training of Meta Prompt (a MAML-style re-implementation sketch follows the table).
Open Source Code | No | The paper does not provide any explicit statement or link regarding the public availability of its source code.
Open Datasets | Yes | We employ four gaze estimation datasets as four different domains, namely ETH-XGaze (DE) (Zhang et al. 2020), Gaze360 (DG) (Kellnhofer et al. 2019), MPIIGaze (DM) (Zhang et al. 2017b), and EyeDiap (DD) (Funes Mora, Monay, and Odobez 2014).
Dataset Splits | Yes | DE and DG are used as source domains, whereas DM and DD are used as target domains, in alignment with RUDA (Bao et al. 2022) and PnP-GA (Liu et al. 2021b). To perform personalization for solving Equation 3, we use only 5 images per person. To ensure the reproducibility of results, we do not perform random sampling but use the first 5 images of each person (see the sampling sketch after the table).
Hardware Specification | Yes | Our method is implemented using the PyTorch library (Paszke et al. 2019) and conducted on NVIDIA Tesla V100 GPUs.
Software Dependencies | No | Our method is implemented using the PyTorch library (Paszke et al. 2019). Although PyTorch is mentioned, a specific version number is not provided in the text.
Experiment Setup | Yes | Our method is implemented using the PyTorch library (Paszke et al. 2019) and conducted on NVIDIA Tesla V100 GPUs. We use Adam (Kingma and Ba 2014) as our optimizer with β = (0.5, 0.95). The training images are all cropped to a size of 224 × 224 without data augmentation. Pre-training stage: during network pre-training, we use the L1 loss and the symmetry loss to train the network fθ with a mini-batch size of 120. The initial learning rate is set to 10^-4. We train for 50 epochs, with the learning rate multiplied by 0.1 at epoch 25. Meta-training stage: during the meta-training stage for prompt initialization, the network is initialized with the weights obtained from the pre-training stage. Prompts are initialized randomly from a Gaussian distribution with mean 0 and variance 1. Note that, unless explicitly specified otherwise, we replace the padding of the first nine convolutional layers in ResNet-18 (He et al. 2016); all other parameters are kept frozen. We use a mini-batch size of 20. The learning rates λ1 and λ2 in Algorithm 1 are set to 10^-4. The meta-training process continues for 1000 iterations. Personalization stage: to perform personalization for solving Equation 3, we use only 5 images per person... During personalization, only the prompt is optimized and all other parameters are fixed; the learning rate is set to 0.01 (see the personalization sketch below).
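The table mentions adaptation with a "simple symmetry loss". As a point of reference, here is a minimal sketch of one common symmetry consistency loss for gaze, assuming the network predicts (pitch, yaw) angles and that horizontally flipping a face image negates yaw while preserving pitch; the paper's exact formulation may differ, and `predict` is a hypothetical callable.

```python
import torch
import torch.nn.functional as F

def symmetry_loss(predict, images):
    """Unsupervised consistency loss for gaze estimation: the prediction
    on a horizontally flipped image should match the yaw-negated
    prediction on the original image. No gaze labels are required.

    predict: callable mapping a (B, 3, H, W) batch to (B, 2) pitch/yaw.
    """
    g = predict(images)
    g_flip = predict(torch.flip(images, dims=[-1]))   # mirror left-right
    target = torch.stack([g[:, 0], -g[:, 1]], dim=1)  # negate the yaw
    return F.l1_loss(g_flip, target)
```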
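Algorithm 1 (Training of Meta Prompt) is given in the paper but not quoted here; its bi-level structure admits a MAML-style sketch: an inner step adapts the prompt on unlabeled support images with the symmetry loss, and an outer step updates the prompt initialization so that the adapted prompt yields a low supervised gaze error on labeled query images. The hyperparameters (λ1 = λ2 = 10^-4, Adam with β = (0.5, 0.95), Gaussian initialization, 1000 iterations) are quoted from the setup above; `model(images, prompt)`, `task_sampler`, and `PROMPT_SHAPE` are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

PROMPT_SHAPE = (64, 1, 1)  # hypothetical; depends on where the prompt enters
prompt = torch.randn(PROMPT_SHAPE, requires_grad=True)  # N(0, 1) init, as quoted
lambda1 = lambda2 = 1e-4
outer_opt = torch.optim.Adam([prompt], lr=lambda2, betas=(0.5, 0.95))

for _ in range(1000):  # 1000 meta-training iterations
    # One task = one person: unlabeled support images, labeled query images.
    x_support, x_query, y_query = next(task_sampler)

    # Inner step: adapt the prompt with the unsupervised symmetry loss.
    # create_graph=True keeps the update differentiable w.r.t. the init.
    inner = symmetry_loss(lambda x: model(x, prompt), x_support)
    (g,) = torch.autograd.grad(inner, prompt, create_graph=True)
    adapted = prompt - lambda1 * g

    # Outer step: the adapted prompt should minimize the supervised
    # L1 gaze error on held-out query images of the same person.
    outer = F.l1_loss(model(x_query, adapted), y_query)
    outer_opt.zero_grad()
    outer.backward()
    outer_opt.step()
```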
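The sampling rule under "Dataset Splits" is deterministic by design: the personalization set is the first 5 images of each person rather than a random subset. A minimal sketch, assuming a mapping from person ID to that person's images in dataset order:

```python
def personalization_split(frames_by_person, k=5):
    """Return the first k frames of each subject in dataset order, so the
    personalization set is identical across runs (no random sampling)."""
    return {pid: frames[:k] for pid, frames in frames_by_person.items()}
```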
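Finally, the setup states that the prompt replaces the padding of the first nine convolutional layers of a ResNet-18, and that at personalization time only the prompt is optimized (learning rate 0.01) with everything else frozen. The quoted text does not spell out the parameterization, so the following sketch makes one plausible assumption: a learnable per-channel border value substituted for zero padding. `num_steps` and `images5` (a batch of the person's 5 unlabeled images) are also hypothetical; `symmetry_loss` is the sketch above.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class PromptPadConv(nn.Module):
    """Wrap a Conv2d so its zero padding is replaced by learnable values
    (one assumed realization of a prompt that 'replaces the padding')."""

    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv, self.p = conv, conv.padding[0]
        conv.padding = (0, 0)  # padding is applied manually in forward
        self.prompt = nn.Parameter(torch.randn(conv.in_channels, 1, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        p = self.p
        canvas = self.prompt.expand(b, c, h + 2 * p, w + 2 * p).clone()
        canvas[:, :, p:-p, p:-p] = x  # interior: features; border: prompt
        return self.conv(canvas)

def wrap_first_convs(model, n=9):
    """Replace the padding of the first n padded convolutions, in
    module-registration order (which matches forward order for ResNet)."""
    targets = []
    for module in model.modules():
        for name, child in module.named_children():
            if isinstance(child, nn.Conv2d) and child.padding[0] > 0:
                targets.append((module, name, child))
    for module, name, child in targets[:n]:
        setattr(module, name, PromptPadConv(child))
    return model

model = wrap_first_convs(resnet18())
model.fc = nn.Linear(model.fc.in_features, 2)  # pitch/yaw head (assumed)
for param in model.parameters():
    param.requires_grad_(False)  # freeze all pre-trained parameters...
prompts = [m.prompt for m in model.modules() if isinstance(m, PromptPadConv)]
for param in prompts:
    param.requires_grad_(True)   # ...except the prompt

opt = torch.optim.Adam(prompts, lr=0.01, betas=(0.5, 0.95))
for _ in range(num_steps):  # a few unsupervised steps per person
    loss = symmetry_loss(model, images5)  # the 5 images, no labels needed
    opt.zero_grad()
    loss.backward()
    opt.step()
```

With this wrapping, only the border values carry person-specific information, which is consistent with the tiny adaptation budget (5 images, prompt-only updates) described in the setup.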