Meta-Auxiliary Learning for Adaptive Human Pose Prediction

Authors: Qiongjie Cui, Huaijiang Sun, Jianfeng Lu, Bin Li, Weiqing Li

AAAI 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments show that the proposed approach achieves higher accuracy and more realistic visualization. |
| Researcher Affiliation | Collaboration | Qiongjie Cui (1), Huaijiang Sun (1, corresponding author), Jianfeng Lu (1), Bin Li (2), Weiqing Li (1); (1) Nanjing University of Science and Technology; (2) Tianjin AiForward Science and Technology Co., Ltd., China |
| Pseudocode | Yes | Algorithm 1: Meta-Auxiliary Training |
| Open Source Code | No | The paper does not contain any statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | Dataset-1: H3.6M (Ionescu et al. 2014) involves 15 action categories performed by 7 professional human subjects (S1, S5, S6, S7, S8, S9, S11). Each pose is represented as a 17-joint skeleton (N = 17), and the sequences are downsampled to 25 fps (Mao et al. 2019; Ma et al. 2022). Dataset-2: We also select 8 action categories from CMU MoCap. The pre-processing pipeline is consistent with the H3.6M dataset. |
| Dataset Splits | Yes | Experimental Setups: We use 3 alternative setups to analyze our model, as stated in Table 1. The prefix S indicates the subject, and C denotes the category. For fairness, we also apply the training/testing division in Table 1 to re-train the baselines, keeping the hyperparameters unchanged. |
| Hardware Specification | No | The paper does not state the specific hardware (e.g., GPU or CPU models) used for the experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not provide version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The shared part consists of 9 residual blocks, created by combining the outputs of SS-RT and TS-RT, with channels C_in = C_out = 512. The task-specific portions of the primary task (Pri.) and Aux.2 are an additional block that maps the feature back to the original dimension. By contrast, Aux.1 is a binary classifier whose separate part comprises a flatten layer and 4 FC layers with channel numbers 256, 128, 64, 1. Aux.1 takes a scrambled-order counterpart of the observation as input, while for Aux.2, 20% of the joints are randomly removed from the observations... We exploit the Adam optimizer to train the network, with the learning rate initialized to 0.001 and a 0.98 decay every 2 epochs; the mini-batch size is 16. At test-time adaptation, we fix the learning rates α = β = 2 × 10⁻⁵, and 6 gradient-descent steps of Eq. 11 are performed. (Hedged sketches of these components follow the table.) |
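
To make the setup row concrete, below is a minimal PyTorch sketch of the two auxiliary inputs and the Aux.1 head exactly as described: a scrambled-order copy of the observation for Aux.1, random removal of about 20% of the joints for Aux.2, and a binary classifier built from a flatten layer plus 4 FC layers (256, 128, 64, 1). The shared 9-block residual trunk (SS-RT/TS-RT, 512 channels) is abstracted away, and all names and the tensor layout here are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class Aux1Head(nn.Module):
    """Binary classifier head: a flatten layer followed by 4 FC layers
    with channel numbers 256, 128, 64, 1, as stated in the paper."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),  # logit: was the observation temporally scrambled?
        )

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.net(feat)

def scramble_order(obs: torch.Tensor) -> torch.Tensor:
    """Aux.1 input: a scrambled-order counterpart of the observation.
    Assumed layout: obs is (batch, frames, joints, 3)."""
    perm = torch.randperm(obs.size(1), device=obs.device)
    return obs[:, perm]

def mask_joints(obs: torch.Tensor, ratio: float = 0.2) -> torch.Tensor:
    """Aux.2 input: randomly remove joints (zeroed out here; ~20% are
    dropped in expectation, though the paper may drop exactly 20%)."""
    keep = (torch.rand(obs.size(2), device=obs.device) > ratio).float()
    return obs * keep.view(1, 1, -1, 1)
```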
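The pseudocode row points to Algorithm 1 (Meta-Auxiliary Training). The paper's exact objective (Eq. 11) is not reproduced here; the sketch below only illustrates the generic MAML-style pattern such meta-auxiliary training follows: an inner gradient step on the self-supervised auxiliary loss, then an outer update so that the adapted weights serve the primary prediction task. `aux_loss` and `primary_loss` are hypothetical callables, not the authors' interfaces.

```python
import torch
from torch.func import functional_call

def meta_auxiliary_step(model, meta_opt, obs, future,
                        aux_loss, primary_loss, inner_lr=2e-5):
    params = dict(model.named_parameters())

    # Inner step: one gradient step on the self-supervised auxiliary loss,
    # keeping the graph so the outer update can differentiate through it.
    l_aux = aux_loss(lambda x: functional_call(model, params, (x,)), obs)
    grads = torch.autograd.grad(l_aux, list(params.values()), create_graph=True)
    fast = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}

    # Outer step: the adapted ("fast") weights should predict the future well;
    # gradients flow back to the meta-parameters through the inner update.
    l_pri = primary_loss(lambda x: functional_call(model, fast, (x,)), obs, future)
    meta_opt.zero_grad()
    l_pri.backward()
    meta_opt.step()
    return l_pri.detach()
```

Matching the stated schedule, `meta_opt` could be `torch.optim.Adam(model.parameters(), lr=1e-3)` with `torch.optim.lr_scheduler.StepLR(meta_opt, step_size=2, gamma=0.98)` stepped once per epoch, at a mini-batch size of 16.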
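At deployment, the setup row says the model performs 6 gradient-descent steps of Eq. 11 with α = β = 2 × 10⁻⁵ before predicting. Below is a minimal sketch of that loop, assuming the adaptation objective is a plain sum of the two auxiliary losses (the actual Eq. 11 may weight or combine them differently) and assuming vanilla SGD for the adaptation steps.

```python
import copy
import torch

def test_time_adapt(model, obs, aux1_loss, aux2_loss,
                    lr: float = 2e-5, steps: int = 6):
    adapted = copy.deepcopy(model)           # leave the meta-trained weights intact
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):                   # 6 gradient-descent steps
        opt.zero_grad()
        loss = aux1_loss(adapted, obs) + aux2_loss(adapted, obs)  # stand-in for Eq. 11
        loss.backward()
        opt.step()
    with torch.no_grad():                    # predict the future with the adapted model
        return adapted(obs)
```

Copying the model before adapting keeps the meta-trained parameters reusable for the next test sequence, which is the usual practice in test-time adaptation.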