Multi-Scale Control Signal-Aware Transformer for Motion Synthesis without Phase
Authors: Lintao Wang, Kun Hu, Lei Bai, Yu Ding, Wanli Ouyang, Zhiyong Wang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Both qualitative and quantitative experimental results on an existing biped locomotion dataset, which involves diverse types of motion transitions, demonstrate the effectiveness of our method. |
| Researcher Affiliation | Collaboration | (1) School of Computer Science, The University of Sydney, Australia; (2) Shanghai AI Laboratory, China; (3) Netease Fuxi AI Lab, China |
| Pseudocode | No | The paper describes its methods using text and mathematical equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We evaluate our proposed method on a public dataset (Holden, Komura, and Saito 2017) for a fair comparison with the state-of-the-art methods. |
| Dataset Splits | No | The paper mentions using 'around 4 million samples for training' but does not specify the splits for training, validation, or testing sets, nor does it provide percentages or counts for these splits. |
| Hardware Specification | Yes | The model was trained with 20 epochs, which took 50 hours on an NVIDIA GTX 1080Ti GPU. |
| Software Dependencies | Yes | The model was implemented by PyTorch 1.7.1 (Paszke et al. 2019) and trained with an Adam optimiser (Kingma and Ba 2014). |
| Experiment Setup | Yes | In total, K = 5 past frames with indices k1 = 1, k2 = 10, k3 = 20, k4 = 30 and k5 = 40 were selected as input to predict the motion of the i-th frame. ... Each of them consisted of three transformer-encoder layers using six self-attention heads of dimension 186, and the feed-forward layers were of dimension 1024. A dropout rate of 0.1 was applied to the encoders. ... The motion prediction network was modelled as a three-layer MLP with a hidden dimension of 512 and a dropout rate of 0.3. ... λ for ℓ1 regularization was set to 0.01. The learning rate was set to 10⁻⁴ and the batch size was 32. The model was trained with 20 epochs. (A hedged PyTorch sketch of this configuration is given below the table.) |
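
To make the quoted setup concrete, below is a minimal PyTorch sketch of the reported configuration. Only the hyperparameters (three encoder layers, six heads with model dimension 186, feed-forward dimension 1024, dropouts 0.1 and 0.3, a three-layer MLP with hidden dimension 512, λ = 0.01, learning rate 10⁻⁴, batch size 32, 20 epochs) come from the paper; the module names, the wiring between encoder and prediction head, and the output dimension are assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted from the paper's experiment setup.
D_MODEL, N_HEADS, N_LAYERS, FF_DIM = 186, 6, 3, 1024
PAST_FRAME_INDICES = [1, 10, 20, 30, 40]  # K = 5 past frames
LAMBDA_L1 = 0.01                          # weight of the l1 regularization term
BATCH_SIZE, EPOCHS, LR = 32, 20, 1e-4     # learning rate 10^-4

def make_encoder() -> nn.TransformerEncoder:
    """Three transformer-encoder layers, six heads, FF dim 1024, dropout 0.1."""
    layer = nn.TransformerEncoderLayer(
        d_model=D_MODEL, nhead=N_HEADS,
        dim_feedforward=FF_DIM, dropout=0.1)
    return nn.TransformerEncoder(layer, num_layers=N_LAYERS)

class MotionHead(nn.Module):
    """Three-layer MLP prediction network (hidden dim 512, dropout 0.3)."""
    def __init__(self, out_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(D_MODEL, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, out_dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.mlp(x)

# out_dim is an assumption here; the paper does not quote the pose dimension.
encoder, head = make_encoder(), MotionHead(out_dim=D_MODEL)
params = list(encoder.parameters()) + list(head.parameters())
optimiser = torch.optim.Adam(params, lr=LR)
```

Note that d_model = 186 is divisible by the six attention heads (31 dimensions per head), which `nn.TransformerEncoderLayer` requires; the l1 term weighted by LAMBDA_L1 would be added to the task loss during training.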