TapMo: Shape-aware Motion Generation of Skeleton-free Characters
Authors: Jiaxu Zhang, Shaoli Huang, Zhigang Tu, Xin Chen, Xiaohang Zhan, Gang Yu, Ying Shan
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness and generalizability of TapMo through rigorous qualitative and quantitative experiments. Our results reveal that TapMo consistently outperforms existing auto-animation methods, delivering superior-quality animations for both seen and unseen heterogeneous 3D characters. |
| Researcher Affiliation | Collaboration | Jiaxu Zhang (1,2), Shaoli Huang (2), Zhigang Tu (1), Xin Chen (3), Xiaohang Zhan (2), Gang Yu (3), Ying Shan (2); (1) Wuhan University, (2) Tencent AI Lab, (3) Tencent PCG |
| Pseudocode | No | The paper describes the model architecture and training process using text, mathematical equations (Eqs. 1-14), and figures, but it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper states 'The project page: https://semanticdh.github.io/TapMo.', but it does not explicitly say that the source code for the methodology is available there, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | Four public datasets are used to train and evaluate our TapMo, i.e., AMASS (Mahmood et al., 2019), Mixamo (Adobe), Models Resource-RigNet (Xu et al., 2019), and HumanML3D (Guo et al., 2022). |
| Dataset Splits | Yes | We follow the train-test split protocol in Xu et al. (2019). We convert these SMPL motions to our ground-truth motions using the analytical method introduced in Besl & McKay (1992) and follow the train-test split protocol in Guo et al. (2020). |
| Hardware Specification | No | The paper does not provide any specific details regarding the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper states 'We implement our pipeline using the PyTorch framework (Paszke et al., 2019)' but does not provide specific version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | In training, the motion sequence length N is padded to 196 frames, and the text length is padded to 30 words. The text feature and the mesh feature are concatenated to form the condition token, with a dimension of 31 × 512. Padding masks for the motion sequence and text are utilized during training to prevent mode collapse. The loss balancing factors ν_r, ν_p, ν_h, and ν_a are set to 0.1, 1.0, 0.001, and 0.1, respectively. We use an Adam optimizer with a learning rate of 1e-4 to train the Handle Predictor; the batch size is 4 and the number of training epochs is 500. To train the Diffusion Model, we also use an Adam optimizer with a learning rate of 1e-4; the batch size is 32, the number of training steps is 800,000, and the number of fine-tuning steps is 100,000. |
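
For concreteness, the optimizer and loss-weighting settings quoted in the Experiment Setup row could be wired up as in the following PyTorch sketch (PyTorch is the framework the paper names). The network shape and the four loss terms are hypothetical placeholders, not the authors' implementation; only the learning rate and the balancing factors ν_r, ν_p, ν_h, ν_a come from the paper.

```python
# Minimal sketch of the quoted training setup; all module and loss names are
# hypothetical placeholders, not the authors' code.
import torch

# Loss balancing factors quoted in the Experiment Setup row.
nu_r, nu_p, nu_h, nu_a = 0.1, 1.0, 0.001, 0.1

# Hypothetical stand-in for the Handle Predictor network.
handle_predictor = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 30),
)

# Adam optimizer with learning rate 1e-4, as stated in the paper.
optimizer = torch.optim.Adam(handle_predictor.parameters(), lr=1e-4)

def total_loss(loss_r, loss_p, loss_h, loss_a):
    """Weighted sum of the four loss terms with the quoted balancing factors."""
    return nu_r * loss_r + nu_p * loss_p + nu_h * loss_h + nu_a * loss_a
```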