FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
Authors: Mingyuan Zhang, Huirong Li, Zhongang Cai, Jiawei Ren, Lei Yang, Ziwei Liu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To thoroughly evaluate our proposed algorithm, we have conducted extensive experimental analysis on existing standard datasets such as HumanML3D [6], KIT-ML [12], and BABEL [13]. ... Experimental results show that our method has achieved state-of-the-art levels on these benchmarks. ... 4 Experiments ... 4.4 Quantitative Results ... 4.6 Ablation Study |
| Researcher Affiliation | Collaboration | 1 S-Lab, Nanyang Technological University, 2 SenseTime Research |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps in a code-like format. |
| Open Source Code | No | The paper provides a 'Project Page: https://mingyuan-zhang.github.io/projects/FineMoGen.html' but does not explicitly state that the source code for the methodology is available there or at a specific code repository. It does not contain an unambiguous statement of code release. |
| Open Datasets | Yes | To thoroughly evaluate our proposed algorithm, we have conducted extensive experimental analysis on existing standard datasets such as HumanML3D [6], KIT-ML [12], and BABEL [13]. ... We selected 2,968 videos from the 160 types of actions in the HuMMan dataset to be annotated in detail. |
| Dataset Splits | No | The paper mentions training and inference phases and refers to 'test set' for evaluation, but does not provide specific percentages, sample counts, or explicit methodology for training, validation, and test splits (e.g., '80/10/10 split', 'random split with seed 42'). |
| Hardware Specification | Yes | Training is performed using one Tesla V100, with the batch size on a single GPU set at 128. |
| Software Dependencies | No | The paper mentions using a 'pre-trained CLIP model [14]' and 'CLIP ViT-B/32' but does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, specific library versions). |
| Experiment Setup | Yes | Regarding the motion encoder, we utilize a 4-layer transformer, with a latent dimension of 7 * 64. ... In terms of the diffusion model, the variances, denoted as βt, are predefined to linearly spread from 0.0001 to 0.02, with the total number of noising steps set as T = 1000. We use the Adam optimizer to train the model, initially setting the learning rate to 0.0002. This learning rate will gradually decay to 0.00002 in accordance with a cosine learning rate scheduler. Training is performed using one Tesla V100, with the batch size on a single GPU set at 128. |
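
The experiment-setup row quotes enough hyperparameters to reconstruct a standard DDPM-style training configuration. The PyTorch sketch below wires up the values reported in the paper (linear β schedule from 0.0001 to 0.02 over T = 1000 steps, Adam at 2e-4 cosine-decayed to 2e-5, latent dimension 7 * 64, a 4-layer transformer encoder); it is a minimal illustration, not the authors' released code. The generic `TransformerEncoder` stand-in, `nhead=7`, and `num_training_steps` are assumptions not stated in the paper.

```python
import torch

# Hedged sketch of the training setup reported in the paper.
# Quoted values: linear beta schedule 0.0001 -> 0.02, T = 1000,
# Adam at lr 2e-4 decaying to 2e-5 with a cosine scheduler,
# 4-layer transformer encoder with latent dimension 7 * 64.

T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # predefined variances beta_t
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)    # \bar{alpha}_t, used to sample x_t from x_0 in closed form

latent_dim = 7 * 64                          # 448, as stated in the paper
encoder_layer = torch.nn.TransformerEncoderLayer(
    d_model=latent_dim,
    nhead=7,              # ASSUMPTION: 7 heads of dim 64, inferred from the 7 * 64 factorization
    batch_first=True,
)
# Generic stand-in for the motion encoder; NOT the authors' architecture.
encoder = torch.nn.TransformerEncoder(encoder_layer, num_layers=4)

optimizer = torch.optim.Adam(encoder.parameters(), lr=2e-4)
num_training_steps = 100_000                 # ASSUMPTION: not reported in the paper
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_training_steps, eta_min=2e-5)
```

With these pieces, the closed-form forward process `x_t = sqrt(alpha_bars[t]) * x_0 + sqrt(1 - alpha_bars[t]) * noise` follows directly from `alpha_bars`, which is why the table treats the quoted schedule as a sufficiently specified experiment setup.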