FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing

Authors: Mingyuan Zhang, Huirong Li, Zhongang Cai, Jiawei Ren, Lei Yang, Ziwei Liu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To thoroughly evaluate our proposed algorithm, we have conducted extensive experimental analysis on existing standard datasets such as HumanML3D [6], KIT-ML [12], and BABEL [13]. ... Experimental results show that our method has achieved state-of-the-art levels on these benchmarks. ... 4 Experiments ... 4.4 Quantitative Results ... 4.6 Ablation Study
Researcher Affiliation | Collaboration | 1 S-Lab, Nanyang Technological University; 2 SenseTime Research
Pseudocode | No | The paper contains no section or figure explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps in a code-like format.
Open Source Code | No | The paper provides a 'Project Page: https://mingyuan-zhang.github.io/projects/FineMoGen.html' but does not explicitly state that the source code for the methodology is available there or at a specific repository; there is no unambiguous statement of code release.
Open Datasets | Yes | To thoroughly evaluate our proposed algorithm, we have conducted extensive experimental analysis on existing standard datasets such as HumanML3D [6], KIT-ML [12], and BABEL [13]. ... We selected 2,968 videos from the 160 types of actions in the HuMMan dataset to be annotated in detail.
Dataset Splits | No | The paper mentions training and inference phases and refers to a 'test set' for evaluation, but does not give specific percentages, sample counts, or an explicit split methodology (e.g., '80/10/10 split', 'random split with seed 42').
Hardware Specification | Yes | Training is performed using one Tesla V100, with the batch size on a single GPU set at 128.
Software Dependencies | No | The paper mentions using a 'pre-trained CLIP model [14]' and 'CLIP ViT-B/32' but does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, specific library versions). See the loading sketch below the table.
Experiment Setup | Yes | Regarding the motion encoder, we utilize a 4-layer transformer, with a latent dimension of 7 * 64. ... In terms of the diffusion model, the variances, denoted as β_t, are predefined to linearly spread from 0.0001 to 0.02, with the total number of noising steps set as T = 1000. We use the Adam optimizer to train the model, initially setting the learning rate to 0.0002. This learning rate gradually decays to 0.00002 in accordance with a cosine learning rate scheduler. Training is performed using one Tesla V100, with the batch size on a single GPU set at 128.
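
The Software Dependencies row names a pre-trained CLIP ViT-B/32 text encoder but no pinned versions. As a reproduction aid, here is a minimal loading sketch assuming OpenAI's reference `clip` package; the paper does not state which CLIP implementation or version it uses, so this is an illustrative guess rather than the authors' setup.

```python
# Minimal sketch: obtaining CLIP ViT-B/32 text features for conditioning.
# Assumes OpenAI's reference package (pip install git+https://github.com/openai/CLIP.git);
# the paper does not pin an implementation or version.
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)  # downloads pretrained weights

with torch.no_grad():
    tokens = clip.tokenize(["a person walks forward while waving"]).to(device)
    text_features = model.encode_text(tokens)  # shape (1, 512) for ViT-B/32
```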
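
The Experiment Setup row quotes enough hyperparameters to reconstruct the noise schedule and optimizer configuration. Below is a minimal PyTorch sketch under those stated values; reading the latent dimension "7 * 64" as 7 attention heads of 64 dimensions each, and the total number of training steps, are assumptions not given in the quoted text.

```python
import torch

# Linear variance schedule: beta_t from 0.0001 to 0.02 over T = 1000 noising
# steps, as quoted in the Experiment Setup row.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t of the forward process

# 4-layer transformer motion encoder. Assumption: "7 * 64" means 7 heads of
# 64 dims each, i.e. d_model = 448; the quoted text does not spell this out.
layer = torch.nn.TransformerEncoderLayer(d_model=448, nhead=7)
encoder = torch.nn.TransformerEncoder(layer, num_layers=4)

# Adam at 2e-4, cosine-decayed to 2e-5. `num_train_steps` is a hypothetical
# placeholder; the quoted setup does not state the training length.
num_train_steps = 100_000
optimizer = torch.optim.Adam(encoder.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=num_train_steps, eta_min=2e-5
)
```

With `alphas_cumprod` in hand, the standard DDPM forward process x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε supplies the noised samples on which the denoiser is trained.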