Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing
Authors: Mingyuan Zhang, Huirong Li, Zhongang Cai, Jiawei Ren, Lei Yang, Ziwei Liu
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To thoroughly evaluate our proposed algorithm, we have conducted extensive experimental analysis on existing standard datasets such as HumanML3D [6], KIT-ML [12], and BABEL [13]. ... Experimental results show that our method has achieved state-of-the-art levels on these benchmarks. ... 4 Experiments ... 4.4 Quantitative Results ... 4.6 Ablation Study |
| Researcher Affiliation | Collaboration | 1 S-Lab, Nanyang Technological University, 2 SenseTime Research |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps in a code-like format. |
| Open Source Code | No | The paper provides a 'Project Page: https://mingyuan-zhang.github.io/projects/FineMoGen.html' but does not explicitly state that the source code for the methodology is available there or at a specific code repository. It does not contain an unambiguous statement of code release. |
| Open Datasets | Yes | To thoroughly evaluate our proposed algorithm, we have conducted extensive experimental analysis on existing standard datasets such as HumanML3D [6], KIT-ML [12], and BABEL [13]. ... We selected 2,968 videos from the 160 types of actions in the HuMMan dataset to be annotated in detail. |
| Dataset Splits | No | The paper mentions training and inference phases and refers to 'test set' for evaluation, but does not provide specific percentages, sample counts, or explicit methodology for training, validation, and test splits (e.g., '80/10/10 split', 'random split with seed 42'). |
| Hardware Specification | Yes | Training is performed using one Tesla V100, with the batch size on a single GPU set at 128. |
| Software Dependencies | No | The paper mentions using a 'pre-trained CLIP model [14]' and 'CLIP ViT-B/32' but does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, specific library versions). |
| Experiment Setup | Yes | Regarding the motion encoder, we utilize a 4-layer transformer, with a latent dimension of 7 * 64. ... In terms of the diffusion model, the variances, denoted as βt, are predefined to linearly spread from 0.0001 to 0.02, with the total number of noising steps set as T = 1000. We use the Adam optimizer to train the model, initially setting the learning rate to 0.0002. This learning rate will gradually decay to 0.00002 in accordance with a cosine learning rate scheduler. Training is performed using one Tesla V100, with the batch size on a single GPU set at 128. |
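The hyperparameters quoted in the Experiment Setup row (a linear βt schedule from 0.0001 to 0.02 over T = 1000 noising steps, and a learning rate decaying from 0.0002 to 0.00002 on a cosine schedule) can be sketched as below. This is a minimal illustration, not the authors' code: the function names and the exact cosine formula are assumptions, since the paper states only the endpoints.

```python
import math

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Variances beta_t spread linearly from 0.0001 to 0.02 over T steps,
    matching the values quoted from the paper."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def cosine_lr(step, total_steps, lr_max=2e-4, lr_min=2e-5):
    """Hypothetical cosine decay from 0.0002 to 0.00002; the paper does not
    give the scheduler's exact formula, only the start and end rates."""
    cos = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos

betas = linear_beta_schedule()
lr_start = cosine_lr(0, 10_000)       # 0.0002 at the first step
lr_end = cosine_lr(10_000, 10_000)    # 0.00002 at the last step
```

The schedule endpoints are the only values constrained by the paper; the total training-step count passed to `cosine_lr` here (10,000) is a placeholder, as the paper does not report it.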