HUMANISE: Language-conditioned Human Motion Generation in 3D Scenes
Authors: Zan Wang, Yixin Chen, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that our model generates diverse and semantically consistent human motions in 3D scenes; it outperforms the baselines on various evaluation metrics. We benchmark our proposed task, language-conditioned human motion generation in 3D scenes, on HUMANISE and describe the detailed settings, baselines, analyses, and ablative studies. |
| Researcher Affiliation | Collaboration | 1. School of Computer Science & Technology, Beijing Institute of Technology; 2. Beijing Institute for General Artificial Intelligence (BIGAI); 3. Institute for Artificial Intelligence, Peking University; 4. Yangtze Delta Region Academy of Beijing Institute of Technology, Jiaxing |
| Pseudocode | No | The paper describes the model architecture and training process in text and diagrams (Fig. 3) but does not provide formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a project website link (https://silverster98.github.io/HUMANISE/) but does not explicitly state that source code for the methodology is released or provide a direct link to a code repository. |
| Open Datasets | Yes | To tackle the above issues, we propose a large-scale and semantic-rich synthetic HSI dataset, HUMANISE (see Fig. 1), by aligning the captured human motion sequences [Mahmood et al., 2019] with the scanned indoor scenes [Dai et al., 2017]. |
| Dataset Splits | Yes | We split motions in HUMANISE according to the original scene IDs and split in ScanNet [Dai et al., 2017], resulting in 16.5k motions in 543 scenes for training and 3.1k motions in 100 scenes for testing. (A hedged sketch of this scene-ID split follows the table.) |
| Hardware Specification | Yes | We train our model with a batch size of 32 on a V100 GPU. |
| Software Dependencies | No | The paper mentions using Adam and pre-trained BERT, but does not provide specific version numbers for these software components or other libraries. |
| Experiment Setup | Yes | We train our generative model on HUMANISE for 150 epochs using Adam [Kingma and Ba, 2014] and a fixed learning rate of 0.0001. For hyper-parameters, we empirically set α_kl = α_o = 0.1, α_a = 0.5, α_r = 1.0, and α_p = α_v = 10.0. We set the dimension of the global condition latent z_c to 512 and the latent z to 256. The hidden state size is set to 256 in the single-layer bidirectional GRU motion encoder. The transformer motion decoder contains two standard layers with a hidden state size of 512. We train our model with a batch size of 32 on a V100 GPU. (A hedged training-setup sketch follows the table.) |
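
As a reading aid for the Dataset Splits row, the sketch below shows one way to partition motions by ScanNet's original scene-level split. The per-motion metadata format (a `scene_id` key) and the helper name `split_by_scene` are assumptions for illustration; the paper only states that motions follow the original ScanNet scene split.

```python
# Hypothetical illustration of the HUMANISE train/test split by ScanNet scene ID.
# The metadata format ('scene_id' key per motion) is an assumption; the paper
# only states that motions follow ScanNet's original scene-level split.

def split_by_scene(motions, train_scenes, test_scenes):
    """Partition motion records by the ScanNet scene they were aligned to."""
    train = [m for m in motions if m["scene_id"] in train_scenes]
    test = [m for m in motions if m["scene_id"] in test_scenes]
    return train, test

# Usage with toy data; the real split yields 16.5k train / 3.1k test motions.
motions = [{"scene_id": "scene0000_00"}, {"scene_id": "scene0700_00"}]
train, test = split_by_scene(motions, {"scene0000_00"}, {"scene0700_00"})
```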
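
Similarly, the Experiment Setup row pins down enough hyper-parameters to sketch the training configuration. The module wiring below (pose dimension, attention heads, how z and z_c feed the decoder) is assumed for illustration; only the hidden sizes, layer counts, latent dimensions, optimizer, learning rate, loss weights, and batch size come from the quoted text.

```python
import torch
import torch.nn as nn

# Minimal sketch of the reported training setup. Only hyper-parameters quoted
# in the table are from the paper; the wiring below is an assumption.
D_COND, D_LATENT, D_GRU, D_TF = 512, 256, 256, 512

class MotionCVAE(nn.Module):
    def __init__(self, pose_dim=63):  # pose_dim is an assumed placeholder
        super().__init__()
        # Single-layer bidirectional GRU motion encoder (hidden size 256).
        self.encoder = nn.GRU(pose_dim, D_GRU, num_layers=1,
                              bidirectional=True, batch_first=True)
        self.to_mu = nn.Linear(2 * D_GRU, D_LATENT)
        self.to_logvar = nn.Linear(2 * D_GRU, D_LATENT)
        # Two-layer transformer motion decoder (hidden size 512).
        layer = nn.TransformerDecoderLayer(d_model=D_TF, nhead=8,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)
        self.in_proj = nn.Linear(pose_dim, D_TF)
        self.z_proj = nn.Linear(D_LATENT + D_COND, D_TF)
        self.out_proj = nn.Linear(D_TF, pose_dim)

    def forward(self, motion, cond):
        # motion: (B, T, pose_dim); cond: (B, 512) global condition latent z_c.
        _, h = self.encoder(motion)                 # h: (2, B, 256)
        h = torch.cat([h[0], h[1]], dim=-1)         # (B, 512)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        memory = self.z_proj(torch.cat([z, cond], dim=-1)).unsqueeze(1)
        recon = self.out_proj(self.decoder(self.in_proj(motion), memory))
        return recon, mu, logvar

model = MotionCVAE()
optim = torch.optim.Adam(model.parameters(), lr=1e-4)  # fixed LR, 150 epochs

# Loss weights from the paper; the remaining weights (o, a, p, v) would scale
# the model's other reconstruction terms, which this sketch does not define.
W = dict(kl=0.1, o=0.1, a=0.5, r=1.0, p=10.0, v=10.0)

# One optimization step on toy data (reconstruction loss is a placeholder):
x = torch.randn(32, 60, 63)   # batch of 32 motions, 60 frames each
c = torch.randn(32, D_COND)   # global condition latent z_c
recon, mu, logvar = model(x, c)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = W["kl"] * kl + W["r"] * nn.functional.mse_loss(recon, x)
loss.backward()
optim.step()
```

The CVAE-style encoder/decoder split mirrors how the quoted hyper-parameters group naturally (a 256-d latent z from the GRU encoder, a 512-d condition z_c consumed by the transformer decoder), but any correspondence beyond those sizes is conjecture.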