NeRM: Learning Neural Representations for High-Framerate Human Motion Synthesis

Authors: Dong Wei, Huaijiang Sun, Bin Li, Xiaoning Sun, Shengxiang Hu, Weiqing Li, Jianfeng Lu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct comprehensive experiments on various datasets, including HumanML3D (Guo et al., 2022), KIT (Plappert et al., 2016), HumanAct12 (Guo et al., 2020) and UESTC (Ji et al., 2018). Numerical results demonstrate NeRM to be extremely competitive with state-of-the-art baselines.
Researcher Affiliation | Collaboration | (1) School of Computer Science and Engineering, Nanjing University of Science and Technology; (2) Tianjin AiForward Science and Technology Co., Ltd.
Pseudocode | Yes | Figure 6: Detailed network architecture of Codebook-Coordinate Attention (CCA).
Open Source Code | No | The paper does not explicitly state that source code for the methodology is provided, nor does it include a link to a code repository.
Open Datasets | Yes | We conduct comprehensive experiments on various datasets, including HumanML3D (Guo et al., 2022), KIT (Plappert et al., 2016), HumanAct12 (Guo et al., 2020) and UESTC (Ji et al., 2018).
Dataset Splits | No | The paper mentions training, validation, and test sets conceptually but does not provide specific split percentages or sample counts for dataset partitioning (e.g., an "80/10/10 split" or "X training samples, Y validation samples, Z test samples").
Hardware Specification | Yes | We train our models under PyTorch on an NVIDIA GeForce RTX 3090.
Software Dependencies | Yes | We employ a frozen CLIP-ViT-L-14 model as our text encoder for the text prompt... We train our models under PyTorch on an NVIDIA GeForce RTX 3090.
Experiment Setup | Yes | For INR, the hidden layer size is fixed to 1,024... The codebook size is set to 512 × 512... The number of learnable query embeddings of codebook-coordinate attention is 256, and the dimension of each embedding is 128... The shape of latent codes z is set to 256... Our models are trained with the AdamW optimizer using a fixed learning rate of 10^-4. Our batch size is set to 4,096 during the INR training stage and 64 during the diffusion training stage, respectively... The INR model was trained for 20,000 epochs and the diffusion model for 3,000 epochs. The number of diffusion steps is 1,000 during training and 50 during inference. The corresponding variances β_k in diffusion are scaled linearly from 8.5 × 10^-4 to 0.012.
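
As a rough illustration of the reported experiment setup, the sketch below collects the quoted hyperparameters into a hypothetical configuration dictionary and constructs the linear β_k variance schedule in PyTorch. All names (e.g., the config keys) are illustrative assumptions and are not taken from the authors' code.

```python
import torch

# Hypothetical collection of the hyperparameters reported in the paper;
# the key names are illustrative, not the authors' actual configuration schema.
config = {
    "inr_hidden_size": 1024,        # INR hidden layer size
    "codebook_size": (512, 512),    # codebook size reported as 512 x 512
    "num_query_embeddings": 256,    # learnable queries in codebook-coordinate attention
    "query_embedding_dim": 128,
    "latent_code_dim": 256,         # shape of latent codes z
    "learning_rate": 1e-4,          # fixed learning rate for AdamW
    "batch_size_inr": 4096,
    "batch_size_diffusion": 64,
    "epochs_inr": 20_000,
    "epochs_diffusion": 3_000,
    "train_diffusion_steps": 1000,
    "inference_diffusion_steps": 50,
}

# Linear variance schedule beta_k from 8.5e-4 to 0.012 over the training steps,
# matching the values quoted above.
betas = torch.linspace(8.5e-4, 0.012, config["train_diffusion_steps"])
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # cumulative product used by the forward noising process

# The optimizer would then be created along the lines of (model is hypothetical):
# optimizer = torch.optim.AdamW(model.parameters(), lr=config["learning_rate"])
```

This is only a reading of the numbers stated in the paper; details such as the 50-step inference sampler and the per-stage training loops are not specified beyond what is quoted in the table.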