OmniControl: Control Any Joint at Any Time for Human Motion Generation

Authors: Yiming Xie, Varun Jampani, Lei Zhong, Deqing Sun, Huaizu Jiang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the HumanML3D and KIT-ML datasets show that OmniControl not only achieves significant improvement over state-of-the-art methods on pelvis control but also shows promising results when incorporating constraints over other joints.
Researcher Affiliation | Collaboration | Yiming Xie (Northeastern University), Varun Jampani (Stability AI), Lei Zhong (Northeastern University), Deqing Sun (Google Research), Huaizu Jiang (Northeastern University)
Pseudocode | Yes | Algorithm 1: OmniControl's inference
Open Source Code | No | The paper provides a project page (https://neu-vi.github.io/omnicontrol/), but it does not explicitly state that the source code for the methodology is released at this link or in supplementary materials.
Open Datasets | Yes | We experiment on the popular HumanML3D (Guo et al., 2022a) dataset, which contains 14,646 text-annotated human motion sequences from the AMASS (Mahmood et al., 2019) and HumanAct12 (Guo et al., 2020) datasets. We also evaluate our method on the KIT-ML (Plappert et al., 2016) dataset with 3,911 sequences.
Dataset Splits | No | The paper mentions evaluating on the HumanML3D and KIT-ML test sets but does not specify the train/validation/test splits or percentages used to train the models.
Hardware Specification | Yes | We implemented our model using PyTorch with training on 1 NVIDIA A5000 GPU.
Software Dependencies | No | The paper mentions using PyTorch, the AdamW optimizer, the CLIP model, and DDPM, but does not specify version numbers for any of these software components.
Experiment Setup | Yes | Batch size b = 64. We use the AdamW optimizer (Loshchilov & Hutter, 2017) with a learning rate of 1e-5. It takes 29 hours to train on a single A5000 GPU with 250,000 iterations in total. ... We utilize DDPM (Ho et al., 2020) with T = 1000 denoising steps. The control strength is τ = 20·Σ̂_t / V, where V is the number of frames we want to control (density) and Σ̂_t = min(Σ_t, 0.01). We use Ke = 10, Kl = 500, and Ts = 10 in our experiments.
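The adaptive control strength above can be sketched as a small helper. This is a minimal illustration only, assuming the quoted formula reads τ = 20·Σ̂_t / V; the function name and the division by V are assumptions, not taken from the paper's released code.

```python
def control_strength(sigma_t: float, num_control_frames: int) -> float:
    """Adaptive spatial-guidance strength, assuming tau = 20 * min(Sigma_t, 0.01) / V.

    sigma_t            -- diffusion posterior variance Sigma_t at denoising step t
    num_control_frames -- V, the number of frames under spatial control (density)
    """
    sigma_hat = min(sigma_t, 0.01)  # hat{Sigma}_t = min(Sigma_t, 0.01)
    return 20.0 * sigma_hat / num_control_frames  # assumed division by V
```

Clamping Σ_t at 0.01 keeps the guidance step bounded at early (high-variance) denoising steps, while scaling by V spreads a fixed total strength over however many frames are controlled.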