InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint

Authors: Zhenzhi Wang, Jingbo Wang, Yixuan Li, Dahua Lin, Bo Dai

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results highlight the capability of our framework to generate interactions with multiple human characters and its potential to work with off-the-shelf physics-based character simulators. Code is available at https://github.com/zhenzhiwang/intercontrol. Extensive experiments on the HumanML3D [14] and KIT-ML [47] datasets quantitatively validate our joint control ability, and the user study on generated interactions shows a clear preference over previous methods.
Researcher Affiliation | Collaboration | Zhenzhi Wang (1), Jingbo Wang (2), Yixuan Li (1), Dahua Lin (1,2), Bo Dai (3,2); (1) The Chinese University of Hong Kong, (2) Shanghai Artificial Intelligence Laboratory, (3) The University of Hong Kong
Pseudocode | Yes | Algorithm 1: Two-people interaction model inference (a hedged sketch of this procedure follows the table)
Open Source Code | Yes | Code is available at https://github.com/zhenzhiwang/intercontrol.
Open Datasets | Yes | Datasets. We conduct experiments on HumanML3D [14] and KIT-ML [47] following MDM [55].
Dataset Splits | Yes | Datasets. We conduct experiments on HumanML3D [14] and KIT-ML [47] following MDM [55].
Hardware Specification | Yes | Inference time analysis on an NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions 'Python scripts' and 'PyTorch-like code' but does not specify version numbers for these or for other major libraries such as PyTorch itself. It refers to specific models and optimizers by their original paper citations (e.g., AdamW [39], CLIP [48], GPT-4 [43], L-BFGS [37]), but these are not software dependency versions in the usual sense.
Experiment Setup | Yes | We run L-BFGS [37] in IK guidance 5 times for the first 990 denoising steps and 10 times for the last 10 denoising steps on the posterior mean µ_t, and once for the first 990 steps and 10 times for the last 10 steps on the clean motion x_0. We use the AdamW [39] optimizer and set the learning rate to 1e-5. (A hedged sketch of this guidance schedule follows the table.)
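
The Pseudocode row refers to Algorithm 1 (two-people interaction model inference). The sketch below is only a hypothetical reconstruction of the general idea, assuming two single-person denoisers coupled through joint-contact targets; the function names (`denoise_step`, `apply_contact_guidance`), tensor shapes, and the contact-pair format are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch: two single-person motion diffusion models are denoised
# in parallel, and at every step the planned joint-contact pairs are enforced
# as spatial targets that couple the two motions. Names and shapes are assumed.
import torch

def two_person_inference(model, text_a, text_b, contact_pairs, n_steps=1000,
                         motion_shape=(196, 263)):
    """contact_pairs: list of (joint_idx_a, joint_idx_b, frame_idx) tuples (assumed format)."""
    x_a = torch.randn(motion_shape)   # noisy motion for person A
    x_b = torch.randn(motion_shape)   # noisy motion for person B

    for t in reversed(range(n_steps)):
        # One reverse-diffusion step per person, conditioned on its own prompt.
        x_a = denoise_step(model, x_a, t, text_a)
        x_b = denoise_step(model, x_b, t, text_b)

        # Couple the two motions: pull each planned pair of joints together.
        x_a, x_b = apply_contact_guidance(x_a, x_b, contact_pairs)

    return x_a, x_b

def denoise_step(model, x, t, text):
    # Placeholder for a single denoising step of the single-person model.
    return x

def apply_contact_guidance(x_a, x_b, contact_pairs):
    # Placeholder: in practice this would decode joint positions and apply
    # gradient- or L-BFGS-based guidance so the paired joints coincide.
    return x_a, x_b
```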
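
The Experiment Setup row quotes a concrete guidance schedule: L-BFGS is run 5 times per step on the posterior mean for the first 990 denoising steps and 10 times for the last 10 steps. The minimal PyTorch sketch below, assuming a 1000-step sampler, only illustrates that schedule; the control loss, the `joint_positions` decoder, and all shapes are assumptions rather than the paper's implementation.

```python
# Minimal sketch of the quoted IK-guidance schedule: the number of L-BFGS runs
# per denoising step depends on how close the sampler is to the final step.
import torch

def ik_guidance(mu_t, step, target, mask):
    """Refine the posterior mean `mu_t` toward joint-position targets with L-BFGS."""
    # Schedule from the Experiment Setup row: 5 runs while step >= 10 (the first
    # 990 of 1000 steps, counting down) and 10 runs for the last 10 steps.
    n_runs = 10 if step < 10 else 5

    mu = mu_t.detach().clone().requires_grad_(True)
    optimizer = torch.optim.LBFGS([mu], lr=0.1, max_iter=1)

    def closure():
        optimizer.zero_grad()
        # Assumed control loss: masked L2 distance between the joints decoded
        # from `mu` and the desired global joint positions.
        joints = joint_positions(mu)
        loss = ((joints - target) ** 2 * mask).mean()
        loss.backward()
        return loss

    for _ in range(n_runs):
        optimizer.step(closure)
    return mu.detach()

def joint_positions(motion):
    # Placeholder for the feature-to-joint (forward kinematics) decoder.
    return motion
```

Shaping the schedule this way spends more optimizer effort near the end of denoising, where the predicted motion is nearly clean and the joint targets can be satisfied precisely.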