InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint
Authors: Zhenzhi Wang, Jingbo Wang, Yixuan Li, Dahua Lin, Bo Dai
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results highlight the capability of our framework to generate interactions with multiple human characters and its potential to work with off-the-shelf physics-based character simulators. Code is available at https://github.com/zhenzhiwang/intercontrol. Extensive experiments in HumanML3D [14] and KIT-ML [47] datasets quantitatively validates our joint control ability, and the user study on generated interactions shows a clear preference over previous methods. |
| Researcher Affiliation | Collaboration | Zhenzhi Wang¹, Jingbo Wang², Yixuan Li¹, Dahua Lin¹,², Bo Dai³,²; ¹The Chinese University of Hong Kong, ²Shanghai Artificial Intelligence Laboratory, ³The University of Hong Kong |
| Pseudocode | Yes | Algorithm 1 Two-people interaction model inference |
| Open Source Code | Yes | Code is available at https://github.com/zhenzhiwang/intercontrol. |
| Open Datasets | Yes | Datasets. We conduct experiments on HumanML3D [14] and KIT-ML [47] following MDM [55]. |
| Dataset Splits | Yes | Datasets. We conduct experiments on HumanML3D [14] and KIT-ML [47] following MDM [55]. |
| Hardware Specification | Yes | Inference time analysis on a NVIDIA A100 GPU. |
| Software Dependencies | No | The paper mentions 'Python scripts' and 'PyTorch-like code' but does not give version numbers for them or for other major libraries such as PyTorch itself. It refers to specific models and optimizers by their original paper citations (e.g., AdamW [39], CLIP [48], GPT-4 [43], L-BFGS [37]), but these are not software dependency versions in the usual sense. |
| Experiment Setup | Yes | We run L-BFGS [37] in IK guidance 5 times for the first 990 denoising steps and 10 times for the last 10 denoising steps on the posterior mean µ_t; and once for the first 990 steps and 10 times for the last 10 steps on clean motion x_0. We use AdamW [39] optimizer and set the learning rate as 1e-5. |
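
The L-BFGS schedule quoted in the Experiment Setup row can be written out as a small lookup. This is a minimal sketch, not code from the InterControl repository: the function names, the `target` labels, the total of 1000 denoising steps (inferred from 990 + 10), and the convention that step 0 is the first step executed are all assumptions made for illustration.

```python
# Hypothetical helper illustrating the quoted IK-guidance schedule.
# Assumptions (not from the paper's code): 1000 total denoising steps,
# steps indexed in execution order, and the two target labels below.

TARGETS = ("posterior_mean", "clean_motion")


def ik_guidance_iterations(step: int, target: str,
                           total_steps: int = 1000,
                           final_steps: int = 10) -> int:
    """Return how many L-BFGS iterations to run at one denoising step."""
    if target not in TARGETS:
        raise ValueError(f"unknown target: {target!r}")
    if not 0 <= step < total_steps:
        raise ValueError(f"step {step} outside [0, {total_steps})")

    # The last `final_steps` denoising steps use 10 iterations for both targets.
    if step >= total_steps - final_steps:
        return 10
    # Earlier steps: 5 iterations on the posterior mean, 1 on clean motion.
    return 5 if target == "posterior_mean" else 1


def total_iterations(target: str, total_steps: int = 1000) -> int:
    """Sum the per-step iteration counts over the whole denoising run."""
    return sum(ik_guidance_iterations(s, target, total_steps)
               for s in range(total_steps))


if __name__ == "__main__":
    for name in TARGETS:
        print(name, total_iterations(name))
```

Under these assumptions, guidance on the posterior mean costs 990 × 5 + 10 × 10 = 5050 L-BFGS iterations per sample, against 990 × 1 + 10 × 10 = 1090 for guidance on clean motion.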