Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation
Authors: Yukang Cao, Liang Pan, Kai Han, Kwan-Yee K. Wong, Ziwei Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments with existing methods validate AvatarGO's superior generation and animation capabilities on a variety of human-object pairs and diverse poses. |
| Researcher Affiliation | Academia | S-Lab, Nanyang Technological University; Shanghai AI Laboratory; The University of Hong Kong |
| Pseudocode | No | The paper describes its methodology in Section 3, including equations and a system overview figure (Fig. 2), but does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | We will make our code publicly available. |
| Open Datasets | Yes | Thanks to the collections of HOI datasets (Li et al., 2023b; Bhatnagar et al., 2022; Jiang et al., 2023a) and the recent advancements in diffusion models (Saharia et al., 2022; Ramesh et al., 2022; Balaji et al., 2022; Stability.AI, 2022; 2023), existing HOI generative techniques (Zhang et al., 2022; 2023; 2024; Shafir et al., 2023; Kapon et al., 2024; Chen et al., 2024a) have exhibited promising capabilities by generating 4D human motions with object interactions from textual inputs. [...] The proliferation of large 3D datasets (Deitke et al., 2023; 2024; Wu et al., 2023b) has propelled 3D generation techniques forward. |
| Dataset Splits | No | The paper mentions using several datasets but does not specify any training, validation, or test splits (e.g., percentages or counts) for these datasets in the main text. |
| Hardware Specification | Yes | The training takes around 10 minutes for the 3D stage and 20 minutes for the 4D stage on a single NVIDIA A100 GPU. |
| Software Dependencies | Yes | We utilize the pre-trained Texture-Structure joint diffusion model from HumanGaussian (Liu et al., 2023d) and version 2.1 of Stable Diffusion (Stability.AI, 2022) to respectively calculate the SDS and spatial-aware SDS in our implementation. |
| Experiment Setup | Yes | Typically, for each 3D avatar-object pair, we train the 3D stage with a batch size of 16 for 400 epochs, and the 4D stage with a batch size of 10 for 400 epochs. [...] We use Adam (Kingma & Ba, 2015) optimizer for back-propagation. Additional implementation details can be found in the Appendix. |
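The reported experiment setup (two stages, batch sizes 16 and 10, 400 epochs each) can be sketched as a minimal schedule. This is an illustrative sketch only: the hyperparameter values come from the paper's text, while the structure, names (`STAGES`, `run_schedule`, `train_step`), and the stubbed training step are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the reported two-stage optimization schedule.
# Batch sizes and epoch counts are quoted from the paper; everything
# else (function names, the train_step stub) is assumed for illustration.

STAGES = {
    "3d": {"batch_size": 16, "epochs": 400},  # static 3D composition stage
    "4d": {"batch_size": 10, "epochs": 400},  # animation (4D) stage
}

def run_schedule(train_step):
    """Run both stages in order, calling train_step(stage, epoch, batch_size)
    once per epoch, and return a log of (stage, epoch) pairs."""
    log = []
    for stage in ("3d", "4d"):
        cfg = STAGES[stage]
        for epoch in range(cfg["epochs"]):
            train_step(stage, epoch, cfg["batch_size"])
            log.append((stage, epoch))
    return log

# Dry run with a no-op training step: 400 + 400 = 800 epochs total.
log = run_schedule(lambda stage, epoch, batch_size: None)
```

In the real pipeline each `train_step` call would draw a batch, compute the SDS (or spatial-aware SDS) loss, and take an Adam step; those details are in the paper's Appendix.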