Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation

Authors: Yukang Cao, Liang Pan, Kai Han, Kwan-Yee K. Wong, Ziwei Liu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments with existing methods validate AvatarGO's superior generation and animation capabilities on a variety of human-object pairs and diverse poses.
Researcher Affiliation | Academia | S-Lab, Nanyang Technological University; Shanghai AI Laboratory; The University of Hong Kong
Pseudocode | No | The paper describes its methodology in Section 3, including equations and a system overview figure (Fig. 2), but does not contain any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | We will make our code publicly available.
Open Datasets | Yes | Thanks to the collections of HOI datasets (Li et al., 2023b; Bhatnagar et al., 2022; Jiang et al., 2023a) and the recent advancements in diffusion models (Saharia et al., 2022; Ramesh et al., 2022; Balaji et al., 2022; Stability.AI, 2022; 2023), existing HOI generative techniques (Zhang et al., 2022; 2023; 2024; Shafir et al., 2023; Kapon et al., 2024; Chen et al., 2024a) have exhibited promising capabilities by generating 4D human motions with object interactions from textual inputs. [...] The proliferation of large 3D datasets (Deitke et al., 2023; 2024; Wu et al., 2023b) has propelled 3D generation techniques forward.
Dataset Splits | No | The paper mentions using several datasets but does not specify any training, validation, or test splits (e.g., percentages or counts) for these datasets in the main text.
Hardware Specification | Yes | The training takes around 10 minutes for the 3D stage and 20 minutes for the 4D stage on a single NVIDIA A100 GPU.
Software Dependencies | Yes | We utilize the pre-trained Texture-Structure joint diffusion model from HumanGaussian (Liu et al., 2023d) and version 2.1 of Stable Diffusion (Stability.AI, 2022) to respectively calculate the SDS and spatial-aware SDS in our implementation.
Experiment Setup | Yes | Typically, for each 3D avatar-object pair, we train the 3D stage with a batch size of 16 for 400 epochs, and the 4D stage with a batch size of 10 for 400 epochs. [...] We use the Adam (Kingma & Ba, 2015) optimizer for back-propagation. Additional implementation details can be found in the Appendix.