Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation
Authors: Yukang Cao, Liang Pan, Kai Han, Kwan-Yee K. Wong, Ziwei Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments with existing methods validate AvatarGO's superior generation and animation capabilities on a variety of human-object pairs and diverse poses. |
| Researcher Affiliation | Academia | S-Lab, Nanyang Technological University; Shanghai AI Laboratory; The University of Hong Kong |
| Pseudocode | No | The paper describes its methodology in Section 3, including equations and a system overview figure (Fig. 2), but does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | We will make our code publicly available. |
| Open Datasets | Yes | Thanks to the collections of HOI datasets (Li et al., 2023b; Bhatnagar et al., 2022; Jiang et al., 2023a) and the recent advancements in diffusion models (Saharia et al., 2022; Ramesh et al., 2022; Balaji et al., 2022; Stability.AI, 2022; 2023), existing HOI generative techniques (Zhang et al., 2022; 2023; 2024; Shafir et al., 2023; Kapon et al., 2024; Chen et al., 2024a) have exhibited promising capabilities by generating 4D human motions with object interactions from textual inputs. [...] The proliferation of large 3D datasets (Deitke et al., 2023; 2024; Wu et al., 2023b) has propelled 3D generation techniques forward. |
| Dataset Splits | No | The paper mentions using several datasets but does not specify any training, validation, or test splits (e.g., percentages or counts) for these datasets in the main text. |
| Hardware Specification | Yes | The training takes around 10 minutes for the 3D stage and 20 minutes for the 4D stage on a single NVIDIA A100 GPU. |
| Software Dependencies | Yes | We utilize the pre-trained Texture-Structure joint diffusion model from HumanGaussian (Liu et al., 2023d) and version 2.1 of Stable Diffusion (Stability.AI, 2022) to respectively calculate the SDS and spatial-aware SDS in our implementation. |
| Experiment Setup | Yes | Typically, for each 3D avatar-object pair, we train the 3D stage with a batch size of 16 for 400 epochs, and the 4D stage with a batch size of 10 for 400 epochs. [...] We use Adam (Kingma & Ba, 2015) optimizer for back-propagation. Additional implementation details can be found in the Appendix. |
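The reported experiment setup (two stages, batch sizes 16 and 10, 400 epochs each) can be sketched as a minimal schedule. This is an illustrative sketch only: the hyperparameter values come from the paper's text, while the structure, names (`STAGES`, `run_schedule`, `train_step`), and the stubbed training step are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the reported two-stage optimization schedule.
# Batch sizes and epoch counts are quoted from the paper; everything
# else (function names, the train_step stub) is assumed for illustration.

STAGES = {
    "3d": {"batch_size": 16, "epochs": 400},  # static 3D composition stage
    "4d": {"batch_size": 10, "epochs": 400},  # animation (4D) stage
}

def run_schedule(train_step):
    """Run both stages in order, calling train_step(stage, epoch, batch_size)
    once per epoch, and return a log of (stage, epoch) pairs."""
    log = []
    for stage in ("3d", "4d"):
        cfg = STAGES[stage]
        for epoch in range(cfg["epochs"]):
            train_step(stage, epoch, cfg["batch_size"])
            log.append((stage, epoch))
    return log

# Dry run with a no-op training step: 400 + 400 = 800 epochs total.
log = run_schedule(lambda stage, epoch, batch_size: None)
```

In the real pipeline each `train_step` call would draw a batch, compute the SDS (or spatial-aware SDS) loss, and take an Adam step; those details are in the paper's Appendix.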