Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

HOComp: Interaction-Aware Human-Object Composition

Authors: Dong Liang, Jinyuan Jia, Yuhao Liu, Rynson Lau

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on our dataset show that HOComp effectively generates harmonious human-object interactions with consistent appearances, and outperforms relevant methods qualitatively and quantitatively.
Researcher Affiliation | Academia | Dong Liang, Tongji University / CityUHK / HKUST(GZ), EMAIL; Jinyuan Jia, Tongji University / HKUST(GZ), EMAIL; Yuhao Liu, CityUHK, EMAIL; Rynson W.H. Lau, CityUHK, EMAIL
Pseudocode | No | The paper describes the methodology in text and illustrates a pipeline in Figure 2, but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | Project page: https://dliang293.github.io/HOComp-project/. In the NeurIPS Paper Checklist, the authors state: "We have opensourced our github repository will soon release all the codes and dataset."
Open Datasets | Yes | To train the model, we introduce a new dataset called Interaction-aware Human-Object Composition (IHOC) dataset, which includes images of humans before and after interacting with the foreground object, the interaction region, and the corresponding interaction type. We introduce the Interaction-aware Human-Object Composition (IHOC) dataset, and conduct extensive experiments on this dataset to demonstrate the superiority of our method. NeurIPS Paper Checklist, Question 5: "We have opensourced our github repository will soon release all the codes and dataset."
Dataset Splits | No | The paper describes the creation of the IHOC dataset (11,700 composited images) for training and the HOIBench benchmark (600 human-object interaction instances) for evaluation, including details about how the benchmark instances were sampled. However, it does not provide specific training/validation/test splits for the IHOC dataset used to train their model.
Hardware Specification | Yes | Training takes approximately 20 hours on 2 A100 GPUs.
Software Dependencies | Yes | We adopt FLUX.1 [dev] [3] as the base model and fine-tune it using LoRA [23] with rank 16, applied to the attention layers. We employ DWPose [95] for pose estimation, Zero123+ [64] for multi-view generation and GPT-4o [52] as MLLM in MRPG.
Experiment Setup | Yes | We adopt FLUX.1 [dev] [3] as the base model and fine-tune it using LoRA [23] with rank 16, applied to the attention layers. All training images are resized to 512 × 512 resolution. The model is trained for 10,000 steps with a batch size of 2, using AdamW and a learning rate of 1e-5.
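
The figures quoted in the Experiment Setup, Hardware Specification, and Dataset Splits rows can be gathered into one record to check what they imply about the training budget. The sketch below is illustrative only and is not the authors' code: the class and field names are hypothetical, it assumes all 11,700 IHOC images are used for training, and it leaves open whether the reported batch size of 2 is global or per GPU, since the source does not say.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table above; the field names are hypothetical."""
    base_model: str = "FLUX.1 [dev]"
    lora_rank: int = 16
    resolution: tuple = (512, 512)
    train_steps: int = 10_000
    batch_size: int = 2
    optimizer: str = "AdamW"
    learning_rate: float = 1e-5
    num_gpus: int = 2                # "2 A100 GPUs"
    wall_clock_hours: float = 20.0   # "approximately 20 hours"
    dataset_size: int = 11_700       # IHOC size; assumed to be all training data


def samples_seen(cfg: ReportedSetup, batch_is_per_gpu: bool = False) -> int:
    """Total training samples processed over the whole run."""
    # The source does not state whether the batch size is global or per GPU,
    # so the caller chooses the interpretation explicitly.
    per_step = cfg.batch_size * (cfg.num_gpus if batch_is_per_gpu else 1)
    return cfg.train_steps * per_step


def approx_epochs(cfg: ReportedSetup, batch_is_per_gpu: bool = False) -> float:
    """Approximate number of passes over the (assumed) training set."""
    return samples_seen(cfg, batch_is_per_gpu) / cfg.dataset_size


def gpu_hours(cfg: ReportedSetup) -> float:
    """Approximate compute budget in GPU-hours."""
    return cfg.num_gpus * cfg.wall_clock_hours


setup = ReportedSetup()
print(f"samples seen (global batch):  {samples_seen(setup)}")
print(f"samples seen (per-GPU batch): {samples_seen(setup, batch_is_per_gpu=True)}")
print(f"approx. epochs (global batch): {approx_epochs(setup):.2f}")
print(f"approx. GPU-hours: {gpu_hours(setup):.0f}")
```

Under the global-batch reading the run covers roughly 20,000 samples, a little under two passes over IHOC; under the per-GPU reading it covers twice that. Either way the reported compute is on the order of 40 A100 GPU-hours.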