Semantic-Aware Human Object Interaction Image Generation
Authors: Zhu Xu, Qingchao Chen, Yuxin Peng, Yang Liu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that our method significantly improves generation quality under both HOI-specific and conventional image evaluation metrics. |
| Researcher Affiliation | Academia | ¹Wangxuan Institute of Computer Technology, Peking University; ²National Institute of Health Data Science, Peking University. |
| Pseudocode | Yes | The pseudo code of our pose and interaction boundary guided sampling is shown in Algorithm 1. The pipeline is shown in Algorithm 2. |
| Open Source Code | Yes | The code is available at https://github.com/XZPKU/SA-HOI.git |
| Open Datasets | Yes | Our dataset consists of 150 HOI categories, covering human-object, human-animal, and human-human interaction scenarios for comprehensive evaluation. The categories are all collected from the public HOI detection dataset HICO-DET (Chao et al., 2015). |
| Dataset Splits | No | The paper describes using a pre-trained model (Stable Diffusion v1.5) and a dataset for evaluation, but does not specify explicit train/validation splits for its own experimental setup or model training. |
| Hardware Specification | Yes | For the experiments, we use two A100 80G GPUs to sample images from the pre-trained models Stable Diffusion v1.5 (Rombach et al., 2022). |
| Software Dependencies | No | The paper mentions specific models and toolboxes used (e.g., Stable Diffusion v1.5, RTMPose toolbox, Mask-RCNN) but does not provide specific version numbers for underlying software dependencies like programming languages, frameworks (e.g., Python, PyTorch, TensorFlow), or other libraries. |
| Experiment Setup | Yes | For CFG, we adopt a guidance scale of 7.5; the text prompt is "A photo of a person verbing a/an object." for HOI category <verb, object>, and the negative prompt is set as . We adopt DDIMScheduler (von Platen et al., 2022) with 50 steps for the denoising process, and all generated images are of size 512 × 512. Hyperparameters θ, δ, ϕ0, α, T are set to 0.01, 1, 1, 0.6, and 4, respectively. |
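
The settings quoted in the Experiment Setup row can be collected into a small configuration sketch. This is an illustration only, not the authors' code: the names `SamplingConfig` and `build_prompt` are hypothetical, the hyperparameter keys simply spell out the symbols from the row, and the article choice (a/an) is assumed to follow a simple first-letter vowel check. The negative prompt is omitted because its text does not appear in the row.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SamplingConfig:
    """Values quoted in the Experiment Setup row; field names are illustrative."""
    guidance_scale: float = 7.5     # classifier-free guidance (CFG) scale
    num_inference_steps: int = 50   # DDIM denoising steps
    height: int = 512               # generated image height in pixels
    width: int = 512                # generated image width in pixels
    # Method hyperparameters, keyed by the symbols used in the row
    # (theta, delta, phi_0, alpha, T); their roles are defined in the paper.
    hyperparameters: dict = field(default_factory=lambda: {
        "theta": 0.01, "delta": 1, "phi_0": 1, "alpha": 0.6, "T": 4,
    })


def build_prompt(verb_ing: str, obj: str) -> str:
    """Fill the template 'A photo of a person verbing a/an object.'

    Assumptions: the caller passes the verb already in its -ing form, and
    the article is picked by checking whether the object starts with a vowel.
    """
    article = "an" if obj[0].lower() in "aeiou" else "a"
    return f"A photo of a person {verb_ing} {article} {obj}."


if __name__ == "__main__":
    cfg = SamplingConfig()
    print(build_prompt("riding", "bicycle"))
    print(cfg)
```

If a diffusion library is used to reproduce the sampling, these fields would typically be passed through as the guidance scale, step count, and output size of the sampling call, with the scheduler swapped for a DDIM scheduler; the exact wiring depends on the released code at the repository linked in the table.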