Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models
Authors: Liulei Li, Wenguan Wang, Yi Yang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Benefited from above, DIFFUSIONHOI achieves SOTA performance on three datasets under both regular and zero-shot setups. |
| Researcher Affiliation | Academia | Liulei Li¹, Wenguan Wang², Yi Yang²; ¹ReLER, AAII, University of Technology Sydney; ²CCAI, Zhejiang University |
| Pseudocode | No | The paper describes the proposed methods in text and uses diagrams, but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/0liliulei/DiffusionHOI |
| Open Datasets | Yes | HICO-DET [20] is a large-scale HOI detection benchmark with 38,118/9,658 images for training/testing, respectively... V-COCO [21] is a curated subset of MS-COCO [96] including 2,533/2,867/4,946 images in train/val/test sets... SWiG-HOI [22] is assembled from SWiG [97] and DOH [98] with about 45,000/14,000 for training/testing. |
| Dataset Splits | Yes | V-COCO [21] is a curated subset of MS-COCO [96] including 2,533/2,867/4,946 images in train/val/test sets. |
| Hardware Specification | Yes | DIFFUSIONHOI is implemented in PyTorch and trained on 8 Tesla A40 GPUs with 48GB memory per card. |
| Software Dependencies | Yes | DIFFUSIONHOI is built upon Stable Diffusion v1.5 with xFormers [82] installed. |
| Experiment Setup | Yes | For HOI detection learning, we train the interaction decoder DIns and object decoder DHOI for 60 epochs with a base learning rate of 1e-4 and batch size of 16, using both synthesized data and the target dataset. Subsequently, the model is trained only on the target dataset for an additional 30 epochs with a base learning rate of 1e-5. |
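
The two-stage schedule quoted in the Experiment Setup row can be written down as a small configuration sketch. This is not taken from the authors' repository: the `Stage` class, the stage names, and the `stage_for_epoch` helper are hypothetical, introduced only for illustration. The batch size of the second stage is not stated in the quoted text, so it is left as `None` rather than assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    """One phase of the training schedule (hypothetical container)."""
    name: str
    epochs: int
    base_lr: float
    batch_size: Optional[int]      # None where the quoted text gives no value
    uses_synthesized_data: bool


# Values copied from the Experiment Setup row; names are illustrative.
SCHEDULE = (
    Stage("synthesized_plus_target", 60, 1e-4, 16, True),
    Stage("target_only", 30, 1e-5, None, False),
)


def stage_for_epoch(epoch: int) -> Stage:
    """Return the stage that covers a 0-indexed global epoch."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    start = 0
    for stage in SCHEDULE:
        if epoch < start + stage.epochs:
            return stage
        start += stage.epochs
    raise ValueError("epoch is beyond the end of the schedule")


# Total training length implied by the two stages.
total_epochs = sum(stage.epochs for stage in SCHEDULE)
```

Under these assumptions, epochs 0 to 59 fall in the first stage (synthesized plus target data, base learning rate 1e-4) and epochs 60 to 89 in the second (target data only, base learning rate 1e-5), for 90 epochs in total.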