Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models

Authors: Liulei Li, Wenguan Wang, Yi Yang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Benefited from above, DiffusionHOI achieves SOTA performance on three datasets under both regular and zero-shot setups.
Researcher Affiliation | Academia | Liulei Li¹, Wenguan Wang², Yi Yang²; ¹ReLER, AAII, University of Technology Sydney; ²CCAI, Zhejiang University
Pseudocode | No | The paper describes the proposed methods in text and uses diagrams, but does not include explicit pseudocode or algorithm blocks.
Open Source Code | Yes | https://github.com/0liliulei/DiffusionHOI
Open Datasets | Yes | HICO-DET [20] is a large-scale HOI detection benchmark with 38,118/9,658 images for training/testing, respectively... V-COCO [21] is a curated subset of MS-COCO [96] including 2,533/2,867/4,946 images in train/val/test sets... SWiG-HOI [22] is assembled from SWiG [97] and DOH [98] with about 45,000/14,000 for training/testing.
Dataset Splits | Yes | V-COCO [21] is a curated subset of MS-COCO [96] including 2,533/2,867/4,946 images in train/val/test sets.
Hardware Specification | Yes | DiffusionHOI is implemented in PyTorch and trained on 8 Tesla A40 GPUs with 48GB memory per card.
Software Dependencies | Yes | DiffusionHOI is built upon Stable Diffusion v1.5 with xFormers [82] installed.
Experiment Setup | Yes | For HOI detection learning, we train the interaction decoder D_Ins and object decoder D_HOI for 60 epochs with a base learning rate of 1e-4 and batch size of 16, using both synthesized data and the target dataset. Subsequently, the model is trained only on the target dataset for an additional 30 epochs with a base learning rate of 1e-5.
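The two-stage schedule in the Experiment Setup row can be written down as a small config to make the epoch boundaries explicit. The sketch below is a hypothetical encoding, not code from the DiffusionHOI repository: the names `Stage`, `stage_for_epoch`, and `per_gpu_batch` are illustrative, and it assumes the reported batch size of 16 is the global batch split evenly over the 8 A40 cards (the source does not say whether 16 is global or per GPU). Any within-stage learning-rate decay is not reported, so only the base rates appear.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stage:
    """One training stage (hypothetical structure for illustration)."""
    name: str
    epochs: int
    base_lr: float
    datasets: Tuple[str, ...]


# Stage 1: synthesized + target data; Stage 2: target data only (values from the row above).
SCHEDULE: List[Stage] = [
    Stage("joint", epochs=60, base_lr=1e-4, datasets=("synthesized", "target")),
    Stage("finetune", epochs=30, base_lr=1e-5, datasets=("target",)),
]

BATCH_SIZE = 16  # reported batch size; ASSUMED here to be the global batch
NUM_GPUS = 8     # Tesla A40 cards, per the Hardware Specification row


def stage_for_epoch(epoch: int) -> Stage:
    """Return the stage a 0-indexed global epoch falls into."""
    start = 0
    for stage in SCHEDULE:
        if epoch < start + stage.epochs:
            return stage
        start += stage.epochs
    raise ValueError(f"epoch {epoch} is past the end of the schedule")


def total_epochs() -> int:
    """Total epochs across all stages."""
    return sum(s.epochs for s in SCHEDULE)


def per_gpu_batch(global_batch: int = BATCH_SIZE, gpus: int = NUM_GPUS) -> int:
    """Per-GPU batch size under the even-split assumption."""
    if global_batch % gpus:
        raise ValueError("global batch must divide evenly across GPUs")
    return global_batch // gpus


if __name__ == "__main__":
    for e in (0, 59, 60, 89):
        s = stage_for_epoch(e)
        print(e, s.name, s.base_lr, s.datasets)
    print("total epochs:", total_epochs())
    print("per-GPU batch (if 16 is global):", per_gpu_batch())
```

Under these assumptions, epochs 0 to 59 use the joint data at 1e-4, epochs 60 to 89 fine-tune on the target dataset at 1e-5 (90 epochs in total), and each GPU would see 2 images per step.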