reproducibilityindex.ai

Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models

Authors: Liulei Li, Wenguan Wang, Yi Yang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Benefited from above, DIFFUSIONHOI achieves SOTA performance on three datasets under both regular and zero-shot setups.
Researcher Affiliation	Academia	Liulei Li¹, Wenguan Wang², Yi Yang². ¹ReLER, AAII, University of Technology Sydney; ²CCAI, Zhejiang University
Pseudocode	No	The paper describes the proposed methods in text and uses diagrams, but does not include explicit pseudocode or algorithm blocks.
Open Source Code	Yes	https://github.com/0liliulei/DiffusionHOI
Open Datasets	Yes	HICO-DET [20] is a large-scale HOI detection benchmark with 38,118/9,658 images for training/testing, respectively... V-COCO [21] is a curated subset of MS-COCO [96] including 2,533/2,867/4,946 images in train/val/test sets... SWiG-HOI [22] is assembled from SWiG [97] and DOH [98] with about 45,000/14,000 for training/testing.
Dataset Splits	Yes	V-COCO [21] is a curated subset of MS-COCO [96] including 2,533/2,867/4,946 images in train/val/test sets.
Hardware Specification	Yes	DIFFUSIONHOI is implemented in PyTorch and trained on 8 Tesla A40 GPUs with 48GB memory per card.
Software Dependencies	Yes	DIFFUSIONHOI is built upon Stable Diffusion v1.5 with xFormers [82] installed.
Experiment Setup	Yes	For HOI detection learning, we train the interaction decoder DIns and object decoder DHOI for 60 epochs with a base learning rate of 1e-4 and batch size of 16, using both synthesized data and the target dataset. Subsequently, the model is trained only on the target dataset for an additional 30 epochs with a base learning rate of 1e-5.
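
The two-stage schedule quoted in the Experiment Setup row can be written down as a small configuration. The sketch below is illustrative only and is not taken from the authors' repository: the names `Stage`, `SCHEDULE`, `stage_for_epoch`, and `total_epochs` are hypothetical, and reusing the batch size of 16 in the second stage is an assumption, since the excerpt states it only for the first stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage (hypothetical structure, for illustration only)."""
    name: str
    epochs: int
    base_lr: float
    use_synthesized_data: bool


# Values taken from the quoted Experiment Setup excerpt.
SCHEDULE = (
    Stage("synthesized_plus_target", epochs=60, base_lr=1e-4, use_synthesized_data=True),
    Stage("target_only", epochs=30, base_lr=1e-5, use_synthesized_data=False),
)

# Stated for the first stage; carrying it over to the second is an assumption.
BATCH_SIZE = 16


def total_epochs(schedule=SCHEDULE) -> int:
    """Total number of epochs across all stages."""
    return sum(stage.epochs for stage in schedule)


def stage_for_epoch(epoch: int, schedule=SCHEDULE) -> Stage:
    """Return the stage that a 0-indexed global epoch falls into."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    start = 0
    for stage in schedule:
        if epoch < start + stage.epochs:
            return stage
        start += stage.epochs
    raise ValueError(f"epoch {epoch} is beyond the {start}-epoch schedule")
```

For example, `stage_for_epoch(59)` falls in the first stage and `stage_for_epoch(60)` in the second, for 90 epochs in total. Any learning-rate decay within a stage is not described in the excerpt and is therefore left out.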