Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Human-Object Interaction Detection Collaborated with Large Relation-driven Diffusion Models
Authors: Liulei Li, Wenguan Wang, Yi Yang
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Benefited from above, DIFFUSIONHOI achieves SOTA performance on three datasets under both regular and zero-shot setups. |
| Researcher Affiliation | Academia | Liulei Li¹, Wenguan Wang², Yi Yang²; ¹ReLER, AAII, University of Technology Sydney; ²CCAI, Zhejiang University |
| Pseudocode | No | The paper describes the proposed methods in text and uses diagrams, but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/0liliulei/DiffusionHOI |
| Open Datasets | Yes | HICO-DET [20] is a large-scale HOI detection benchmark with 38,118/9,658 images for training/testing, respectively... V-COCO [21] is a curated subset of MS-COCO [96] including 2,533/2,867/4,946 images in train/val/test sets... SWiG-HOI [22] is assembled from SWiG [97] and DOH [98] with about 45,000/14,000 for training/testing. |
| Dataset Splits | Yes | V-COCO [21] is a curated subset of MS-COCO [96] including 2,533/2,867/4,946 images in train/val/test sets. |
| Hardware Specification | Yes | DIFFUSIONHOI is implemented in PyTorch and trained on 8 Tesla A40 GPUs with 48GB memory per card. |
| Software Dependencies | Yes | DIFFUSIONHOI is built upon Stable Diffusion v1.5 with xFormers [82] installed. |
| Experiment Setup | Yes | For HOI detection learning, we train the interaction decoder DIns and object decoder DHOI for 60 epochs with a base learning rate of 1e-4 and batch size of 16, using both synthesized data and the target dataset. Subsequently, the model is trained only on the target dataset for an additional 30 epochs with a base learning rate of 1e-5. |
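
The two-stage schedule quoted under Experiment Setup can be written down as a small configuration, which may help when checking a reproduction against the reported numbers. The following is a minimal sketch, not the authors' code: the names `TrainingStage`, `SCHEDULE`, `total_epochs`, and `stage_at` are hypothetical, and the batch size for the second stage is left as `None` because the quoted text does not state it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TrainingStage:
    """One stage of the schedule (hypothetical container, not from the paper's code)."""
    name: str
    epochs: int
    base_lr: float
    batch_size: Optional[int]  # None means "not stated in the quoted text"
    data_sources: Tuple[str, ...]


# Values taken from the Experiment Setup row; stage names are illustrative.
SCHEDULE = (
    TrainingStage("synthesized_plus_target", 60, 1e-4, 16, ("synthesized", "target")),
    TrainingStage("target_only", 30, 1e-5, None, ("target",)),
)


def total_epochs(schedule: Tuple[TrainingStage, ...]) -> int:
    """Sum the epochs over all stages."""
    return sum(stage.epochs for stage in schedule)


def stage_at(epoch: int, schedule: Tuple[TrainingStage, ...]) -> TrainingStage:
    """Return the stage that is active at a zero-based global epoch index."""
    start = 0
    for stage in schedule:
        if start <= epoch < start + stage.epochs:
            return stage
        start += stage.epochs
    raise ValueError(f"epoch {epoch} is outside the schedule")


if __name__ == "__main__":
    print("total epochs:", total_epochs(SCHEDULE))
    print("stage at epoch 60:", stage_at(60, SCHEDULE).name)
```

Under these assumptions the schedule spans 90 epochs in total, with the switch to target-only data and the lower learning rate occurring at global epoch index 60.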