Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
InstructHOI: Context-Aware Instruction for Multi-Modal Reasoning in Human-Object Interaction Detection
Authors: Jinguo Luo, Weihong Ren, Quanlong Zheng, Yanhao Zhang, Zhenlong Yuan, Zhiyong Wang, Haonan Lu, Honghai LIU
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on two public benchmarks demonstrate that our proposed method outperforms the state-of-the-art ones, under both supervised and zero-shot settings. |
| Researcher Affiliation | Collaboration | 1Harbin Institute of Technology, Shenzhen 2OPPO AI Center 3Institute of Computing Technology, Chinese Academy of Sciences |
| Pseudocode | No | The paper describes the methodology using text and figures, but no clearly labeled 'Pseudocode' or 'Algorithm' block is present. |
| Open Source Code | No | As we promised, the data and code will be released upon the publication of our paper. |
| Open Datasets | Yes | Specifically, due to the limited availability of HOI reasoning data [25], we aggregated five existing image-only HOI datasets [51, 52, 53, 54, 55] and built a high-quality dataset containing 140K image-text pairs across 1K object categories, 600 action categories, and 15K HOI categories. |
| Dataset Splits | Yes | The HICO-DET dataset comprises 47776 images, with 38118 for training and 9658 for testing, covering 117 actions, 80 objects, and 600 HOIs. Additionally, the 600 HOIs are divided into 138 Rare and 462 Non-Rare categories based on the sample distribution. The V-COCO dataset contains 10346 images, including 5400 in the trainval set and 4946 in the test set, across 29 actions, 80 objects, and 259 HOIs. |
| Hardware Specification | Yes | The entire InstructHOI model is trained on four Tesla A800 GPUs with a batch size of 16 for 20 epochs, using the AdamW [60] optimizer. |
| Software Dependencies | No | The paper mentions software like DETR and Intern VL21b, and the AdamW optimizer, but does not specify version numbers for these or other key software components like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | The entire InstructHOI model is trained on four Tesla A800 GPUs with a batch size of 16 for 20 epochs, using the AdamW [60] optimizer. |
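
The split sizes quoted in the Dataset Splits row can be checked for internal consistency with a few lines of Python. This is only a minimal sketch: the numbers are copied from the table above, while the dictionary layout and the helper names `parts_match_total` and `share` are illustrative assumptions and do not come from the paper or its code.

```python
# Counts copied from the "Dataset Splits" row of the table above.
# The data layout and helper names are hypothetical, chosen for illustration.
REPORTED = {
    "HICO-DET images": {"total": 47776, "parts": {"train": 38118, "test": 9658}},
    "HICO-DET HOI categories": {"total": 600, "parts": {"rare": 138, "non_rare": 462}},
    "V-COCO images": {"total": 10346, "parts": {"trainval": 5400, "test": 4946}},
}


def parts_match_total(entry):
    """Return True when the listed parts sum exactly to the reported total."""
    return sum(entry["parts"].values()) == entry["total"]


def share(entry, part):
    """Return the fraction of the reported total that falls into one part."""
    return entry["parts"][part] / entry["total"]


# Report, for each quoted breakdown, whether its parts add up to the total.
for name, entry in REPORTED.items():
    status = "consistent" if parts_match_total(entry) else "inconsistent"
    print(f"{name}: {status}")
```

A check like this confirms only that the reported figures agree with one another; it says nothing about whether the splits match the officially distributed datasets.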