RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection

Authors: Dongming Yang, Yuexian Zou, Can Zhang, Meng Cao, Jie Chen

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that our proposed RR-Net sets a new state-of-the-art on both V-COCO and HICO-DET benchmarks and improves the baseline about 5.5% and 9.8% relatively, validating that this first effort in exploring relation reasoning and integrating interactive semantics has brought obvious improvement for end-to-end HOI detection.
Researcher Affiliation | Academia | Dongming Yang1, Yuexian Zou1,2, Can Zhang1, Meng Cao1 and Jie Chen1,2; 1School of ECE, Peking University, Shenzhen, China, 518055; 2Peng Cheng Laboratory, Shenzhen, China, 518055
Pseudocode | No | The paper describes the proposed method in detail using prose and mathematical formulations, but it does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or include a link to a code repository.
Open Datasets | Yes | We evaluate our method on two large-scale benchmarks, including V-COCO [Yatskar et al., 2016] and HICO-DET [Chao et al., 2018] datasets.
Dataset Splits | No | The paper mentions using V-COCO and HICO-DET datasets and training details like batch size and epochs, but it does not specify the exact training/validation/test splits (e.g., percentages or sample counts) used for reproduction.
Hardware Specification | Yes | Our experiments are conducted by Pytorch on a single GPU of NVIDIA Tesla P100.
Software Dependencies | No | The paper states 'Our experiments are conducted by Pytorch'. However, it does not provide specific version numbers for Pytorch or any other software dependencies.
Experiment Setup | Yes | During training, input images have the resolution of 512×512, yielding a resolution of 128×128 for all output head features. We employ standard data augmentation following [Zhou et al., 2019]. Our model is optimized with Adam. The batch-size is set as 15 for V-COCO and 20 for HICO-DET. We train the model for 140 epochs, with the initial learning rate of 5e-4 which drops 10x at 90 and 120 epochs respectively. The top predictions T is set as 100.
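The reported schedule (140 epochs, initial learning rate 5e-4, dropped 10x at epochs 90 and 120) can be sketched as a MultiStep-style decay. This is a minimal illustration of the stated hyperparameters, not code from the authors; the `learning_rate` helper is hypothetical.

```python
# Hypothetical sketch of the paper's reported LR schedule:
# initial LR 5e-4, multiplied by 0.1 at epochs 90 and 120, 140 epochs total.
def learning_rate(epoch, base_lr=5e-4, milestones=(90, 120), gamma=0.1):
    """Return the learning rate in effect at a given epoch."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma  # each milestone reached drops the LR 10x
    return lr

# The full schedule over training, e.g. for logging or plotting:
schedule = [learning_rate(e) for e in range(140)]
```

In PyTorch (which the paper says the experiments use) the equivalent would be `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[90, 120], gamma=0.1)` wrapped around an Adam optimizer with `lr=5e-4`.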