RR-Net: Injecting Interactive Semantics in Human-Object Interaction Detection
Authors: Dongming Yang, Yuexian Zou, Can Zhang, Meng Cao, Jie Chen
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that our proposed RR-Net sets a new state-of-the-art on both V-COCO and HICO-DET benchmarks and improves the baseline about 5.5% and 9.8% relatively, validating that this first effort in exploring relation reasoning and integrating interactive semantics has brought obvious improvement for end-to-end HOI detection. |
| Researcher Affiliation | Academia | Dongming Yang¹, Yuexian Zou¹٬², Can Zhang¹, Meng Cao¹ and Jie Chen¹٬² — ¹School of ECE, Peking University, Shenzhen, China, 518055; ²Peng Cheng Laboratory, Shenzhen, China, 518055 |
| Pseudocode | No | The paper describes the proposed method in detail using prose and mathematical formulations, but it does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or include a link to a code repository. |
| Open Datasets | Yes | We evaluate our method on two large-scale benchmarks, including V-COCO [Yatskar et al., 2016] and HICO-DET [Chao et al., 2018] datasets. |
| Dataset Splits | No | The paper mentions using V-COCO and HICO-DET datasets and training details like batch size and epochs, but it does not specify the exact training/validation/test splits (e.g., percentages or sample counts) used for reproduction. |
| Hardware Specification | Yes | Our experiments are conducted by Pytorch on a single GPU of NVIDIA Tesla P100. |
| Software Dependencies | No | The paper states 'Our experiments are conducted by Pytorch'. However, it does not provide specific version numbers for Pytorch or any other software dependencies. |
| Experiment Setup | Yes | During training, input images have the resolution of 512 × 512, yielding a resolution of 128 × 128 for all output head features. We employ standard data augmentation following [Zhou et al., 2019]. Our model is optimized with Adam. The batch-size is set as 15 for V-COCO and 20 for HICO-DET. We train the model for 140 epochs, with the initial learning rate of 5e-4 which drops 10x at 90 and 120 epochs respectively. The top predictions T is set as 100. |
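The reported schedule (140 epochs, initial learning rate 5e-4, dropping 10x at epochs 90 and 120) can be sketched as a simple step function. This is a hedged illustration of the reported hyperparameters only; the function name `learning_rate` and the surrounding structure are assumptions, not code from the paper.

```python
def learning_rate(epoch, base_lr=5e-4, milestones=(90, 120), gamma=0.1):
    """Step schedule from the paper: base LR drops 10x at each milestone epoch.

    Hedged sketch -- the paper reports only the base LR, the two drop
    epochs, and the 10x factor; everything else here is an assumption.
    """
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Training runs 140 epochs; batch size is 15 (V-COCO) or 20 (HICO-DET).
schedule = [learning_rate(e) for e in range(140)]
```

In PyTorch (the framework the authors state they used), the equivalent behavior would typically be obtained with `torch.optim.Adam` plus `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[90, 120], gamma=0.1)`.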