Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning

Authors: Bo Wan, Yongfei Liu, Desen Zhou, Tinne Tuytelaars, Xuming He

ICLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results on HICO-DET and V-COCO show that our method outperforms the previous works by a sizable margin, showing the efficacy of our HOI representation.
Researcher Affiliation Academia Bo Wan (1), Yongfei Liu (2), Desen Zhou (2), Tinne Tuytelaars (1), Xuming He (2,3); (1) KU Leuven, Leuven, Belgium; (2) ShanghaiTech University, Shanghai, China; (3) Shanghai Engineering Research Center of Intelligent Vision and Imaging
Pseudocode No The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code Yes Code is available at https://github.com/bobwan1995/Weakly-HOI.
Open Datasets Yes We benchmark our model on two public datasets: HICO-DET (Chao et al., 2018) and V-COCO (Gupta & Malik, 2015). V-COCO: https://github.com/s-gupta/v-coco/ (MIT License).
Dataset Splits Yes HICO-DET consists of 47776 images (38118 for training and 9658 for test). V-COCO is a subset of MSCOCO, consisting of 2533 images for training, 2867 for validation and 4946 for test.
Hardware Specification Yes We train for up to 60K iterations with a batch size of 24 on 4 NVIDIA 2080Ti GPUs, and decay the learning rate by a factor of 10 at 12K and 24K iterations.
Software Dependencies No The paper mentions using the AdamW optimizer and backbone networks such as ResNet-50 and ResNet-101, but does not provide version numbers for software libraries or frameworks.
Experiment Setup Yes For model learning, we set the detection score weight γ = 2.8 by default, following previous works (Zhang et al., 2021c; Li et al., 2019b), then optimize the entire network with AdamW and an initial learning rate of 1e-5 for backbone parameters and 1e-4 for the others. We detach the parameters of the knowledge bank on the local branch for better model learning. We train for up to 60K iterations with a batch size of 24 on 4 NVIDIA 2080Ti GPUs, and decay the learning rate by a factor of 10 at 12K and 24K iterations. (A minimal sketch of this optimization setup is given below the table.)
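
The experiment-setup row above can be summarized in a minimal PyTorch sketch. This is not the authors' released training code: the dummy model, the selection of parameter groups by name, and the training loop are assumptions made only to illustrate the reported hyperparameters (AdamW, learning rates of 1e-5 for the backbone and 1e-4 for the rest, a 10x decay at 12K and 24K iterations, batch size 24, 60K iterations total).

```python
# Minimal sketch of the reported optimization setup (assumptions noted in comments).
import torch
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR


class DummyHOIModel(nn.Module):
    """Stand-in for the actual HOI network; only the parameter grouping matters here."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(16, 16)  # placeholder for ResNet-50/101
        self.head = nn.Linear(16, 600)     # placeholder for the HOI branches

    def forward(self, x):
        return self.head(self.backbone(x))


model = DummyHOIModel()

# Two parameter groups: backbone at 1e-5, all other parameters at 1e-4.
backbone_params = [p for n, p in model.named_parameters() if n.startswith("backbone")]
other_params = [p for n, p in model.named_parameters() if not n.startswith("backbone")]

optimizer = AdamW([
    {"params": backbone_params, "lr": 1e-5},
    {"params": other_params, "lr": 1e-4},
])

# Decay the learning rate by a factor of 10 at 12K and 24K iterations.
scheduler = MultiStepLR(optimizer, milestones=[12_000, 24_000], gamma=0.1)

for iteration in range(60_000):        # 60K iterations, as reported
    x = torch.randn(24, 16)            # batch size 24, dummy inputs
    loss = model(x).mean()             # placeholder for the actual HOI losses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```

The sketch only reconstructs the optimizer, learning-rate schedule, and loop length stated in the quoted setup; the detection score weight γ and the detached knowledge bank belong to the model/loss definition and are not reproduced here.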