Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning
Authors: Bo Wan, Yongfei Liu, Desen Zhou, Tinne Tuytelaars, Xuming He
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on HICO-DET and V-COCO show that our method outperforms previous works by a sizable margin, demonstrating the efficacy of our HOI representation. |
| Researcher Affiliation | Academia | Bo Wan (1), Yongfei Liu (2), Desen Zhou (2), Tinne Tuytelaars (1), Xuming He (2,3). (1) KU Leuven, Leuven, Belgium; (2) ShanghaiTech University, Shanghai, China; (3) Shanghai Engineering Research Center of Intelligent Vision and Imaging |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/bobwan1995/Weakly-HOI. |
| Open Datasets | Yes | We benchmark our model on two public datasets: HICO-DET (Chao et al., 2018) and V-COCO (Gupta & Malik, 2015). V-COCO: https://github.com/s-gupta/v-coco/ (MIT License). |
| Dataset Splits | Yes | HICO-DET consists of 47776 images (38118 for training and 9658 for test). V-COCO is a subset of MSCOCO, consisting of 2533 images for training, 2867 for validation and 4946 for test. (These split sizes are sanity-checked in the first sketch after the table.) |
| Hardware Specification | Yes | We train for up to 60K iterations with batch size 24 on 4 NVIDIA 2080 Ti GPUs, and decay the learning rate by a factor of 10 at the 12K and 24K iterations. |
| Software Dependencies | No | The paper mentions using the AdamW optimizer and backbone networks such as ResNet-101 and ResNet-50, but does not provide specific version numbers for software libraries or frameworks. |
| Experiment Setup | Yes | For model learning, we set the detection score weight γ = 2.8 by default, following previous works (Zhang et al., 2021c; Li et al., 2019b), then optimize the entire network with AdamW and an initial learning rate of 1e-5 for backbone parameters and 1e-4 for the others. We detach the parameters of the knowledge bank on the local branch for better model learning. We train for up to 60K iterations with batch size 24 on 4 NVIDIA 2080 Ti GPUs, and decay the learning rate by a factor of 10 at the 12K and 24K iterations. (A hedged PyTorch sketch of this schedule follows the table.) |
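
The split sizes quoted in the Dataset Splits row are internally consistent, which is worth verifying when reproducing the benchmarks. Below is a minimal sanity-check sketch; the dictionary names are illustrative and the numbers are taken directly from the quoted evidence.

```python
# Hypothetical sanity check of the dataset split sizes quoted above.
# All counts come from the paper's reported statistics.

HICO_DET_SPLITS = {"train": 38118, "test": 9658}
V_COCO_SPLITS = {"train": 2533, "val": 2867, "test": 4946}

# HICO-DET has no validation split: 38118 + 9658 should equal 47776.
assert sum(HICO_DET_SPLITS.values()) == 47776

# V-COCO (a subset of MSCOCO) totals 10346 images across its three splits.
print(f"V-COCO total: {sum(V_COCO_SPLITS.values())}")  # -> 10346
```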
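
The Experiment Setup row describes a two-group learning-rate scheme (1e-5 for the backbone, 1e-4 elsewhere) with step decay at 12K and 24K iterations. Here is a minimal, hypothetical PyTorch sketch of that recipe; `ToyHOIModel` and its layer sizes are placeholders standing in for the authors' released model (https://github.com/bobwan1995/Weakly-HOI), and only the optimizer/scheduler wiring reflects the quoted hyperparameters.

```python
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR

# Toy stand-in for the HOI model: a `backbone` plus a task head.
class ToyHOIModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(256, 256)  # placeholder for ResNet-50/101
        self.head = nn.Linear(256, 117)      # placeholder interaction head

model = ToyHOIModel()

# Two parameter groups, as described in the paper: lr 1e-5 for the
# backbone and 1e-4 for all remaining parameters, optimized with AdamW.
backbone_params = list(model.backbone.parameters())
backbone_ids = {id(p) for p in backbone_params}
other_params = [p for p in model.parameters() if id(p) not in backbone_ids]
optimizer = AdamW([
    {"params": backbone_params, "lr": 1e-5},
    {"params": other_params, "lr": 1e-4},
])

# Decay both learning rates by a factor of 10 at iterations 12K and 24K.
scheduler = MultiStepLR(optimizer, milestones=[12_000, 24_000], gamma=0.1)

# Train for up to 60K iterations with batch size 24 (data loading,
# forward pass, and loss are omitted in this sketch).
for iteration in range(60_000):
    optimizer.step()   # gradients omitted; a real loop computes them first
    scheduler.step()   # one scheduler step per training iteration
```

Note that `MultiStepLR` milestones are counted in scheduler steps, so calling `scheduler.step()` once per training iteration (rather than once per epoch) is what makes the 12K/24K milestones land at the iterations the paper specifies.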