Improving Human-Object Interaction Detection via Phrase Learning and Label Composition

Authors: Zhimin Li, Cheng Zou, Yu Zhao, Boxun Li, Sheng Zhong

AAAI 2022, pp. 1509-1517

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted to prove the effectiveness of the proposed PhraseHOI, which achieves significant improvement over the baseline and surpasses previous state-of-the-art methods on the Full and Non-Rare splits of the challenging HICO-DET benchmark.
Researcher Affiliation | Collaboration | 1. National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; 2. Megvii Technology
Pseudocode | No | The paper describes its methods in narrative text and with architectural diagrams, but it does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor a link to a code repository for its methodology.
Open Datasets | Yes | Experiments are conducted on the V-COCO (Gupta and Malik 2015) and HICO-DET (Chao et al. 2018) benchmarks.
Dataset Splits | No | The paper specifies training and test set sizes for HICO-DET ('38,118 images in training set and 9,658 in test set') but does not explicitly mention a separate validation set size or split.
Hardware Specification | No | The paper mentions using ResNet-50 and ResNet-101 backbones, but it does not specify the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using pre-trained models such as word2vec and GPT1 and initializing weights with DETR, but it does not provide specific version numbers for these or other software dependencies (e.g., Python, PyTorch, CUDA versions) needed for reproducibility.
Experiment Setup | Yes | The model is trained with AdamW; the learning rate is set to 1e-4, except that the learning rate for the backbone is set to 1e-5. The batch sizes for ResNet-50 and ResNet-101 are set to 64 and 32, respectively... All models are trained for 200 epochs with a single learning-rate decay at epoch 150. The hyper-parameter α in Eq. 1 is the loss weight of the phrase loss and is set to 0.1, the hyper-parameter β in Eq. 2 is the loss weight of the triplet loss and is set to 10, and the hyper-parameter m in Eq. 3 is set to 0.5 in experiments.
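
The quoted training setup translates into a short configuration script. Below is a minimal PyTorch-style sketch of it: the optimizer choice, the two learning rates, the batch sizes, the 200-epoch schedule with a decay at epoch 150, and the hyper-parameter values are taken from the excerpt above, while the `TinyHOIModel` placeholder, the decay factor of 0.1, and the `total_loss` composition are assumptions added for illustration (the exact model and equations are in the paper).

```python
# Minimal sketch of the reported training setup; assumptions flagged inline.
from torch import nn, optim


class TinyHOIModel(nn.Module):
    """Hypothetical stand-in for the PhraseHOI architecture (assumption)."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)  # stands in for the ResNet backbone
        self.head = nn.Linear(8, 8)      # stands in for the HOI/phrase heads


model = TinyHOIModel()

# AdamW with lr 1e-4 overall and 1e-5 for the backbone, as reported.
backbone = [p for n, p in model.named_parameters() if n.startswith("backbone")]
rest = [p for n, p in model.named_parameters() if not n.startswith("backbone")]
optimizer = optim.AdamW(
    [{"params": rest, "lr": 1e-4}, {"params": backbone, "lr": 1e-5}]
)

# 200 epochs with a single decay at epoch 150; the decay factor gamma=0.1
# is an assumption, since the excerpt does not state it.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[150], gamma=0.1)

BATCH_SIZE = 64  # ResNet-50 backbone (32 for ResNet-101)
ALPHA = 0.1      # phrase-loss weight, Eq. 1
BETA = 10.0     # triplet-loss weight, Eq. 2
MARGIN = 0.5    # margin m in Eq. 3


def total_loss(l_hoi, l_phrase, l_triplet):
    # Hypothetical composition: the excerpt only says alpha weights the phrase
    # loss and beta the triplet loss; see Eqs. 1-3 in the paper for the exact form.
    return l_hoi + ALPHA * l_phrase + BETA * l_triplet
```

The two optimizer parameter groups reproduce the reported split between the backbone learning rate (1e-5) and the rest of the model (1e-4); MARGIN would feed a triplet-style loss with margin m, consistent with how Eq. 3 is described in the excerpt.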