Neural-Logic Human-Object Interaction Detection
Authors: Liulei Li, Jianan Wei, Wenguan Wang, Yi Yang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate LOGICHOI on V-COCO and HICO-DET under both normal and zero-shot setups, achieving significant improvements over existing methods. Specifically, it achieves a mean mAP score of 65.0% across two scenarios. For in-depth analysis, we perform a series of ablative studies on HICO-DET [54] test. |
| Researcher Affiliation | Academia | Liulei Li¹, Jianan Wei², Wenguan Wang², Yi Yang²; ¹ReLER, AAII, University of Technology Sydney; ²CCAI, Zhejiang University |
| Pseudocode | No | The paper describes mathematical formulations and architectural components but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | https://github.com/weijianan1/LogicHOI |
| Open Datasets | Yes | We conduct experiments on two widely-used HOI detection benchmarks: V-COCO [53] is a carefully curated subset of MS-COCO [110] which contains 10,346 images (5,400 for training and 4,946 for testing). HICO-DET [54] consists of 47,776 images in total, with 38,118 for training and 9,658 designated for testing. |
| Dataset Splits | No | The paper specifies training and testing splits for V-COCO and HICO-DET, but does not explicitly mention a separate validation dataset split or its size/percentage for reproduction. |
| Hardware Specification | Yes | on 4 GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions software components like 'DETR', 'CLIP', and 'Adam optimizer' but does not provide specific version numbers for these or any other libraries or frameworks. |
| Experiment Setup | Yes | we conducted training for 90 epochs using the Adam optimizer with a batch size of 16 and base learning rate 1e-4, on 4 GeForce RTX 3090 GPUs. The learning rate is scheduled following a step policy, decayed by a factor of 0.1 at the 60th epoch. The number of human, object, action queries N_h, N_o, N_a is set to 32 for efficiency, and the hidden sizes of all the modules are set to D = 768. Here α is set to 0.2 empirically. Following the convention [11, 16, 17, 20, 66], we set K to 100. |
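
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch, which makes the step learning-rate policy concrete. This is an illustration only, not the authors' code: the names `TrainConfig` and `lr_at_epoch` are hypothetical, and the sketch assumes 0-indexed epochs with the decay taking effect from epoch index 60 onward (the convention used by, for example, PyTorch's `MultiStepLR` with `milestones=[60]` and `gamma=0.1`). The paper's own indexing may differ by one epoch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    epochs: int = 90
    batch_size: int = 16
    base_lr: float = 1e-4
    decay_epoch: int = 60      # "decayed ... at the 60th epoch"
    decay_factor: float = 0.1
    num_gpus: int = 4          # GeForce RTX 3090
    num_queries: int = 32      # N_h = N_o = N_a
    hidden_dim: int = 768      # D
    alpha: float = 0.2
    top_k: int = 100           # K


def lr_at_epoch(cfg: TrainConfig, epoch: int) -> float:
    """Step policy: base_lr before decay_epoch, base_lr * decay_factor afterwards.

    Assumes 0-indexed epochs; this indexing is an assumption, not stated in the paper.
    """
    if not 0 <= epoch < cfg.epochs:
        raise ValueError(f"epoch must be in [0, {cfg.epochs}), got {epoch}")
    scale = cfg.decay_factor if epoch >= cfg.decay_epoch else 1.0
    return cfg.base_lr * scale


if __name__ == "__main__":
    cfg = TrainConfig()
    for epoch in (0, 59, 60, 89):
        print(f"epoch {epoch:2d}: lr = {lr_at_epoch(cfg, epoch):.1e}")
```

Under these assumptions the learning rate stays at 1e-4 for the first 60 epochs and drops to 1e-5 for the remaining 30.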