Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models
Authors: Lin Li, Jun Xiao, Guikun Chen, Jian Shao, Yueting Zhuang, Long Chen
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four VRD benchmarks have demonstrated the effectiveness and interpretability of RECODE. |
| Researcher Affiliation | Academia | ¹Zhejiang University, ²The Hong Kong University of Science and Technology; {mukti, junx, guikun.chen, jshao, yzhuang}@zju.edu.cn, longchen@ust.hk |
| Pseudocode | No | The paper describes the RECODE framework and its components using text and a diagram (Figure 3), but it does not include any explicit pseudocode or algorithm blocks (see the illustrative sketch after this table). |
| Open Source Code | Yes | https://github.com/HKUST-LongGroup/RECODE |
| Open Datasets | Yes | To evaluate our RECODE, we conducted experiments on four benchmark datasets: Visual Genome (VG) [13] and GQA [14] datasets for scene graph generation (SGG), and HICO-DET [15] and V-COCO [16] datasets for human-object interaction (HOI) detection. |
| Dataset Splits | No | The paper specifies the number of "testing images" for each dataset (e.g., "VG [13] contains 26,443 images for testing", "GQA [14]... contains 8,208 images for testing"), and mentions using "the same split provided by [21]" for GQA, but it does not explicitly detail training or validation splits used for their own experiments. |
| Hardware Specification | No | The paper mentions using "OpenAI's publicly accessible resources" and a "Vision Transformer with a base configuration (ViT-B/32)" for CLIP, and "GPT-3.5-turbo" for the LLM. However, it does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory) used to run the experiments. |
| Software Dependencies | No | The paper specifies using "GPT-3.5-turbo" for the LLM and a "Vision Transformer with a base configuration (ViT-B/32)" for CLIP. While these name specific models and configurations, the paper does not list typical software dependencies with explicit version numbers (e.g., Python, PyTorch, or TensorFlow versions, or other libraries). |
| Experiment Setup | No | The paper states, "The bounding box and category of objects were given in all experiments." However, it does not provide specific details regarding hyperparameters (e.g., learning rate, batch size, number of epochs) or other system-level training settings for their proposed method. |
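Since the paper ships no pseudocode, below is a minimal sketch of the composite-cue scoring idea it describes: an LLM decomposes a predicate into textual visual cues, and CLIP (ViT-B/32) scores image regions against them. It assumes OpenAI's open-source `clip` package; the cue strings (`subject_cues`, `object_cues`) and the predicate "riding" are hypothetical stand-ins for GPT-3.5-turbo output, so this is an illustration of the approach, not the authors' implementation.

```python
# Sketch of composite-cue scoring with CLIP, assuming the open-source
# `clip` package (pip install git+https://github.com/openai/CLIP.git).
# Cue lists are hypothetical examples of what an LLM might generate.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical LLM-generated cues for the predicate "riding":
subject_cues = ["a person with bent legs", "a person holding reins"]
object_cues = ["a horse with a saddle", "an animal being sat on"]

@torch.no_grad()
def cue_score(region: Image.Image, cues: list[str]) -> float:
    """Average CLIP similarity between one image region and a cue set."""
    image = preprocess(region).unsqueeze(0).to(device)
    text = clip.tokenize(cues).to(device)
    img_feat = model.encode_image(image)
    txt_feat = model.encode_text(text)
    # Normalize so the dot product is a cosine similarity.
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return (img_feat @ txt_feat.T).mean().item()

# A predicate score could then sum per-component cue scores, e.g.:
#   score("riding") = cue_score(subj_crop, subject_cues)
#                   + cue_score(obj_crop, object_cues)
```

The full framework described in the paper also incorporates spatial cues and per-cue weighting when aggregating these similarities; the sketch omits those details for brevity.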