Zero-shot Visual Relation Detection via Composite Visual Cues from Large Language Models

Authors: Lin Li, Jun Xiao, Guikun Chen, Jian Shao, Yueting Zhuang, Long Chen

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on four VRD benchmarks have demonstrated the effectiveness and interpretability of RECODE.
Researcher Affiliation | Academia | 1 Zhejiang University, 2 The Hong Kong University of Science and Technology; {mukti, junx, guikun.chen, jshao, yzhuang}@zju.edu.cn, longchen@ust.hk
Pseudocode | No | The paper describes the RECODE framework and its components using text and a diagram (Figure 3), but it does not include any explicit pseudocode or algorithm blocks. A hedged sketch of such a cue-scoring step is included after this table.
Open Source Code | Yes | https://github.com/HKUST-LongGroup/RECODE
Open Datasets | Yes | To evaluate our RECODE, we conducted experiments on four benchmark datasets: Visual Genome (VG) [13] and GQA [14] datasets for scene graph generation (SGG), and HICO-DET [15] and V-COCO [16] datasets for human-object interaction (HOI) detection.
Dataset Splits | No | The paper specifies the number of "testing images" for each dataset (e.g., "VG [13] contains 26,443 images for testing", "GQA [14]... contains 8,208 images for testing"), and mentions using "the same split provided by [21]" for GQA, but it does not explicitly detail training or validation splits used for their own experiments.
Hardware Specification | No | The paper mentions using "OpenAI’s publicly accessible resources" and a "Vision Transformer with a base configuration (ViT-B/32)" for CLIP, and "GPT-3.5-turbo" as the LLM. However, it does not provide specific details about the hardware (e.g., GPU models, CPU types, or memory) used to run the experiments.
Software Dependencies | No | The paper specifies using "GPT-3.5-turbo" as the LLM and a "Vision Transformer with a base configuration (ViT-B/32)" for CLIP. While these identify specific models and configurations, the paper does not list typical software dependencies with explicit version numbers (e.g., Python, PyTorch, TensorFlow, or other libraries); a minimal reproduction stack is sketched after this table.
Experiment Setup | No | The paper states, "The bounding box and category of objects were given in all experiments." However, it does not provide specific details regarding hyperparameters (e.g., learning rate, batch size, number of epochs) or other system-level training settings for their proposed method.
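
Since the paper provides no pseudocode, the following is a minimal sketch of how the described cue-based scoring could look: each candidate predicate is mapped to a set of LLM-generated visual-cue descriptions, which are scored against an image region with CLIP ViT-B/32. All function names and the aggregation by plain averaging are assumptions made here for illustration, not the authors' implementation.

```python
# Illustrative sketch only (assumed names, not the authors' code): score candidate
# predicates for a subject-object pair by averaging CLIP similarity over the
# LLM-generated composite visual cues of each predicate.
import torch
import clip  # OpenAI's CLIP package
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def rank_predicates(region: Image.Image, cues_per_predicate: dict) -> str:
    """cues_per_predicate maps a predicate name to a list of cue descriptions."""
    with torch.no_grad():
        # Encode the cropped image region once.
        img = model.encode_image(preprocess(region).unsqueeze(0).to(device))
        img = img / img.norm(dim=-1, keepdim=True)
        scores = {}
        for predicate, cues in cues_per_predicate.items():
            txt = model.encode_text(clip.tokenize(cues).to(device))
            txt = txt / txt.norm(dim=-1, keepdim=True)
            # Aggregate the per-cue similarities (a plain mean, chosen here for
            # illustration).
            scores[predicate] = (img @ txt.T).mean().item()
    # Return the highest-scoring predicate.
    return max(scores, key=scores.get)
```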
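
As the paper does not list software dependencies, a reproduction would minimally need PyTorch, a CLIP implementation, and the OpenAI API client. The snippet below sketches the cue-generation step with GPT-3.5-turbo; the prompt wording and the use of the openai>=1.0 Python client are assumptions, not details from the paper.

```python
# Illustrative sketch only: generating composite visual cues for a predicate with
# GPT-3.5-turbo. Assumes the openai>=1.0 Python client and an OPENAI_API_KEY in
# the environment; the prompt wording is hypothetical.
from openai import OpenAI

client = OpenAI()

def generate_cues(predicate: str) -> list:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": (
                f"List short visual cues (one per line) that indicate the relation "
                f"'{predicate}' between a subject and an object in an image."
            ),
        }],
    )
    text = response.choices[0].message.content
    # Split the reply into one cue description per line.
    return [line.lstrip("-* ").strip() for line in text.splitlines() if line.strip()]
```

Together with the CLIP snippet above, this covers the only software components the paper names explicitly (CLIP ViT-B/32 and GPT-3.5-turbo).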