Deep Structured Learning for Visual Relationship Detection

Authors: Yaohui Zhu, Shuqiang Jiang

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on the Visual Relationship Detection (VRD) dataset and the large-scale Visual Genome (VG) dataset validate the effectiveness of our method, which outperforms state-of-the-art methods.
Researcher Affiliation | Academia | (1) Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China; (2) University of Chinese Academy of Sciences, Beijing, China
Pseudocode | No | The paper includes a network architecture diagram (Figure 2) and describes the method in text and mathematical formulas, but it does not provide any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statements about releasing open-source code for the described methodology, nor does it provide any links to a code repository.
Open Datasets | Yes | we evaluate our proposed method on Visual relationship detection (VRD) (Lu et al. 2016) and Visual Genome (VG) (Zhang et al. 2017) datasets.
Dataset Splits | No | The paper specifies a 'train/test split' for both the VRD and VG datasets and gives detailed counts of training and test images and relationships. However, it does not mention a 'validation' split.
Hardware Specification | No | The paper mentions using a 'VGG-16 network' for Faster R-CNN, but it does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using 'Faster R-CNN' and a 'VGG-16 network' as components but does not provide version numbers for any software dependencies (e.g., deep learning frameworks, libraries, or programming languages).
Experiment Setup | Yes | In the prediction of relationship, we empirically set (si, s) = 3, (pi, p) = 3, (oi, o) = 3, a momentum of 0.9, α = r = 0.005, β = 0.2, a weight decay of 0.05 for the VRD dataset, and α = r = 0.1, β = 0.3, a weight decay of 0.001 for the VG dataset.
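
For anyone attempting a reproduction, the values quoted in the Experiment Setup row could be collected into a small per-dataset configuration. The sketch below is only an illustration and is not taken from the paper. The class name, the field names and the SETUPS dictionary are hypothetical. The summary does not say what the quantities (si, s), (pi, p) and (oi, o) or the symbol r denote, so the fields only mirror the notation. The sketch also assumes that the three values of 3 and the momentum of 0.9 apply to both datasets, which the quoted sentence suggests but does not state outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the summary; all field names are hypothetical labels."""
    s_term: float        # value reported for (si, s)
    p_term: float        # value reported for (pi, p)
    o_term: float        # value reported for (oi, o)
    momentum: float
    alpha: float
    r: float             # reported as equal to alpha
    beta: float
    weight_decay: float


# Assumption: the three terms and the momentum are shared by both datasets.
SHARED = dict(s_term=3, p_term=3, o_term=3, momentum=0.9)

SETUPS = {
    "VRD": ReportedSetup(**SHARED, alpha=0.005, r=0.005, beta=0.2, weight_decay=0.05),
    "VG": ReportedSetup(**SHARED, alpha=0.1, r=0.1, beta=0.3, weight_decay=0.001),
}

if __name__ == "__main__":
    for name, setup in SETUPS.items():
        print(name, setup)
```

Keeping the two setups side by side in one frozen structure makes it harder to mix VRD and VG values by accident when switching datasets.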