MR-NET: Exploiting Mutual Relation for Visual Relationship Detection

Authors: Yi Bin, Yang Yang, Chaofan Tao, Zi Huang, Jingjing Li, Heng Tao Shen

AAAI 2019, pp. 8110-8117 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results on two commonly used datasets (VG and VRD) demonstrate the superior performance of the proposed approach.
Researcher Affiliation | Academia | (1) University of Electronic Science and Technology of China; (2) The University of Queensland
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We conduct experiments on two common datasets, the Visual Relationship Detection (VRD) and Visual Genome (VG), to evaluate our mutual relation network... VRD (Lu et al. 2016): VRD is the first benchmark for visual relationship detection... VG (Krishna et al. 2017): VG is a large-scale knowledge dataset...
Dataset Splits | No | The paper specifies training and test splits for the datasets (e.g., 'VRD... contains 4,000 and 1,000 images for training and test, respectively' and, for VG, 'We randomly split the dataset with 62,253 images for training and 15,508 images for test'), but it does not explicitly mention a separate validation split or its size/proportion.
Hardware Specification | Yes | We train 3 epochs for VG and 8 epochs for VRD respectively on a single GPU, a GeForce GTX TITAN X.
Software Dependencies | No | The paper mentions using the GloVe model for word embeddings and VGG16 with ImageNet pre-trained weights, but does not specify versions for these or any other software libraries or frameworks used in the implementation.
Experiment Setup | Yes | The constant margin term m in the mutual constraint is set as 0.5... We choose the Adam algorithm to optimize our model and set the learning rate as 0.00001 initially, which is decreased by a factor of 10 at the beginning of the 3rd and 8th epoch respectively... To augment training samples, we randomly change the bounding boxes (including shifting and scaling) by 5 to 10 percent of the width or height, and no more than 20 pixels.
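The Software Dependencies row names a VGG16 backbone with ImageNet pre-trained weights and GloVe word embeddings but gives no versions. The sketch below shows one plausible way to obtain both components; the torchvision call, the GloVe file name, and the helper function are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of the components named in the "Software Dependencies" row:
# a VGG16 backbone with ImageNet pre-trained weights and GloVe word vectors.
# The torchvision weights string and the GloVe file path are assumptions;
# the paper does not state which framework or GloVe variant was used.
import numpy as np
import torchvision

# VGG16 initialised from torchvision's ImageNet checkpoint (modern torchvision API).
backbone = torchvision.models.vgg16(weights="IMAGENET1K_V1")
conv_trunk = backbone.features  # convolutional layers used as a feature extractor


def load_glove(path="glove.6B.300d.txt"):
    """Load GloVe vectors from the standard text format into a {word: vector} dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors


# Hypothetical usage: embed a predicate word with its 300-d GloVe vector.
# glove = load_glove()
# ride_vec = glove["ride"]
```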
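The Experiment Setup row also fixes the optimizer, learning-rate schedule, and bounding-box augmentation. The following is a minimal sketch of that configuration, assuming a PyTorch implementation; the stand-in model, the jitter helper, and all variable names are placeholders rather than the authors' code.

```python
# Sketch of the reported training configuration: Adam with an initial learning
# rate of 1e-5, decreased by a factor of 10 at the start of the 3rd and 8th
# epoch, plus random bounding-box jitter of 5-10% capped at 20 pixels.
import random
import torch

MARGIN_M = 0.5      # margin term of the mutual constraint, as stated in the paper
INITIAL_LR = 1e-5   # initial learning rate
EPOCHS_VRD = 8      # 8 epochs reported for VRD (3 for VG)

model = torch.nn.Linear(512, 70)  # placeholder for the actual relationship model
optimizer = torch.optim.Adam(model.parameters(), lr=INITIAL_LR)
# Milestones are 0-indexed epochs: the decay applied after epochs 2 and 7
# corresponds to "the beginning of the 3rd and 8th epoch".
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[2, 7], gamma=0.1)


def jitter_box(box, img_w, img_h):
    """Randomly shift a box by 5-10% of its width/height, capped at 20 pixels
    (scaling the box by the same proportion is handled analogously)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    dx = min(20.0, w * random.uniform(0.05, 0.10)) * random.choice([-1, 1])
    dy = min(20.0, h * random.uniform(0.05, 0.10)) * random.choice([-1, 1])
    x1, x2 = max(0.0, x1 + dx), min(float(img_w), x2 + dx)
    y1, y2 = max(0.0, y1 + dy), min(float(img_h), y2 + dy)
    return [x1, y1, x2, y2]


for epoch in range(EPOCHS_VRD):
    # ... one pass over the (jittered) training samples would go here ...
    scheduler.step()  # applies the 10x decay at the configured milestones
```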