Visual Relationship Detection With Deep Structural Ranking

Authors: Kongming Liang, Yuhong Guo, Hong Chang, Xilin Chen

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that our proposed method outperforms the state-of-the-art on the two widely used datasets. We also demonstrate its superiority in detecting zero-shot relationships."
Researcher Affiliation | Academia | "Kongming Liang (1,3), Yuhong Guo (2), Hong Chang (1), Xilin Chen (1,3). 1: Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China. 2: School of Computer Science, Carleton University, Ottawa, Canada. 3: University of Chinese Academy of Sciences, Beijing 100049, China"
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not state that source code is released, and it provides no link to a code repository.
Open Datasets | Yes | "VRD (Lu et al. 2016). VRD (Visual Relationship Dataset) contains 5000 images with 100 object categories and 70 predicates. ... VG (Krishna et al. 2016). The annotations of original VG (Visual Genome) dataset are very noisy. Therefore, we use the cleaned up version (Zhang et al. 2017) by using official pruning of objects and relations."
Dataset Splits | No | The paper specifies training and test splits ("4,000 training images and 1,000 test images" for VRD; "73,801 images for training and 25,857 images for testing" for VG) but does not describe a separate validation split.
Hardware Specification | Yes | "Our implementations are based on the Pytorch deep learning framework on a single GeForce GTX TITAN X."
Software Dependencies | No | The paper mentions the "Pytorch deep learning framework" but gives no version numbers for PyTorch or any other software dependency.
Experiment Setup | Yes | "We use Adam optimizer to train the whole network and the learning rate is set to be 0.00001. During training, the first five convolutional layers of the base network are fixed without tuning. For the newly added layers, the learning rate is multiplied by 10 to accelerate the learning process. We train the proposed model for 5 epochs and divide the learning rate by a factor of 10 after the third epoch."
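The Experiment Setup row above contains enough detail to reconstruct the optimizer configuration in PyTorch. The paper releases no code, so this is only a minimal sketch of the quoted hyperparameters: the module names `base_net` and `new_layers` are hypothetical stand-ins for the paper's base network and its newly added layers.

```python
# Hedged sketch of the training setup quoted from the paper. The paper does
# not release code; base_net and new_layers are hypothetical placeholders.
import torch
import torch.nn as nn

# Placeholder base network (paper uses a pretrained CNN base) and new head.
base_net = nn.Sequential(
    *[nn.Conv2d(3 if i == 0 else 8, 8, kernel_size=3) for i in range(7)]
)
new_layers = nn.Linear(8, 70)  # VRD has 70 predicate categories

# "the first five convolutional layers of the base network are fixed"
for layer in list(base_net.children())[:5]:
    for p in layer.parameters():
        p.requires_grad = False

# "learning rate is set to be 0.00001"; newly added layers get 10x that rate.
base_lr = 1e-5
optimizer = torch.optim.Adam([
    {"params": [p for p in base_net.parameters() if p.requires_grad],
     "lr": base_lr},
    {"params": new_layers.parameters(), "lr": base_lr * 10},
])

# "5 epochs and divide the learning rate by a factor of 10 after the third"
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)
for epoch in range(5):
    # ... one training epoch over VRD/VG would run here ...
    scheduler.step()
```

The per-parameter-group dictionaries passed to `Adam` are the standard PyTorch way to give the new layers a 10x learning rate while keeping a single optimizer.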