Transitive Hashing Network for Heterogeneous Multimedia Retrieval

Authors: Zhangjie Cao, Mingsheng Long, Jianmin Wang, Qiang Yang

AAAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive empirical evidence validates that the proposed THN approach yields state-of-the-art retrieval performance on standard multimedia benchmarks, i.e., NUS-WIDE and ImageNet-YahooQA.
Researcher Affiliation | Academia | KLiss, MOE; TNList; School of Software, Tsinghua University, China; Hong Kong University of Science and Technology, Hong Kong. Emails: caozhangjie14@gmail.com, {mingsheng,jimwang}@tsinghua.edu.cn, qyang@cse.ust.hk.
Pseudocode | No | The paper does not include a pseudocode block or a clearly labeled algorithm section.
Open Source Code | No | The paper only promises that 'the codes and configurations will be made available online'; no repository is linked.
Open Datasets | Yes | NUS-WIDE (http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm) is a popular dataset for cross-modal retrieval that contains 269,648 image-text pairs. Annotations for 81 semantic categories are provided for evaluation, which the authors prune by keeping the image-text pairs belonging to the 16 categories shared with ImageNet (Deng et al. 2009). Each image is resized to 256×256 pixels, and each text is represented by a bag-of-words (BoW) feature vector. ... ImageNet-YahooQA (Wei et al. 2014) is a heterogeneous media dataset of images from ImageNet (Deng et al. 2009) and QAs from Yahoo Answers (YahooQA). (A preprocessing sketch follows the table.)
Dataset Splits | Yes | NUS-WIDE: We randomly select 2,000 images or texts as the query set, and correspondingly, the remaining texts and images are used as the database. We randomly select 30 images and 30 texts per class, distinctly, from the database as the training set... ImageNet-YahooQA: We randomly select 2,000 images from ImageNet or 2,000 texts from YahooQA as the query set, and correspondingly, the remaining texts in YahooQA and the images in ImageNet are used as the database. For the training set, we randomly select 2,000 NUS-WIDE images and 2,000 NUS-WIDE texts as the supervised auxiliary dataset and select 500 ImageNet images and 500 Yahoo text documents as unsupervised training data. (A split sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used to run its experiments.
Software Dependencies | No | The paper states 'We implement the THN model in Caffe' but does not provide version numbers for Caffe or other software dependencies.
Experiment Setup | Yes | For the image network, we extend AlexNet (Krizhevsky, Sutskever, and Hinton 2012), fine-tune convolutional layers conv1-conv5 and fully connected layers fc6-fc7 copied from the pre-trained model, and train the fch hash layer from scratch, all via back-propagation. Since the hash layer fch is trained from scratch, we set its learning rate to be 10 times that of the other layers. For the text network, we employ a three-layer MLP with the numbers of hidden units set to 1000, 500, and b, respectively. We use mini-batch stochastic gradient descent (SGD) with 0.9 momentum and the learning rate strategy in Caffe, and cross-validate the learning rate from 10^-5 to 10^-1 with a multiplicative step size of 10^(1/2). (A training-configuration sketch follows the table.)
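
The preprocessing described in the Open Datasets row can be summarized in a few lines. Below is a minimal Python sketch: the category names and example texts are hypothetical placeholders, and only the 16-category pruning, the 256×256 resize, and the BoW representation come from the paper.

```python
# Minimal sketch of the NUS-WIDE preprocessing described above. The category
# names and inputs are hypothetical; only the pruning rule, the 256x256
# resize, and the bag-of-words features are taken from the paper.
from PIL import Image
from sklearn.feature_extraction.text import CountVectorizer

SHARED_CATEGORIES = {"animal", "flowers"}  # stand-in for the 16 categories shared with ImageNet

def keep_pair(tags):
    # Prune the dataset: keep an image-text pair only if it carries a shared label.
    return any(t in SHARED_CATEGORIES for t in tags)

def load_image(path):
    # Resize every image to 256x256 pixels, as the protocol specifies.
    return Image.open(path).convert("RGB").resize((256, 256))

# Represent each text by a bag-of-words (BoW) feature vector.
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(["clouds sky sunset", "dog grass park"])
```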
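
The NUS-WIDE split protocol from the Dataset Splits row (2,000 queries, the remainder as the database, 30 items per class for training) is mechanical enough to sketch per modality. The snippet below is an illustration under simplifying assumptions: it treats annotations as single-label (NUS-WIDE is actually multi-label), and the random seed is arbitrary since the paper reports none.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption; the paper reports none

def split_nus_wide(labels, n_query=2000, per_class=30):
    """Sketch: n_query items as the query set, the remainder as the database,
    and per_class training items per class drawn from the database."""
    ids = rng.permutation(len(labels))
    query, database = ids[:n_query], ids[n_query:]
    train = np.concatenate([
        rng.choice(database[labels[database] == c], size=per_class, replace=False)
        for c in np.unique(labels)
    ])
    return query, database, train

# Usage with toy single-label annotations over the 16 shared classes:
labels = rng.integers(0, 16, size=100000)
query, database, train = split_nus_wide(labels)
```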
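
The Experiment Setup row translates directly into optimizer configuration. The paper's implementation is in Caffe; the sketch below is a PyTorch analogue, in which the code length b and the BoW vocabulary size are placeholder assumptions. It shows only the reported settings: the 10x learning rate on the from-scratch fch hash layer, the 1000-500-b text MLP, 0.9-momentum SGD, and the cross-validation grid from 10^-5 to 10^-1 with multiplicative step 10^(1/2).

```python
import numpy as np
import torch
import torchvision

b = 32  # hash code length; a placeholder, the paper evaluates several values

# Image network: AlexNet with its last layer replaced by the fch hash layer.
image_net = torchvision.models.alexnet(weights="IMAGENET1K_V1")
image_net.classifier[6] = torch.nn.Linear(4096, b)  # fch, trained from scratch

# Text network: three-layer MLP with 1000, 500, and b units.
# The 5000-dim BoW input size is an assumption for the sketch.
text_net = torch.nn.Sequential(
    torch.nn.Linear(5000, 1000), torch.nn.ReLU(),
    torch.nn.Linear(1000, 500), torch.nn.ReLU(),
    torch.nn.Linear(500, b),
)

base_lr = 1e-3  # chosen by cross-validation over the grid below
optimizer = torch.optim.SGD(
    [
        # Pre-trained conv1-conv5 and fc6-fc7 layers use the base learning rate...
        {"params": image_net.features.parameters(), "lr": base_lr},
        {"params": image_net.classifier[:6].parameters(), "lr": base_lr},
        # ...while the from-scratch fch layer gets 10x the base rate.
        {"params": image_net.classifier[6].parameters(), "lr": 10 * base_lr},
        {"params": text_net.parameters(), "lr": base_lr},
    ],
    momentum=0.9,
)

# Learning-rate grid: 1e-5 to 1e-1 with multiplicative step 10^(1/2).
lr_grid = 10.0 ** np.arange(-5, -0.9, 0.5)
```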