Ranking-Based Deep Cross-Modal Hashing

Authors: Xuanwu Liu, Guoxian Yu, Carlotta Domeniconi, Jun Wang, Yazhou Ren, Maozu Guo (pp. 4400-4407)

AAAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on real multi-modal datasets show that RDCMH outperforms other competitive baselines and achieves state-of-the-art performance in cross-modal retrieval applications.
Researcher Affiliation | Academia | 1) College of Computer and Information Sciences, Southwest University, Chongqing, China; 2) Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan, China; 3) Department of Computer Science, George Mason University, Fairfax, USA; 4) SMILE Lab, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; 5) School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
Pseudocode | Yes | Algorithm 1 RDCMH: Ranking-based Deep Cross-Modal Hashing
Open Source Code | Yes | The code of RDCMH is available at mlda.swu.edu.cn/codes.php?name=RDCMH.
Open Datasets | Yes | We use three benchmark datasets: NUS-WIDE, Wiki, and MIRFlickr, to evaluate the performance of RDCMH. NUS-WIDE [1] contains 260,648 web images, some of which are associated with textual tags. It is a multi-label dataset in which each point is annotated with one or more of 81 concept labels. The text for each point is represented as a 1000-dimensional bag-of-words vector, and the hand-crafted feature for each image is a 500-dimensional bag-of-visual-words (BOVW) vector. Wiki [2] is generated from a group of 2,866 Wikipedia documents. Each document is an image-text pair labeled with one of 10 semantic classes; the images are represented by 128-dimensional SIFT feature vectors, and the text articles are represented as probability distributions over 10 topics derived from a Latent Dirichlet Allocation (LDA) model. MIRFlickr [3] originally contains 25,000 instances collected from Flickr. Each instance consists of an image and its associated textual tags, and is manually annotated with one or more labels from a total of 24 semantic labels. The text for each point is represented as a 1386-dimensional bag-of-words vector, and for the hand-crafted-feature-based method each image is represented by a 512-dimensional GIST feature vector. (A bag-of-words feature sketch follows the table.) [1] http://lms.comp.nus.edu.sg/research/NUS-WIDE.htm [2] https://www.wikidata.org/wiki/Wikidata [3] http://press.liacs.nl/mirflickr/mirdownload.html
Dataset Splits | No | The paper mentions a 'training set', a 'mini-batch size for gradient descent' of 128, and a 'semi-supervised semantic ranking list used for training', but it does not specify explicit training/validation/test splits or a validation set.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'CNN', 'AlexNet', and a 'bag-of-words (BOW) representation' for feature learning, but does not provide specific version numbers for any software, libraries, or frameworks used.
Experiment Setup | Yes | For RDCMH, we set the mini-batch size for gradient descent to 128 and the dropout rate to 0.5 on the fully-connected layers to avoid overfitting. The regularization parameter λ in Eq. (4) is set to 1, and the number of iterations for optimizing Eq. (4) is fixed to 500. The deep neural network adopted for the image modality is a CNN with eight layers: the first six layers are the same as those in CNN-F (Chatfield et al. 2014), and the seventh and eighth layers are fully-connected layers whose outputs are the learned image features. For the text modality, each text is first represented as a bag-of-words (BOW) vector. These bag-of-words vectors are then used as inputs to a neural network with two fully-connected layers, denoted full1 and full2. The full1 layer has 4096 neurons, and the second layer full2 has c (the hash-code length) neurons. The activation function for the first layer is ReLU, and that for the second layer is the identity function. (A sketch of the text network and these hyperparameters follows the table.)
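Below is a minimal Python sketch, not taken from the paper or its released code, of how a fixed-size bag-of-words text representation like the ones described in the Open Datasets row (1000-dimensional for NUS-WIDE, 1386-dimensional for MIRFlickr) could be built with scikit-learn's CountVectorizer. The tag strings and the helper name build_bow_features are illustrative placeholders.

# Sketch only: fixed-vocabulary bag-of-words features from tag documents.
# vocab_size mirrors the per-dataset dimensions quoted above (an assumption
# about how the BOW vectors were produced, not the authors' pipeline).
from sklearn.feature_extraction.text import CountVectorizer

def build_bow_features(tag_documents, vocab_size):
    """Turn a list of space-separated tag strings into a dense
    (n_samples, d) bag-of-words count matrix with d <= vocab_size."""
    vectorizer = CountVectorizer(max_features=vocab_size)
    bow = vectorizer.fit_transform(tag_documents)  # sparse count matrix
    return bow.toarray(), vectorizer

if __name__ == "__main__":
    docs = ["sky beach sunset", "dog grass park dog"]  # placeholder tag lists
    features, vec = build_bow_features(docs, vocab_size=1000)
    # Number of columns equals the vocabulary actually found, capped at 1000.
    print(features.shape)

For a real dataset the vocabulary would be fit on the full tag corpus, so the feature dimension would reach the cap (1000 or 1386) rather than the handful of tokens in this toy example.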
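And a minimal PyTorch sketch of the text-modality network and hyperparameters quoted in the Experiment Setup row. This is an assumed reconstruction, not the authors' released code: the image-modality CNN (the first six CNN-F layers plus two fully-connected layers) and the ranking loss weighted by λ are not implemented here, and the class name TextNet is made up for illustration.

# Sketch only: two fully-connected layers full1 (4096 neurons, ReLU) and
# full2 (c neurons, identity), with the quoted hyperparameters listed as
# constants for reference (lambda and the iteration count are not used below).
import torch
import torch.nn as nn

BATCH_SIZE = 128      # mini-batch size for gradient descent
DROPOUT = 0.5         # dropout rate on the fully-connected layers
LAMBDA = 1.0          # regularization parameter in Eq. (4)
NUM_ITERATIONS = 500  # iterations for optimizing Eq. (4)

class TextNet(nn.Module):
    def __init__(self, bow_dim, code_length):
        super().__init__()
        self.full1 = nn.Linear(bow_dim, 4096)       # full1: 4096 neurons
        self.dropout = nn.Dropout(DROPOUT)          # dropout on FC layer
        self.full2 = nn.Linear(4096, code_length)   # full2: c neurons

    def forward(self, x):
        h = torch.relu(self.full1(x))   # ReLU after full1
        h = self.dropout(h)
        return self.full2(h)            # identity activation after full2

if __name__ == "__main__":
    net = TextNet(bow_dim=1386, code_length=32)  # example sizes (assumed)
    dummy = torch.randn(BATCH_SIZE, 1386)        # one mini-batch of BOW vectors
    codes = net(dummy)
    print(codes.shape)  # torch.Size([128, 32])

The continuous outputs of full2 would still need to be quantized (e.g., by sign) to obtain binary hash codes, and trained jointly with the image network under the paper's ranking-based objective.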