Graph Convolutional Network Hashing for Cross-Modal Retrieval

Authors: Ruiqing Xu, Chao Li, Junchi Yan, Cheng Deng, Xianglong Liu

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on three benchmark datasets demonstrate that the proposed GCH outperforms the state-of-the-art methods."
Researcher Affiliation | Academia | 1. School of Electronic Engineering, Xidian University; 2. Dept. of CSE & MoE Key Lab of Artificial Intelligence, Shanghai Jiao Tong University; 3. Beihang University
Pseudocode | Yes | Algorithm 1: Semantic encoder guided learning for graph convolutional network hashing (GCH).
Open Source Code | No | The paper neither states that its source code is publicly available nor provides a link to a code repository.
Open Datasets | Yes | Three popular cross-modal retrieval benchmarks are adopted: MIRFLICKR-25K [Huiskes and Lew, 2008], NUS-WIDE [Chua et al., 2009], and Microsoft COCO [Lin et al., 2014].
Dataset Splits | No | MIRFLICKR-25K: 10,000 data points for training and 2,000 for query, with the remainder used for retrieval. NUS-WIDE: 10,500 points for training and 2,100 for query; the rest serves as the retrieval set. MS-COCO: 10,000 points for training and 5,000 for query; the rest serves as the retrieval set. Although MS-COCO itself provides a "validation" split, the paper's experiments use only training and query (test) sets and do not specify a validation split for the authors' own procedure.
Hardware Specification | Yes | "Our GCH is implemented with TensorFlow [Abadi et al., 2016] and executed on a server with one NVIDIA TITAN Xp GPU."
Software Dependencies | No | GCH is implemented with TensorFlow [Abadi et al., 2016], but no version number is given for TensorFlow or any other software dependency.
Experiment Setup | Yes | Initialization: network parameters θ^{x,y,l,G}; hyperparameters α, β, γ; learning rate µ; mini-batch size N_b^{x,y,l} = 128; maximum iteration number T_max; iter = 0. The first seven layers of the CNN-F network [Chatfield et al., 2014] serve as the image feature encoder; texts are encoded by a network of four fully-connected layers; the semantic encoder is a three-layer feed-forward network; a two-layer GCN is employed, with per-layer output feature dimensions N_b × 1024 and N_b × K. Sigmoid activation outputs predicted labels, tanh activation outputs hash codes, and all remaining layers use LReLU activation.
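The train/query/retrieval protocol quoted in the Dataset Splits row can be sketched as follows. This is a minimal sketch, not code from the paper: the function name, sampling order, and seeding are illustrative assumptions, and it follows the quoted wording in treating the retrieval set as the points left over after the training and query sets are drawn.

```python
import random

def make_splits(num_points, num_train, num_query, seed=0):
    """Partition dataset indices into disjoint train / query / retrieval sets,
    per the quoted protocol: query and training points are drawn, and the
    remaining points form the retrieval set. (Seeding is an assumption.)"""
    rng = random.Random(seed)
    indices = list(range(num_points))
    rng.shuffle(indices)
    query = indices[:num_query]
    train = indices[num_query:num_query + num_train]
    retrieval = indices[num_query + num_train:]
    return train, query, retrieval

# Example: a MIRFLICKR-25K-style split (25,000 points -> 10,000 / 2,000 / rest)
train, query, retrieval = make_splits(25000, 10000, 2000)
```

Analogous calls with (10,500 / 2,100) and (10,000 / 5,000) would reproduce the NUS-WIDE and MS-COCO splits described above.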
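The two-layer GCN hash branch described in the Experiment Setup row can be sketched in NumPy. This is a sketch under stated assumptions, not the paper's implementation: the symmetric normalization of the affinity matrix follows the standard GCN formulation, the LReLU slope is a guess, and the toy dimensions are reduced from the paper's N_b = 128 batch and 1024-wide hidden layer.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    # LReLU hidden-layer activation (slope value is an assumption)
    return np.where(x > 0, x, alpha * x)

def gcn_hash_forward(features, adjacency, w1, w2):
    """Two-layer GCN producing relaxed hash codes: layer outputs of shape
    N_b x 1024 and N_b x K, with tanh on the final layer so codes lie in
    (-1, 1). Normalization follows the standard GCN rule (assumption)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])          # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))  # D^{-1/2}
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt                # normalized affinity
    hidden = leaky_relu(a_norm @ features @ w1)             # N_b x 1024
    return np.tanh(a_norm @ hidden @ w2)                    # N_b x K

# Toy usage with reduced sizes (paper: N_b = 128, hidden width 1024)
rng = np.random.default_rng(0)
n_b, feat_dim, hidden_dim, k = 8, 16, 32, 12
codes = gcn_hash_forward(
    rng.standard_normal((n_b, feat_dim)),
    (rng.random((n_b, n_b)) > 0.5).astype(float),
    rng.standard_normal((feat_dim, hidden_dim)) * 0.1,
    rng.standard_normal((hidden_dim, k)) * 0.1,
)
binary_codes = np.sign(codes)  # quantize to K-bit hash codes
```

The tanh output matches the setup's description of the hash-code activation; the final sign step is the usual quantization from relaxed codes to binary codes.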